Compare commits

65 Commits

Author SHA1 Message Date
Sam Herring dff5481e58 Eval splits for holdout sets 2026-03-03 14:42:45 -05:00
Sam Herring fe17b5ff08 Changing return type to be ScoredDataGroup to account for multiple trajectories 2026-03-02 11:35:06 -08:00
Sam Herring 6fdb38ed29 Added task-specific metrics and evals 2026-02-27 11:20:18 -08:00
Sam Herring b7e713b101 Wandb changes 2026-02-26 10:41:24 -08:00
Sam Herring 0e694b954a Updating config 2026-02-24 19:23:05 -08:00
Sam Herring c12e46cf13 Adding config init method 2026-02-24 19:19:39 -08:00
Sam Herring b93ad43191 Updating path vars and dataset loading 2026-02-24 18:58:19 -08:00
Sam Herring f1c2f8a414 Updating to use hermes-agent backend and parse container definition out of provided .sif files 2026-02-24 16:35:18 -08:00
Sam Herring 9139eeaa60 Adding endless terminal environment after rebase: 2026-02-17 16:56:17 -08:00
Sam Herring d0f82e6dcc Removing random project notes doc 2026-02-17 08:02:29 -08:00
teknium1 49e1f9ea89 Refactor TODO.md to summarize future improvements for the Hermes Agent, focusing on subagent architecture, task management, dynamic skills expansion, and interactive clarifying questions. Key ideas include context isolation for subagents, task decomposition, progress tracking, and skill acquisition from successful tasks. 2026-02-17 03:24:38 -08:00
teknium1 6731230d73 Add special handling for 'process' tool in _build_tool_preview function
- Enhanced the _build_tool_preview function to include specific formatting for the 'process' tool, displaying action, session_id, data, and timeout when applicable.
- This update improves the clarity of tool previews, particularly for actions that require session tracking and timeout management.
2026-02-17 03:18:27 -08:00
teknium1 ec59d71e60 Update PTY write handling in ProcessRegistry to ensure data is encoded as bytes before writing. This change improves compatibility with string inputs and clarifies the expected data type in comments. 2026-02-17 03:14:47 -08:00
teknium1 bdac541d1e Rename OPENAI_API_KEY to HERMES_OPENAI_API_KEY in configuration and codebase for clarity and to avoid conflicts. Update related documentation and error messages to reflect the new key name, ensuring backward compatibility with existing setups. 2026-02-17 03:11:17 -08:00
teknium1 061fa70907 Add background process management with process tool, wait, PTY, and stdin support
New process registry and tool for managing long-running background processes
across all terminal backends (local, Docker, Singularity, Modal, SSH).

Process Registry (tools/process_registry.py):
- ProcessSession tracking with rolling 200KB output buffer
- spawn_local() with optional PTY via ptyprocess for interactive CLIs
- spawn_via_env() for non-local backends (runs inside sandbox, never on host)
- Background reader threads per process (Popen stdout or PTY)
- wait() with timeout clamping, interrupt support, and transparent limit reporting
- JSON checkpoint to ~/.hermes/processes.json for gateway crash recovery
- Module-level singleton shared across agent loop, gateway, and RL

Process Tool (model_tools.py):
- 7 actions: list, poll, log, wait, kill, write, submit
- Paired with terminal in all toolsets (CLI, messaging, RL)
- Timeout clamping with transparent notes in response

Terminal Tool Updates (tools/terminal_tool.py):
- Replaced nohup background mode with registry spawn (returns session_id)
- Added workdir parameter for per-command working directory
- Added check_interval parameter for gateway auto-check watchers
- Added pty parameter for interactive CLI tools (Codex, Claude Code)
- Updated TERMINAL_TOOL_DESCRIPTION with full background workflow docs
- Cleanup thread now respects active background processes (won't reap sandbox)

Gateway Integration (gateway/run.py, session.py, config.py):
- Session reset protection: sessions with active processes exempt from reset
- Default idle timeout increased from 2 hours to 24 hours
- from_dict fallback aligned to match (was 120, now 1440)
- session_key env var propagated to process registry for session mapping
- Crash recovery on gateway startup via checkpoint probe
- check_interval watcher: asyncio task polls process, delivers updates to platform

RL Safety (environments/):
- tool_context.py cleanup() kills background processes on episode end
- hermes_base_env.py warns when enabled_toolsets is None (loads all tools)
- Process tool safe in RL via wait() blocking the agent loop

Also:
- Added ptyprocess as optional dependency (in pyproject.toml [pty] extra + [all])
- Fixed pre-existing bug: rl_test_inference missing from TOOL_TO_TOOLSET_MAP
- Updated AGENTS.md with process management docs and project structure
- Updated README.md terminal section with process management overview
2026-02-17 02:51:31 -08:00
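The rolling 200KB output buffer and the timeout clamping with a transparent note, both described in commit 061fa70907, could be sketched roughly as follows. This is a minimal illustration and not the code in tools/process_registry.py: the names `RollingBuffer` and `clamp_timeout` are hypothetical, and the note wording is invented for the example.

```python
from typing import Optional, Tuple


class RollingBuffer:
    """Keep only the most recent `max_bytes` of process output (hypothetical helper)."""

    def __init__(self, max_bytes: int = 200 * 1024) -> None:
        self.max_bytes = max_bytes
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        # Add the new output, then drop the oldest bytes beyond the cap.
        self._data.extend(chunk)
        overflow = len(self._data) - self.max_bytes
        if overflow > 0:
            del self._data[:overflow]

    def text(self) -> str:
        # Truncation can split a multi-byte character, so decode leniently.
        return self._data.decode("utf-8", errors="replace")


def clamp_timeout(requested: float, limit: float) -> Tuple[float, Optional[str]]:
    """Return the effective timeout plus a note when the request was reduced."""
    if requested > limit:
        return limit, f"Requested timeout {requested}s was clamped to {limit}s."
    return requested, None
```

Returning the note alongside the clamped value lets the tool response tell the model that its requested wait was shortened, rather than silently changing it.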
teknium1 48b5cfd085 Add skip_context_files option to AIAgent for batch processing
- Introduced a new parameter `skip_context_files` in the AIAgent class to control the inclusion of context files (SOUL.md, AGENTS.md, .cursorrules) in the system prompt.
- Updated the _process_single_prompt function to set `skip_context_files` to True, preventing pollution of trajectories during batch processing and data generation.
2026-02-16 22:40:31 -08:00
teknium1 a7609c97be Update docs to match backend key rename and CWD behavior
- cli-config.yaml.example: env_type → backend everywhere, matching the
  documented config key that hermes_cli/config.py and README already use
- cli-config.yaml.example: added comments clarifying cwd is a path
  INSIDE the target environment for non-local backends
- AGENTS.md: updated terminal.cwd description to explain "." only
  resolves to host CWD for the local backend
- .env.example: updated TERMINAL_CWD comment to warn against using
  host-local paths with remote backends, lists per-backend defaults
2026-02-16 22:31:41 -08:00
teknium1 c33feb6dc9 Fix host CWD leaking into non-local terminal backends
When using Modal, Docker, SSH, or Singularity as the terminal backend
from the CLI, the agent resolved cwd: "." to the host machine's local
path (e.g. /Users/rewbs/code/hermes-agent) and passed it to the remote
sandbox, where it doesn't exist. All commands failed with "No such file
or directory".

Root cause: cli.py unconditionally resolved "." to os.getcwd() and wrote
it to TERMINAL_CWD regardless of backend type. Every tool then used that
host-local path as the working directory inside the remote environment.

Fixes:
- cli.py: only resolve "." to os.getcwd() for the local backend. For all
  remote backends (ssh, docker, modal, singularity), leave TERMINAL_CWD
  unset so the tool layer uses per-backend defaults (/root, /, ~, etc.)
- terminal_tool.py: added sanity check -- if TERMINAL_CWD contains a
  host-local prefix (/Users/, /home/, C:\) for a non-local backend, log
  a warning and fall back to the backend's default
- terminal_tool.py: SSH default CWD is now ~ instead of os.getcwd()
- file_operations.py: last-resort CWD fallback changed from os.getcwd()
  to "/" so host paths never leak into remote file operations
2026-02-16 22:30:04 -08:00
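The working-directory rule from commit c33feb6dc9 can be illustrated with a small sketch. It assumes a single hypothetical function, `resolve_cwd`, that combines the cli.py and terminal_tool.py behavior; the real code splits this across files. The defaults come from the .env.example comments in this compare (/root for modal, / for docker, ~ for ssh), and `None` stands for "leave TERMINAL_CWD unset".

```python
import os
from typing import Optional

# Prefixes the commit message lists as signs of a host-local path.
HOST_PREFIXES = ("/Users/", "/home/", "C:\\")

# Per-backend defaults as documented in .env.example; other backends
# are left unset (None) in this sketch.
DEFAULT_CWD = {"modal": "/root", "docker": "/", "ssh": "~"}


def resolve_cwd(backend: str, configured: Optional[str]) -> Optional[str]:
    """Hypothetical helper: pick the working directory for a backend."""
    if backend == "local":
        # Only the local backend may resolve "." to the host CWD.
        return os.getcwd() if configured in (None, ".") else configured
    if configured in (None, "."):
        return DEFAULT_CWD.get(backend)
    if configured.startswith(HOST_PREFIXES):
        # A host path would not exist inside the sandbox; fall back.
        return DEFAULT_CWD.get(backend)
    return configured
```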
teknium1 2c7deb41f6 Fix Modal backend not working from CLI
Two config systems used different key names for the terminal backend:
- hermes_cli/config.py, README, and all docs use "terminal.backend"
- cli.py's env var mapping only recognized "terminal.env_type"

Users following the docs who set `backend: modal` in ~/.hermes/config.yaml
had it silently ignored -- TERMINAL_ENV always defaulted to "local".

Additionally, when no config file existed, cli.py's hardcoded defaults
overwrote any TERMINAL_ENV=modal set in .env, despite the comment saying
"env vars take precedence."

Fixes:
- cli.py now normalizes "backend" -> "env_type" (backend takes precedence)
- Defaults no longer overwrite .env when no config file terminal section exists
- hermes status reads from config as fallback when env var isn't set

Also fixes four related bugs found in the Modal/sandbox lifecycle:
- file_tools cache not cleared on sandbox cleanup (stale ops on dead sandbox)
- Global lock held during slow Modal teardown (blocked all tool calls 10-15s)
- Race condition in file_tools between existence check and access (KeyError)
- Per-task creation locks never cleaned up (memory leak)
2026-02-16 19:47:23 -08:00
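The key normalization in commit 2c7deb41f6 ("backend" takes precedence over "env_type", and defaults must not clobber a value from .env) might look like the sketch below. The function names are hypothetical, and the rule that a config-file value wins over TERMINAL_ENV when both are present is an assumption of this example, not something the commit message states.

```python
from typing import Mapping, Optional


def normalize_terminal_config(terminal: dict) -> dict:
    """Map the documented "backend" key onto the legacy "env_type" key."""
    normalized = dict(terminal)
    if "backend" in normalized:
        # "backend" wins if both keys are present.
        normalized["env_type"] = normalized.pop("backend")
    return normalized


def effective_backend(terminal: Optional[dict], env: Mapping[str, str]) -> str:
    """Hypothetical lookup: config section first, then .env, then "local"."""
    if terminal:
        value = normalize_terminal_config(terminal).get("env_type")
        if value:
            return value
    # No terminal section in the config file: keep whatever .env provided.
    return env.get("TERMINAL_ENV", "local")
```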
teknium1 8117d0adab Refactor file operations and environment management in file_tools and terminal_tool
- Improved the caching mechanism for ShellFileOperations to ensure stale entries are invalidated when environments are cleaned up.
- Enhanced thread safety by refining the use of locks during environment creation and cleanup processes.
- Streamlined the cleanup of inactive environments to prevent blocking other tool calls, ensuring efficient resource management.
- Added error handling and messaging improvements for better user feedback during environment cleanup.
2026-02-16 19:37:40 -08:00
teknium1 01a3a6ab0d Implement cleanup guard to prevent multiple executions on exit
- Introduced a new cleanup function that ensures terminal and browser sessions are cleaned up only once during application exit.
- Updated atexit registration to use the new cleanup function, enhancing resource management and preventing potential issues from multiple cleanup calls.
- Modified terminal cleanup messaging to only display when environments are cleaned, improving user feedback.
2026-02-16 02:43:45 -08:00
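A run-once cleanup guard of the kind commit 01a3a6ab0d describes can be sketched with a flag and a lock. The names `run_cleanup_once` and `CLEANUP_STEPS` are hypothetical; the actual function registered in the repository may be structured differently.

```python
import atexit
import threading
from typing import Callable, List

# Hypothetical registry of cleanup callables (terminal, browser, ...).
CLEANUP_STEPS: List[Callable[[], None]] = []

_cleanup_done = False
_cleanup_lock = threading.Lock()


def run_cleanup_once() -> bool:
    """Run all cleanup steps at most once; return True if this call ran them."""
    global _cleanup_done
    with _cleanup_lock:
        if _cleanup_done:
            return False
        _cleanup_done = True
    for step in CLEANUP_STEPS:
        try:
            step()
        except Exception:
            # One failing step should not prevent the remaining cleanup.
            pass
    return True


# Safe to register even if the function is also called explicitly.
atexit.register(run_cleanup_once)
```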
teknium1 45a8098d3a Remove browserbase SDK check and add Node.js and agent-browser validation in doctor script
- Removed the check for the browserbase SDK from the optional packages list.
- Added validation for Node.js installation and the presence of the agent-browser package, providing feedback on their status for browser automation tools.
2026-02-16 02:41:24 -08:00
teknium1 60812ae041 Enhance configuration checks and persona file creation in doctor and install scripts
- Updated the doctor script to load environment variables from user-specific and project-specific `.env` files, improving configuration management.
- Added checks for the existence of the `SOUL.md` persona file, providing feedback on its status and creating it with a template if missing.
- Enhanced install scripts to create the `SOUL.md` file if it doesn't exist, ensuring users can easily customize the agent's personality.
2026-02-16 02:38:19 -08:00
teknium1 635bec06cb Update tool definitions handling in GatewayRunner
- Modified the retrieval of tool definitions to use the agent result's "tools" key, ensuring accurate logging in the transcript.
- Enhanced the response structure to include tools in the final output, improving the clarity of tool usage in session interactions.
2026-02-16 00:55:18 -08:00
teknium1 0f58dfdea4 Enhance agent response handling and transcript logging
- Refactored the agent response processing to return a comprehensive result dictionary, including final responses and full message history.
- Improved transcript logging to capture the complete conversation, including tool calls and intermediate reasoning, facilitating session resumption and debugging.
- Added handling for fresh sessions to include tool definitions in the transcript for clarity.
- Implemented logic to filter and timestamp new messages, ensuring accurate logging of user and assistant interactions.
2026-02-16 00:53:17 -08:00
teknium1 dd5fe334f3 Refactor configuration handling to improve user experience
- Implemented deep copy of DEFAULT_CONFIG to prevent mutations during config loading.
- Enhanced user config merging process to clarify the deep merge of user values over defaults.
- Added newline handling when appending environment variables to ensure proper formatting.
- Updated the set_config_value function to write only user-specific configurations back to the file, avoiding overwriting default values.
2026-02-16 00:33:45 -08:00
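The deep copy and deep merge mentioned in commit dd5fe334f3 follow a common pattern, sketched here under assumptions: the `DEFAULT_CONFIG` contents are invented for illustration, and `deep_merge` and `load_config` are hypothetical names.

```python
import copy

# Illustrative defaults only; the real DEFAULT_CONFIG has different keys.
DEFAULT_CONFIG = {"terminal": {"backend": "local", "timeout": 60}, "model": "default"}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` merged over `base`, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(user_config: dict) -> dict:
    # deep_merge copies the defaults, so callers cannot mutate DEFAULT_CONFIG.
    return deep_merge(DEFAULT_CONFIG, user_config)
```

Without the deep copy, a caller changing a nested value in the loaded config would also change the shared defaults, which is the mutation the commit says it prevents.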
teknium1 e0c9d495ef Refine configuration migration process to improve user experience
- Updated prompts for the OPENAI_BASE_URL to clarify its use for custom endpoints.
- Enhanced the migration function to skip "advanced" environment variables during interactive configuration, streamlining the setup for standard users.
- Improved messaging for missing optional API keys, ensuring clearer guidance for users during configuration.
2026-02-15 21:53:59 -08:00
teknium1 2f34e6fd30 Update OpenAI configuration prompts for clarity and detail
- Revised descriptions and prompts for the OPENAI_BASE_URL and OPENAI_API_KEY environment variables to enhance user understanding.
- Added a URL reference for the OPENAI_API_KEY to guide users in obtaining their API key.
- Specified the use of the API key for voice transcription and custom endpoints, improving the overall configuration documentation.
2026-02-15 21:48:07 -08:00
teknium1 69aa35a51c Add messaging platform enhancements: STT, stickers, Discord UX, Slack, pairing, hooks
Major feature additions inspired by OpenClaw/ClawdBot integration analysis:

Voice Message Transcription (STT):
- Auto-transcribe voice/audio messages via OpenAI Whisper API
- Download voice to ~/.hermes/audio_cache/ on Telegram/Discord/WhatsApp
- Inject transcript as text so all models can understand voice input
- Configurable model (whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe)

Telegram Sticker Understanding:
- Describe static stickers via vision tool with JSON-backed cache
- Cache keyed by file_unique_id avoids redundant API calls
- Animated/video stickers get emoji-based fallback description

Discord Rich UX:
- Native slash commands (/ask, /reset, /status, /stop) via app_commands
- Button-based exec approvals (Allow Once / Always Allow / Deny)
- ExecApprovalView with user authorization and timeout handling

Slack Integration:
- Full SlackAdapter using slack-bolt with Socket Mode
- DMs, channel messages (mention-gated), /hermes slash command
- File attachment handling with bot-token-authenticated downloads

DM Pairing System:
- Code-based user authorization as alternative to static allowlists
- 8-char codes from unambiguous alphabet, 1-hour expiry
- Rate limiting, lockout after failed attempts, chmod 0600 on data
- CLI: hermes pairing list/approve/revoke/clear-pending

Event Hook System:
- File-based hook discovery from ~/.hermes/hooks/
- HOOK.yaml + handler.py per hook, sync/async handler support
- Events: gateway:startup, session:start/reset, agent:start/step/end
- Wildcard matching (command:* catches all command events)

Cross-Channel Messaging:
- send_message agent tool for delivering to any connected platform
- Enables cron job delivery and cross-platform notifications

Human-Like Response Pacing:
- Configurable delays between message chunks (off/natural/custom)
- HERMES_HUMAN_DELAY_MODE env var with min/max ms settings

Warm Injection Message Style:
- Retrofitted image vision messages with friendly kawaii-consistent tone
- All new injection messages (STT, stickers, errors) use warm style

Also: updated config migration to prompt for optional keys interactively,
bumped config version, updated README, AGENTS.md, .env.example,
cli-config.yaml.example, install scripts, pyproject.toml, and toolsets.
2026-02-15 21:38:59 -08:00
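The DM pairing codes in commit 69aa35a51c (8 characters, unambiguous alphabet, 1-hour expiry) could be generated along these lines. The exact alphabet is an assumption (here it omits 0, O, 1, and I), and `new_pairing_code` and `is_valid` are hypothetical names; rate limiting and lockout are left out.

```python
import secrets
import time
from typing import Optional

# Assumed alphabet: uppercase letters and digits without 0/O and 1/I.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
CODE_LENGTH = 8
EXPIRY_SECONDS = 3600  # 1-hour expiry, per the commit message


def new_pairing_code(now: Optional[float] = None) -> dict:
    """Create a code entry with its expiry timestamp."""
    now = time.time() if now is None else now
    code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
    return {"code": code, "expires_at": now + EXPIRY_SECONDS}


def is_valid(entry: dict, submitted: str, now: Optional[float] = None) -> bool:
    """Check expiry first, then compare the code in constant time."""
    now = time.time() if now is None else now
    if now >= entry["expires_at"]:
        return False
    return secrets.compare_digest(entry["code"], submitted.strip().upper())
```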
teknium1 5404a8fcd8 Enhance image handling and analysis capabilities across platforms
- Updated the vision tool to accept both HTTP/HTTPS URLs and local file paths for image analysis.
- Implemented caching of user-uploaded images in local directories to ensure reliable access for the vision tool, addressing issues with ephemeral URLs.
- Enhanced platform adapters (Discord, Telegram, WhatsApp) to download and cache images, allowing for immediate analysis and enriched message context.
- Added a new method to auto-analyze images attached by users, enriching the conversation with detailed descriptions.
- Improved documentation for image handling processes and updated related functions for clarity and efficiency.
2026-02-15 16:10:50 -08:00
teknium1 eb49936a60 Update documentation and installation scripts for TTS audio formats
- Clarified the requirements for Telegram voice bubbles, specifying the need for ffmpeg when using Edge TTS.
- Enhanced README and messaging documentation to detail audio delivery formats across platforms.
- Improved installation script messages to inform users about the necessity of ffmpeg for proper audio playback on Telegram.
2026-02-14 16:16:54 -08:00
teknium1 ff9ea6c4b1 Enhance TTS tool to support platform-specific audio formats
- Added detection of the platform from the environment variable to determine the appropriate audio output format.
- Implemented logic to output Opus (.ogg) files for Telegram when using compatible TTS providers, while defaulting to MP3 for others.
2026-02-14 16:13:26 -08:00
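The format selection in commit ff9ea6c4b1 (Opus .ogg for Telegram with compatible providers, MP3 otherwise) reduces to a small decision function. In this sketch the environment variable name `HERMES_PLATFORM` and the set of Opus-capable providers are assumptions made for illustration; the commit message names neither.

```python
import os
from typing import Optional

# Assumption: which providers can emit Opus directly is not stated in the commit.
OPUS_CAPABLE_PROVIDERS = {"openai", "elevenlabs"}


def pick_audio_extension(provider: str, platform: Optional[str] = None) -> str:
    """Hypothetical helper: choose the output file extension for a TTS result."""
    if platform is None:
        # Hypothetical variable name for the platform hint.
        platform = os.environ.get("HERMES_PLATFORM", "")
    if platform.lower() == "telegram" and provider in OPUS_CAPABLE_PROVIDERS:
        return ".ogg"
    return ".mp3"
```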
teknium1 586b0a7047 Add Text-to-Speech (TTS) support with Edge TTS and ElevenLabs integration
- Updated `pyproject.toml` to include Edge TTS and ElevenLabs as dependencies.
- Enhanced documentation to detail voice message capabilities across platforms and TTS provider options.
- Modified the GatewayRunner to handle MEDIA tags from TTS tool responses, ensuring proper delivery of audio messages.
2026-02-14 16:08:14 -08:00
teknium1 84718d183a Add platform-specific formatting hints and identity for AIAgent
- Introduced a default agent identity prompt to ensure consistent behavior across platforms.
- Added platform-specific formatting hints for CLI, WhatsApp, Telegram, and Discord to guide the agent's output style.
- Updated the AIAgent initialization to accept a platform parameter, enhancing adaptability to different interfaces.
2026-02-12 16:11:16 -08:00
teknium1 3099a2f53c Add timestamp to active system prompt in AIAgent
- Appended the current local date and time to the active system prompt to provide context for the model, addressing potential misinterpretations due to training cutoffs.
2026-02-12 15:59:31 -08:00
teknium1 ed010752dd Update .env.example to use new Docker, Singularity, and Modal images for Python 3.11 with Node.js 20 support 2026-02-12 10:07:03 -08:00
teknium1 f5be6177b2 Add Text-to-Speech (TTS) functionality with multiple providers
Add tool previews

Add AGENTS and SOUL.md support

Add Exec Approval
2026-02-12 10:05:08 -08:00
teknium 89c6f24d48 Merge branch 'main' of github.com:nousresearch/hermes-agent 2026-02-12 05:38:15 +00:00
teknium f23856df8e Add kill_modal script to manage Modal applications and better handling of file and terminal tools
- Introduced a new script, `kill_modal.sh`, to facilitate stopping running Modal apps, including the ability to stop all apps or specific swe-rex sandboxes.
- Enhanced user experience with clear usage instructions and feedback during the stopping process.
- Improved error handling to ensure smooth execution even if some apps fail to stop.
2026-02-12 05:37:14 +00:00
teknium 1b7bc299f3 Enhance TerminalBench2 environment with task filtering (for tasks incompatible with Modal) and logging improvements
- Updated task filter descriptions for clarity and added a new skip task feature to exclude incompatible tasks.
- Introduced a set of modal incompatible tasks to prevent execution errors in cloud environments.
- Implemented streaming JSONL logging for task results, preserving data even on interruptions.
- Refactored task evaluation logic to include skipped task reporting and improved error handling.
2026-02-12 05:36:45 +00:00
teknium a291cc99cf More extra kwarg support for provider selection, etc., on OpenRouter in agent RL envs and evals 2026-02-12 05:36:25 +00:00
teknium 389ac5e017 Pass extrabody for the agent loop to ban and allowlist providers on OpenRouter, control thinking, etc. 2026-02-12 05:35:48 +00:00
nightwing fc792a4be9 Update Project_notes.md: grailed-embedding-search status and TODOs (June 2025) 2026-02-11 17:54:47 -07:00
nightwing 07501bef14 Add Project_notes.md — centralized status tracker for all side projects 2026-02-11 17:36:18 -07:00
teknium1 137ce05324 Add image generation tool to toolsets for messaging platforms
- Included "image_generate" in the toolsets for web, vision, and skills categories, expanding functionality for image-related tasks.
- Updated comments for clarity on the new tool's purpose, ensuring users understand its integration within the existing framework.
2026-02-10 21:04:24 -08:00
teknium1 ada0b4f131 Enhance image handling in platform adapters
- Updated the image generation function description to clarify usage with markdown.
- Added `send_image` method to `BasePlatformAdapter` for native image sending across platforms.
- Implemented `send_image` in `DiscordAdapter` and `TelegramAdapter` to handle image attachments directly.
- Introduced `extract_images` method to extract image URLs from markdown and HTML, improving content processing.
- Enhanced message handling to support sending images as attachments while maintaining text content.
2026-02-10 21:02:40 -08:00
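An `extract_images` helper like the one named in commit ada0b4f131 could be built on two regular expressions. This is a simplified sketch, not the adapter's implementation: it only handles http(s) URLs, and it returns Markdown matches before HTML matches rather than in document order.

```python
import re
from typing import List, Tuple

# ![alt](https://...) in Markdown
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^\s)]+)\)")
# <img src="https://..."> in HTML, with single or double quotes
HTML_IMAGE = re.compile(r"<img[^>]*\bsrc=[\"'](https?://[^\"']+)[\"'][^>]*>", re.IGNORECASE)


def extract_images(text: str) -> Tuple[List[str], str]:
    """Return (image URLs, text with the image markup removed)."""
    urls: List[str] = []
    for pattern in (MARKDOWN_IMAGE, HTML_IMAGE):
        urls.extend(pattern.findall(text))
        text = pattern.sub("", text)
    return urls, text.strip()
```

Returning the cleaned text together with the URLs lets an adapter send the images as native attachments while still delivering the remaining message text.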
teknium abe925e212 Update hermes-discord toolset to enable full terminal access with safety checks
- Revised the description to reflect full access capabilities, including terminal usage with a dangerous command approval system.
- Added terminal and file manipulation tools to the toolset, enhancing functionality for users.
- Updated comments for clarity on tool purposes, ensuring better understanding of available features.
2026-02-11 04:44:30 +00:00
teknium1 8fb44608bf Update SKILL.md and related references to implement container binding for labeled shapes and arrows in Excalidraw
- Revised the labeled shape and arrow sections to utilize container binding instead of the deprecated "label" property, ensuring proper text rendering.
- Added warnings about the invalidity of the "label" property and emphasized the use of `boundElements` for text elements.
- Updated examples in dark-mode and general references to reflect the new binding approach, enhancing clarity and usability for users creating diagrams.
2026-02-10 20:05:23 -08:00
teknium1 153cd5bb44 Refactor skills tool integration and enhance system prompt
- Removed the skills_categories tool from the skills toolset, streamlining the skills functionality to focus on skills_list and skill_view.
- Updated the system prompt to dynamically build a compact skills index, allowing the model to quickly reference available skills without additional tool calls.
- Cleaned up related code and documentation to reflect the removal of skills_categories, ensuring clarity and consistency across the codebase.
2026-02-10 19:48:38 -08:00
teknium1 669545f551 Add diagramming skills for Excalidraw
- Introduced a new DESCRIPTION.md file outlining diagram creation skills for visual diagrams and flowcharts using Excalidraw.
- Added SKILL.md for the Excalidraw skill, detailing its functionality, usage, and workflow for creating hand-drawn style diagrams.
- Created references for color palettes, dark mode diagrams, and example diagrams to assist users in utilizing the Excalidraw skill effectively.
- Implemented an upload script for sharing diagrams via Excalidraw.com, ensuring user-friendly access to generated diagrams.
2026-02-10 19:30:46 -08:00
teknium1 cfe2f3fe15 Implement interrupt handling for long-running tool executions in AIAgent
- Added functionality to signal and terminate long-running terminal commands when a new user message is received, allowing for immediate agent response.
- Introduced a global interrupt event in the terminal tool to facilitate early termination of subprocesses.
- Updated the AIAgent class to handle interrupts gracefully, ensuring that remaining tool calls are skipped and appropriate messages are returned to maintain valid message sequences.
2026-02-10 16:34:27 -08:00
teknium1 140d609e0c Refine agent history conversion logic in GatewayRunner
- Enhanced the conversion of message history to agent format by distinguishing between normal and rich agent messages.
- Implemented logic to preserve full message structure for tool-related messages, ensuring valid assistant-to-tool sequences.
- Simplified handling of simple text messages by stripping unnecessary fields while retaining essential role and content information.
2026-02-10 16:16:30 -08:00
teknium a32ad1a656 Fix infinite interrupt loop in gateway by consuming pending messages with .pop() and clearing interrupt events before recursion
- Added logic to clear the adapter's interrupt event to prevent infinite loops during message processing.
- Updated the get_pending_message method to pop messages from the pending queue, ensuring proper message handling.
2026-02-11 00:05:30 +00:00
teknium1 62ba69a29d Fix gateway exit code to enable systemd auto-restart on connection failure
- Updated the start_gateway function to return a boolean indicating success or failure, allowing for better control over exit codes.
- Modified the main function to handle gateway startup failures, ensuring systemd can automatically restart on transient errors.
- Enhanced error handling in the hermes_cli gateway to exit with code 1 if the gateway fails to connect to any platform.
2026-02-10 16:01:00 -08:00
teknium1 9b0f2a16ca Enhance CLI functionality with retry and undo commands
- Added /retry command to resend the last user message, improving user experience by allowing message re-sending without retyping.
- Introduced /undo command to remove the last user/assistant exchange from conversation history, providing better control over conversation flow.
- Updated save_config_value function to respect user and project config precedence, enhancing configuration management.
- Improved prompt handling and visual output for user input, adapting to terminal width for better readability.
2026-02-10 15:59:46 -08:00
teknium 85e629e915 Add cleanup functionality for orphaned sandboxes in TerminalBench2EvalEnv
- Implemented a cleanup process to terminate any remaining sandboxes after evaluation, addressing issues with orphaned thread pool workers.
- Enhanced logging to inform users about the cleanup process, ensuring better resource management and user awareness.
2026-02-10 23:48:49 +00:00
teknium 999a28062d Implement graceful exit cleanup for terminal tool
- Added a new `_atexit_cleanup` function to handle cleanup of active environments and stop the cleanup thread upon program exit.
- Enhanced logging to inform users about the number of remaining sandboxes being shut down during cleanup.
2026-02-10 22:53:44 +00:00
teknium ba3fea24f1 Enhance TerminalBench 2 configuration and evaluation handling
- Added task_timeout parameter to enforce a maximum wall-clock time for each task, automatically scoring as FAIL if exceeded.
- Introduced terminal_timeout and tool_pool_size parameters to improve command execution and concurrency management.
- Updated logging to provide detailed task execution times and timeout handling, enhancing overall monitoring.
- Removed outdated evaluate_config.yaml file to streamline configuration management.
2026-02-10 22:53:24 +00:00
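The `task_timeout` behavior in commit ba3fea24f1 (a wall-clock limit per task, scored as FAIL when exceeded) can be sketched with `asyncio.wait_for`. The function name and the result dictionary shape are hypothetical; the evaluation environment's real return type is not shown in this compare.

```python
import asyncio
from typing import Any, Awaitable, Dict


async def run_with_timeout(task: Awaitable[Any], task_timeout: float) -> Dict[str, Any]:
    """Await a task; treat exceeding the wall-clock limit as a failure."""
    try:
        passed = await asyncio.wait_for(task, timeout=task_timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the task before raising, so it does not keep running.
        return {"passed": False, "reason": "task_timeout"}
    return {"passed": bool(passed), "reason": None}
```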
teknium 6b4a8d0b17 Add terminal configuration options and enhance environment setup
- Introduced terminal_timeout and terminal_lifetime parameters to control command execution and sandbox inactivity.
- Updated environment variable handling to allow configuration overrides for terminal settings.
- Enhanced logging to provide detailed information about terminal settings during initialization.
- Added tool_pool_size parameter to dynamically resize the thread pool for tool execution, improving concurrency management.
2026-02-10 22:51:50 +00:00
teknium 5ec75e38b9 Enhance tool execution and logging in HermesAgentLoop
- Increased thread pool size for tool execution from 8 to 128 to improve concurrency and prevent starvation.
- Added a function to resize the tool executor dynamically based on configuration.
- Enhanced logging to track API call durations and tool execution times, including warnings for slow tools.
- Improved overall performance monitoring by logging detailed information for each turn in the agent loop.
2026-02-10 22:51:18 +00:00
teknium ad042fdd68 Update terminalbench_2 configuration for enhanced performance and evaluation
- Increased max_token_length from 16000 to 32000 to allow for longer inputs.
- Adjusted agent_temperature from 0.6 to 0.8 for more varied responses.
- Extended test_timeout from 180 to 600 seconds to accommodate longer evaluations.
- Updated data directory path for saving evaluations to ensure proper organization.
2026-02-10 19:48:41 +00:00
teknium 35ad3146a8 Add new environments and enhance tool context functionality
- Introduced new environments: Terminal Test Environment and SWE Environment, each with default configurations for testing and software engineering tasks.
- Added TerminalBench 2.0 evaluation environment with comprehensive setup for agentic LLMs, including task execution and verification.
- Enhanced ToolContext with methods for uploading and downloading files, ensuring binary-safe operations.
- Updated documentation across environments to reflect new features and usage instructions.
- Refactored existing environment configurations for consistency and clarity.
2026-02-10 19:39:05 +00:00
teknium e8343f2d87 Refactor Singularity environment for persistent container management
- Updated the _SingularityEnvironment class to utilize a persistent Apptainer instance, allowing state (files, installs, environment changes) to persist across commands.
- Enhanced the initialization process to start a background instance with full isolation and writable filesystem.
- Modified the execute method to connect to the running instance, ensuring commands run within the same container context.
- Implemented cleanup functionality to stop the persistent instance on cleanup or destruction, improving resource management.
- Updated class documentation to reflect new features and usage of the persistent environment.
2026-02-10 06:49:58 +00:00
teknium 1b1307d0d1 Implement Anthropic prompt caching for Claude models via OpenRouter
- Introduced a caching strategy that reduces input token costs by ~75% on multi-turn conversations by caching the conversation prefix.
- Added functions to apply cache control markers to messages, enhancing efficiency in token usage.
- Updated AIAgent to auto-enable prompt caching for Claude models, with configurable cache TTL.
- Enhanced logging to track cache hit statistics when caching is active, improving monitoring of token usage.
2026-02-10 06:49:41 +00:00
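The cache control markers in commit 1b1307d0d1 refer to Anthropic's `cache_control` field on content blocks. The sketch below marks only the last message that has content, which is one simple way to cache the conversation prefix; where the repository actually places its markers, and how it applies the configurable TTL, is not shown here, so treat the placement and the name `apply_cache_marker` as assumptions.

```python
import copy
from typing import Any, Dict, List

Message = Dict[str, Any]


def apply_cache_marker(messages: List[Message]) -> List[Message]:
    """Return a copy with an ephemeral cache marker on the last non-empty message."""
    marked = copy.deepcopy(messages)
    for message in reversed(marked):
        content = message.get("content")
        if isinstance(content, str) and content:
            # Plain strings must become a content-block list to carry the marker.
            message["content"] = [
                {"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
            ]
            break
        if isinstance(content, list) and content:
            # Assumes the blocks are dicts, as in the chat-completions format.
            content[-1]["cache_control"] = {"type": "ephemeral"}
            break
    return marked
```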
teknium 7a11be9f3f Enhance browser tool functionality and cleanup process
- Added checks for local installation of the agent-browser CLI in the `_find_agent_browser` function, improving installation guidance.
- Implemented per-task socket directory management in `_run_browser_command` to prevent concurrency issues.
- Updated `cleanup_browser` to remove per-task socket directories, ensuring proper resource cleanup after task completion.
- Refactored comments for clarity and improved documentation throughout the browser tool code.
2026-02-09 04:36:37 +00:00
69 changed files with 9574 additions and 1190 deletions
+39 -5
@@ -42,13 +42,16 @@ TERMINAL_ENV=local
 # Container images (for singularity/docker/modal backends)
-TERMINAL_DOCKER_IMAGE=python:3.11
-TERMINAL_SINGULARITY_IMAGE=docker://python:3.11
-TERMINAL_MODAL_IMAGE=python:3.11
+TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
+TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
+TERMINAL_MODAL_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
 # Working directory for terminal commands
-# For CLI: "." means current directory (resolved automatically from config.yaml)
-# For containers (docker/singularity/modal): absolute path inside the container
+# For local backend: "." means current directory (resolved automatically)
+# For remote backends (ssh/docker/modal/singularity): use an absolute path
+# INSIDE the target environment, or leave unset for the backend's default
+# (/root for modal, / for docker, ~ for ssh). Do NOT use a host-local path.
 # Usually managed by config.yaml (terminal.cwd) — uncomment to override
 # TERMINAL_CWD=.
@@ -139,6 +142,37 @@ BROWSER_INACTIVITY_TIMEOUT=120
 # Format: logs/session_YYYYMMDD_HHMMSS_UUID.json
 # Contains full conversation history in trajectory format for debugging/replay
+# =============================================================================
+# VOICE TRANSCRIPTION & OPENAI TTS
+# =============================================================================
+# Required for voice message transcription (Whisper) and OpenAI TTS voices.
+# Uses OpenAI's API directly (not via OpenRouter).
+# Named HERMES_OPENAI_API_KEY to avoid interference with OpenRouter.
+# Get at: https://platform.openai.com/api-keys
+HERMES_OPENAI_API_KEY=
+# =============================================================================
+# SLACK INTEGRATION
+# =============================================================================
+# Slack Bot Token - From Slack App settings (OAuth & Permissions)
+# Get at: https://api.slack.com/apps
+# SLACK_BOT_TOKEN=xoxb-...
+# Slack App Token - For Socket Mode (App-Level Tokens in Slack App settings)
+# SLACK_APP_TOKEN=xapp-...
+# Slack allowed users (comma-separated Slack user IDs)
+# SLACK_ALLOWED_USERS=
+# =============================================================================
+# RESPONSE PACING
+# =============================================================================
+# Human-like delays between message chunks on messaging platforms.
+# Makes the bot feel less robotic.
+# HERMES_HUMAN_DELAY_MODE=off # off | natural | custom
+# HERMES_HUMAN_DELAY_MIN_MS=800 # Min delay in ms (custom mode)
+# HERMES_HUMAN_DELAY_MAX_MS=2500 # Max delay in ms (custom mode)
 # =============================================================================
 # LEGACY/OPTIONAL API KEYS
 # =============================================================================
+77 -1
@@ -25,7 +25,14 @@ hermes-agent/
│ ├── uninstall.py # Uninstaller
│ └── cron.py # Cron job management
├── tools/ # Tool implementations
│ ├── process_registry.py # Background process management (spawn, poll, wait, kill)
│ ├── transcription_tools.py # Speech-to-text (Whisper API)
├── gateway/ # Messaging platform adapters
│ ├── pairing.py # DM pairing code system
│ ├── hooks.py # Event hook system
│ ├── sticker_cache.py # Telegram sticker vision cache
│ ├── platforms/
│ │ └── slack.py # Slack adapter (slack-bolt)
├── cron/ # Scheduler implementation
├── skills/ # Knowledge documents
├── cli.py # Interactive CLI (Rich UI)
@@ -39,6 +46,11 @@ hermes-agent/
**User Configuration** (stored in `~/.hermes/`):
- `~/.hermes/config.yaml` - Settings (model, terminal, toolsets, etc.)
- `~/.hermes/.env` - API keys and secrets
- `~/.hermes/pairing/` - DM pairing data
- `~/.hermes/hooks/` - Custom event hooks
- `~/.hermes/image_cache/` - Cached user images
- `~/.hermes/audio_cache/` - Cached user voice messages
- `~/.hermes/sticker_cache.json` - Telegram sticker descriptions
## File Dependency Chain
@@ -179,6 +191,7 @@ The unified `hermes` command provides all functionality:
| `hermes gateway` | Start messaging gateway |
| `hermes cron list` | View scheduled jobs |
| `hermes version` | Show version info |
| `hermes pairing list/approve/revoke` | Manage DM pairing codes |
---
@@ -225,6 +238,33 @@ Users can find their IDs:
- **Telegram**: Message [@userinfobot](https://t.me/userinfobot)
- **Discord**: Enable Developer Mode, right-click name → Copy ID
### DM Pairing System
Instead of static allowlists, users can pair via one-time codes:
1. Unknown user DMs the bot → receives pairing code
2. Owner runs `hermes pairing approve <platform> <code>`
3. User is permanently authorized
Security: 8-character codes, 1-hour expiry, rate limiting (one code request per user every 10 minutes), at most 3 pending codes per platform, lockout after 5 failed attempts, and `chmod 0600` on data files.
Files: `gateway/pairing.py`, `hermes_cli/pairing.py`
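A minimal sketch of how a code with these properties could be generated and expired. It is not taken from `gateway/pairing.py`: `PendingCode`, `new_pairing_code`, the alphabet, and the in-memory record are all hypothetical names chosen for illustration.

```python
import secrets
import string
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants mirroring the limits described above.
CODE_LENGTH = 8
CODE_TTL_SECONDS = 60 * 60  # 1-hour expiry
ALPHABET = string.ascii_uppercase + string.digits  # assumed alphabet


@dataclass
class PendingCode:
    code: str
    user_id: str
    created_at: float

    def expired(self, now: float) -> bool:
        """True once the code is at least CODE_TTL_SECONDS old."""
        return now - self.created_at >= CODE_TTL_SECONDS


def new_pairing_code(user_id: str, now: Optional[float] = None) -> PendingCode:
    """Create a one-time code using the OS CSPRNG via the `secrets` module."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
    created = time.time() if now is None else now
    return PendingCode(code=code, user_id=user_id, created_at=created)
```

Using `secrets` rather than `random` is what gives the codes cryptographic randomness; rate limiting and lockout would sit in front of `new_pairing_code` and are omitted here.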
### Event Hooks
Hooks fire at lifecycle points. Place hook directories in `~/.hermes/hooks/`:
```
~/.hermes/hooks/my-hook/
├── HOOK.yaml # name, description, events list
└── handler.py # async def handle(event_type, context): ...
```
Events: `gateway:startup`, `session:start`, `session:reset`, `agent:start`, `agent:step`, `agent:end`, `command:*`
The `agent:step` event fires each iteration of the tool-calling loop with tool names and results.
Files: `gateway/hooks.py`
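As an illustration, a `handler.py` following the signature above might look like the sketch below. The shape of `context` is an assumption: it is treated as a dict, and the `tool_names` key is hypothetical, so check `gateway/hooks.py` for the real payload. `SEEN_EVENTS` and `fire` are local helpers for trying the handler outside the gateway.

```python
import asyncio

# Hypothetical in-memory log; a real hook might write to a file or send a notification.
SEEN_EVENTS = []


async def handle(event_type, context):
    """Record each lifecycle event, keeping tool names for agent:step events."""
    # Assumption: `context` is a dict and "tool_names" is an illustrative key.
    tools = context.get("tool_names", []) if event_type == "agent:step" else []
    SEEN_EVENTS.append((event_type, list(tools)))


def fire(event_type, context):
    """Local driver for manual testing; the gateway would await handle() itself."""
    asyncio.run(handle(event_type, context))
```

Calling `fire("agent:step", {"tool_names": ["terminal"]})` appends one entry to `SEEN_EVENTS`, which is enough to confirm the handler is wired up before relying on it in the gateway.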
### Tool Progress Notifications
When `HERMES_TOOL_PROGRESS=true`, the bot sends status messages as it works:
@@ -325,7 +365,7 @@ API keys are loaded from `~/.hermes/.env`:
Terminal tool configuration (in `~/.hermes/config.yaml`):
- `terminal.backend` - Backend: local, docker, singularity, modal, or ssh
- `terminal.cwd` - Working directory for CLI ("." = current directory)
- `terminal.cwd` - Working directory ("." = host CWD for local only; for remote backends set an absolute path inside the target, or omit to use the backend's default)
- `terminal.docker_image` - Image for Docker backend
- `terminal.singularity_image` - Image for Singularity backend
- `terminal.modal_image` - Image for Modal backend
@@ -336,6 +376,11 @@ Agent behavior (in `~/.hermes/.env`):
- `MESSAGING_CWD` - Working directory for messaging platforms (default: ~)
- `HERMES_TOOL_PROGRESS` - Enable tool progress messages (`true`/`false`)
- `HERMES_TOOL_PROGRESS_MODE` - Progress mode: `new` (tool changes) or `all`
- `OPENAI_API_KEY` - Voice transcription (Whisper STT)
- `SLACK_BOT_TOKEN` / `SLACK_APP_TOKEN` - Slack integration (Socket Mode)
- `SLACK_ALLOWED_USERS` - Comma-separated Slack user IDs
- `HERMES_HUMAN_DELAY_MODE` - Response pacing: off/natural/custom
- `HERMES_HUMAN_DELAY_MIN_MS` / `HERMES_HUMAN_DELAY_MAX_MS` - Custom delay range
### Dangerous Command Approval
@@ -368,6 +413,37 @@ The terminal tool includes safety checks for potentially destructive commands (e
---
## Background Process Management
The `process` tool works alongside `terminal` for managing long-running background processes:
**Starting a background process:**
```python
terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345, ...}
```
**Managing it with the process tool:**
- `process(action="list")` -- show all running/recent processes
- `process(action="poll", session_id="proc_abc123")` -- check status + new output
- `process(action="log", session_id="proc_abc123")` -- full output with pagination
- `process(action="wait", session_id="proc_abc123", timeout=600)` -- block until done
- `process(action="kill", session_id="proc_abc123")` -- terminate
- `process(action="write", session_id="proc_abc123", data="y")` -- send stdin
- `process(action="submit", session_id="proc_abc123", data="yes")` -- send + Enter
**Key behaviors:**
- Background processes execute through the configured terminal backend (local/Docker/Modal/SSH/Singularity) -- never directly on the host unless `TERMINAL_ENV=local`
- The `wait` action blocks the tool call until the process finishes, times out, or is interrupted by a new user message
- PTY mode (`pty=true` on terminal) enables interactive CLI tools (Codex, Claude Code)
- In RL training, background processes are auto-killed when the episode ends (`tool_context.cleanup()`)
- In the gateway, sessions with active background processes are exempt from idle reset
- The process registry checkpoints to `~/.hermes/processes.json` for crash recovery
Files: `tools/process_registry.py` (registry), `model_tools.py` (tool definition + handler), `tools/terminal_tool.py` (spawn integration)
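The three exit conditions of `wait` (finished, timed out, interrupted) can be sketched with the standard library alone. This is not the code in `tools/process_registry.py`: `wait_for`, its return strings, and the use of a `threading.Event` to stand in for "a new user message arrived" are assumptions made for illustration.

```python
import subprocess
import threading
import time


def wait_for(
    proc: subprocess.Popen,
    timeout: float,
    interrupted: threading.Event,
    poll_interval: float = 0.05,
) -> str:
    """Block until the process exits, the timeout elapses, or an interrupt is flagged."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:  # process has exited
            return "exited"
        if interrupted.is_set():  # hypothetical "new user message" signal
            return "interrupted"
        time.sleep(poll_interval)
    return "timeout"
```

Checking `poll()` before the interrupt flag means a process that has already finished is reported as `"exited"` even if a message arrived at the same moment.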
---
## Adding New Tools
Follow this strict order to maintain consistency:
+132 -4
@@ -37,8 +37,9 @@ All your settings are stored in `~/.hermes/` for easy access:
```
~/.hermes/
├── config.yaml # Settings (model, terminal, compression, etc.)
├── config.yaml # Settings (model, terminal, TTS, compression, etc.)
├── .env # API keys and secrets
├── SOUL.md # Optional: global persona (agent embodies this personality)
├── cron/ # Scheduled jobs
├── sessions/ # Gateway sessions
└── logs/ # Logs
@@ -76,7 +77,11 @@ You need at least one LLM provider:
| Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY` |
| Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
| Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
| Premium TTS voices | [ElevenLabs](https://elevenlabs.io/) | `ELEVENLABS_API_KEY` |
| OpenAI TTS voices | [OpenAI](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
| RL Training | [Tinker](https://tinker-console.thinkingmachines.ai/) + [WandB](https://wandb.ai/) | `TINKER_API_KEY`, `WANDB_API_KEY` |
| Voice transcription | [OpenAI](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` |
| Slack integration | [Slack](https://api.slack.com/apps) | `SLACK_BOT_TOKEN`, `SLACK_APP_TOKEN` |
| Messaging | Telegram, Discord | `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN` |
---
@@ -96,6 +101,7 @@ hermes update # Update to latest version (prompts for new config)
hermes uninstall # Uninstall (can keep configs for later reinstall)
hermes gateway # Start messaging gateway
hermes cron list # View scheduled jobs
hermes pairing list # View/manage DM pairing codes
hermes version # Show version info
```
@@ -128,11 +134,98 @@ hermes --toolsets "web,terminal"
hermes --list-tools
```
**Available toolsets:** `web`, `terminal`, `browser`, `vision`, `creative`, `reasoning`, `skills`, `cronjob`, and more.
**Available toolsets:** `web`, `terminal`, `browser`, `vision`, `creative`, `reasoning`, `skills`, `tts`, `cronjob`, and more.
### 🖥️ Terminal Backend
### 🔊 Text-to-Speech
The terminal tool can execute commands in different environments:
Convert text to speech with three providers:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `OPENAI_API_KEY` |
On Telegram, audio plays as native voice bubbles (the round, inline-playable kind). On Discord and WhatsApp, audio is sent as a file attachment. In CLI mode, it is saved to `~/voice-memos/`.
**Configure in `~/.hermes/config.yaml`:**
```yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
model_id: "eleven_multilingual_v2"
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
```
**Telegram voice bubbles & ffmpeg:**
Telegram voice bubbles require Opus/OGG audio format. OpenAI and ElevenLabs produce Opus natively — no extra dependencies needed. Edge TTS (the default free provider) outputs MP3 and needs **ffmpeg** to convert to Opus:
```bash
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Fedora
sudo dnf install ffmpeg
```
Without ffmpeg, Edge TTS audio is sent as a regular audio file (playable, but shows as a rectangular player instead of a voice bubble). If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
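The fallback described above can be sketched as follows, assuming the conversion is done by shelling out to `ffmpeg` with the `libopus` encoder. The function names `opus_convert_command` and `to_voice_bubble` are hypothetical, and the project's actual conversion flags may differ.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def opus_convert_command(mp3_path: Path, ogg_path: Path) -> List[str]:
    """Build an ffmpeg command that re-encodes MP3 audio as Opus in an OGG container."""
    return ["ffmpeg", "-y", "-i", str(mp3_path), "-c:a", "libopus", str(ogg_path)]


def to_voice_bubble(mp3_path: Path) -> Path:
    """Return the converted .ogg path, or the original MP3 if conversion is not possible."""
    if shutil.which("ffmpeg") is None:
        return mp3_path  # no ffmpeg: send as a regular audio file
    ogg_path = mp3_path.with_suffix(".ogg")
    result = subprocess.run(
        opus_convert_command(mp3_path, ogg_path), capture_output=True
    )
    return ogg_path if result.returncode == 0 else mp3_path
```

Returning the original path on any failure keeps the message deliverable; the only visible difference is the rectangular player instead of a voice bubble.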
### 🎙️ Voice Message Transcription
Voice messages sent on Telegram, Discord, WhatsApp, or Slack are automatically transcribed using OpenAI's Whisper API and injected as text into the conversation. The agent sees the transcript as normal text -- no special handling needed.
| Provider | Model | Quality | Cost |
|----------|-------|---------|------|
| **OpenAI Whisper** | `whisper-1` (default) | Good | Low |
| **OpenAI GPT-4o** | `gpt-4o-mini-transcribe` | Better | Medium |
| **OpenAI GPT-4o** | `gpt-4o-transcribe` | Best | Higher |
Requires `OPENAI_API_KEY` in `~/.hermes/.env`. Configure the model in `~/.hermes/config.yaml`:
```yaml
stt:
enabled: true
model: "whisper-1"
```
### 📄 Context Files (SOUL.md, AGENTS.md, .cursorrules)
Drop these files in your project directory and the agent automatically picks them up:
| File | Purpose |
|------|---------|
| `AGENTS.md` | Project-specific instructions, coding conventions, tool usage guidelines |
| `SOUL.md` | Persona definition -- the agent embodies this personality and tone |
| `.cursorrules` | Cursor IDE rules (also detected) |
| `.cursor/rules/*.mdc` | Cursor rule files (also detected) |
- **AGENTS.md** is hierarchical: if subdirectories also have `AGENTS.md`, all are combined (like Codex/Cline).
- **SOUL.md** checks cwd first, then `~/.hermes/SOUL.md` as a global fallback.
- All context files are capped at 20,000 characters with smart truncation.
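The SOUL.md lookup order and the size cap can be sketched like this. `find_soul_file` and `truncate_context` are hypothetical helpers, and the truncation here is a plain head cut; the "smart truncation" the agent applies is not specified in this document.

```python
from pathlib import Path
from typing import Optional

MAX_CONTEXT_CHARS = 20_000  # cap stated above


def find_soul_file(cwd: Path, home: Path) -> Optional[Path]:
    """Return the project SOUL.md if present, else the global fallback, else None."""
    for candidate in (cwd / "SOUL.md", home / ".hermes" / "SOUL.md"):
        if candidate.is_file():
            return candidate
    return None


def truncate_context(text: str, limit: int = MAX_CONTEXT_CHARS) -> str:
    """Naive stand-in for smart truncation: keep only the first `limit` characters."""
    return text if len(text) <= limit else text[:limit]
```

Passing `home` explicitly (for example `Path.home()`) keeps the lookup easy to test against a temporary directory.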
### 🛡️ Exec Approval (Messaging Platforms)
When the agent tries to run a potentially dangerous command (`rm -rf`, `chmod 777`, etc.) on Telegram/Discord/WhatsApp, instead of blocking it silently, it asks the user for approval:
> ⚠️ This command is potentially dangerous (recursive delete). Reply "yes" to approve.
Reply "yes"/"y" to approve or "no"/"n" to deny. In CLI mode, the existing interactive approval prompt (once/session/always/deny) is preserved.
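A minimal sketch of how such a reply could be interpreted, assuming case-insensitive matching and that anything other than the four listed answers is treated as "not an approval reply". `parse_approval_reply` is a hypothetical name, not the gateway's actual function.

```python
from typing import Optional


def parse_approval_reply(reply: str) -> Optional[bool]:
    """Map a chat reply to True (approve), False (deny), or None (unrelated message)."""
    normalized = reply.strip().lower()
    if normalized in {"yes", "y"}:
        return True
    if normalized in {"no", "n"}:
        return False
    return None
```

Returning `None` for unrelated text lets the caller decide whether to keep the command pending or treat the message as a new request.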
### 🖥️ Terminal & Process Management
The terminal tool can execute commands in different environments, with full background process management via the `process` tool:
**Background processes:** Start with `terminal(command="...", background=true)`, then use `process(action="poll/wait/log/kill/write")` to monitor, wait for completion, read output, terminate, or send input. The `wait` action blocks until the process finishes -- no polling loops needed. PTY mode (`pty=true`) enables interactive CLI tools like Codex and Claude Code.
**Execution environments:**
| Backend | Description | Use Case |
|---------|-------------|----------|
@@ -224,6 +317,40 @@ DISCORD_BOT_TOKEN=MTIz...
DISCORD_ALLOWED_USERS=YOUR_USER_ID
```
#### Slack Setup
1. **Create an app:** Go to [Slack API](https://api.slack.com/apps), create a new app
2. **Enable Socket Mode:** In app settings → Socket Mode → Enable
3. **Get tokens:**
- Bot Token (`xoxb-...`): OAuth & Permissions → Install to Workspace
- App Token (`xapp-...`): Basic Information → App-Level Tokens → Generate
4. **Configure:**
```bash
# Add to ~/.hermes/.env:
SLACK_BOT_TOKEN=xoxb-...
SLACK_APP_TOKEN=xapp-...
SLACK_ALLOWED_USERS=U01234ABCDE # Comma-separated Slack user IDs
```
5. **Start the gateway:** `hermes gateway`
#### DM Pairing (Alternative to Allowlists)
Instead of manually configuring user IDs in allowlists, you can use the pairing system. When an unknown user DMs your bot, they receive a one-time pairing code:
```bash
# The user sees: "Pairing code: XKGH5N7P"
# You approve them with:
hermes pairing approve telegram XKGH5N7P
# Other pairing commands:
hermes pairing list # View pending + approved users
hermes pairing revoke telegram 123456789 # Remove access
```
Pairing codes expire after 1 hour, are rate-limited, and use cryptographic randomness.
#### Security (Important!)
**Without an allowlist, anyone who finds your bot can use it!**
@@ -243,6 +370,7 @@ DISCORD_ALLOWED_USERS=123456789012345678
|---------|-------------|
| `/new` or `/reset` | Start fresh conversation |
| `/status` | Show session info |
| `/hermes` (Discord) | Slash command — ask, reset, status, stop |
#### Working Directory
+29 -555
@@ -1,589 +1,63 @@
# Hermes Agent - Future Improvements
> Ideas for enhancing the agent's capabilities, generated from self-analysis of the codebase.
---
## 1. Subagent Architecture (Context Isolation) 🎯
**Problem:** Long-running tools (terminal commands, browser automation, complex file operations) consume massive context. A single `ls -la` can add hundreds of lines. Browser snapshots, debugging sessions, and iterative terminal work quickly bloat the main conversation, leaving less room for actual reasoning.
**Solution:** The main agent becomes an **orchestrator** that delegates context-heavy tasks to **subagents**.
**Architecture:**
```
┌─────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR (main agent) │
│ - Receives user request │
│ - Plans approach │
│ - Delegates heavy tasks to subagents │
│ - Receives summarized results │
│ - Maintains clean, focused context │
└─────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ TERMINAL AGENT │ │ BROWSER AGENT │ │ CODE AGENT │
│ - terminal tool │ │ - browser tools │ │ - file tools │
│ - file tools │ │ - web_search │ │ - terminal │
│ │ │ - web_extract │ │ │
│ Isolated context│ │ Isolated context│ │ Isolated context│
│ Returns summary │ │ Returns summary │ │ Returns summary │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
**How it works:**
1. User asks: "Set up a new Python project with FastAPI and tests"
2. Orchestrator plans: "I need to create files, install deps, write code"
3. Orchestrator calls: `terminal_task(goal="Create venv, install fastapi pytest", context="New project in ~/myapp")`
4. **Subagent spawns** with fresh context, only terminal/file tools
5. Subagent iterates (may take 10+ tool calls, lots of output)
6. Subagent completes → returns summary: "Created venv, installed fastapi==0.109.0, pytest==8.0.0"
7. Orchestrator receives **only the summary**, context stays clean
8. Orchestrator continues with next subtask
**Key tools to implement:**
- [ ] `terminal_task(goal, context, cwd?)` - Delegate terminal/shell work
- [ ] `browser_task(goal, context, start_url?)` - Delegate web research/automation
- [ ] `code_task(goal, context, files?)` - Delegate code writing/modification
- [ ] Generic `delegate_task(goal, context, toolsets=[])` - Flexible delegation
**Implementation details:**
- [ ] Subagent uses same `run_agent.py` but with:
- Fresh/empty conversation history
- Limited toolset (only what's needed)
- Smaller max_iterations (focused task)
- Task-specific system prompt
- [ ] Subagent returns structured result:
```python
{
"success": True,
"summary": "Installed 3 packages, created 2 files",
"details": "Optional longer explanation if needed",
"artifacts": ["~/myapp/requirements.txt", "~/myapp/main.py"], # Files created
"errors": [] # Any issues encountered
}
```
- [ ] Orchestrator sees only the summary in its context
- [ ] Full subagent transcript saved separately for debugging
**Benefits:**
- 🧹 **Clean context** - Orchestrator stays focused, doesn't drown in tool output
- 📊 **Better token efficiency** - 50 terminal outputs → 1 summary paragraph
- 🎯 **Focused subagents** - Each agent has just the tools it needs
- 🔄 **Parallel potential** - Independent subtasks could run concurrently
- 🐛 **Easier debugging** - Each subtask has its own isolated transcript
**When to use subagents vs direct tools:**
- **Subagent**: Multi-step tasks, iteration likely, lots of output expected
- **Direct**: Quick one-off commands, simple file reads, user needs to see output
**Files to modify:** `run_agent.py` (add orchestration mode), new `tools/delegate_tools.py`, new `subagent_runner.py`
---
The main agent becomes an orchestrator that delegates context-heavy tasks to subagents with isolated context. Each subagent returns a summary, keeping the orchestrator's context clean. `delegate_task(goal, context, toolsets=[])` with fresh conversation, limited toolset, task-specific system prompt.
## 2. Planning & Task Management 📋
**Problem:** Agent handles tasks reactively without explicit planning. Complex multi-step tasks lack structure, progress tracking, and the ability to decompose work into manageable chunks.
**Ideas:**
- [ ] **Task decomposition tool** - Break complex requests into subtasks:
```
User: "Set up a new Python project with FastAPI, tests, and Docker"
Agent creates plan:
├── 1. Create project structure and requirements.txt
├── 2. Implement FastAPI app skeleton
├── 3. Add pytest configuration and initial tests
├── 4. Create Dockerfile and docker-compose.yml
└── 5. Verify everything works together
```
- Each subtask becomes a trackable unit
- Agent can report progress: "Completed 3/5 tasks"
- [ ] **Progress checkpoints** - Periodic self-assessment:
- After N tool calls or time elapsed, pause to evaluate
- "What have I accomplished? What remains? Am I on track?"
- Detect if stuck in loops or making no progress
- Could trigger replanning if approach isn't working
- [ ] **Explicit plan storage** - Persist plan in conversation:
- Store as structured data (not just in context)
- Update status as tasks complete
- User can ask "What's the plan?" or "What's left?"
- Survives context compression (plans are protected)
- [ ] **Failure recovery with replanning** - When things go wrong:
- Record what failed and why
- Revise plan to work around the issue
- "Step 3 failed because X, adjusting approach to Y"
- Prevents repeating failed strategies
**Files to modify:** `run_agent.py` (add planning hooks), new `tools/planning_tool.py`
---
Task decomposition tool, progress checkpoints after N tool calls, persistent plan storage that survives context compression, failure recovery with replanning.
## 3. Dynamic Skills Expansion 📚
**Problem:** Skills system is elegant but static. Skills must be manually created and added.
Skill acquisition from successful tasks, parameterized skill templates, skill chaining with dependency graphs.
**Ideas:**
- [ ] **Skill acquisition from successful tasks** - After completing a complex task:
- "This approach worked well. Save as a skill?"
- Extract: goal, steps taken, tools used, key decisions
- Generate SKILL.md automatically
- Store in user's skills directory
- [ ] **Skill templates** - Common patterns that can be parameterized:
```markdown
# Debug {language} Error
1. Reproduce the error
2. Search for error message: `web_search("{error_message} {language}")`
3. Check common causes: {common_causes}
4. Apply fix and verify
```
- [ ] **Skill chaining** - Combine skills for complex workflows:
- Skills can reference other skills as dependencies
- "To do X, first apply skill Y, then skill Z"
- Directed graph of skill dependencies
## 4. Interactive Clarifying Questions ❓
**Files to modify:** `tools/skills_tool.py`, `skills/` directory structure, new `skill_generator.py`
Multiple-choice prompt tool with rich terminal UI. Up to 4 choices + free-text. CLI-only with graceful fallback for non-interactive modes.
---
## 5. Memory System 🧠
## 4. Interactive Clarifying Questions Tool ❓
Daily memory logs, long-term curated MEMORY.md, vector/semantic search, pre-compaction memory flush, user profile, learning store for error patterns and discovered fixes. *Inspired by ClawdBot's memory system.*
**Problem:** Agent sometimes makes assumptions or guesses when it should ask the user. Currently can only ask via text, which gets lost in long outputs.
## 6. Heartbeat System 💓
**Ideas:**
- [ ] **Multiple-choice prompt tool** - Let agent present structured choices to user:
```
ask_user_choice(
question="Should the language switcher enable only German or all languages?",
choices=[
"Only enable German - works immediately",
"Enable all, mark untranslated - show fallback notice",
"Let me specify something else"
]
)
```
- Renders as interactive terminal UI with arrow key / Tab navigation
- User selects option, result returned to agent
- Up to 4 choices + optional free-text option
- [ ] **Implementation:**
- Use `inquirer` or `questionary` Python library for rich terminal prompts
- Tool returns selected option text (or user's custom input)
- **CLI-only** - only works when running via `cli.py` (not API/programmatic use)
- Graceful fallback: if not in interactive mode, return error asking agent to rephrase as text
- [ ] **Use cases:**
- Clarify ambiguous requirements before starting work
- Confirm destructive operations with clear options
- Let user choose between implementation approaches
- Checkpoint complex multi-step workflows
Periodic agent wake-up that reads HEARTBEAT.md for instructions. Runs inside the main session with full context. Triggers on interval, exec completion, cron events, or manual wake. HEARTBEAT_OK suppression when nothing needs attention. *Inspired by ClawdBot's heartbeat.*
**Files to modify:** New `tools/ask_user_tool.py`, `cli.py` (detect interactive mode), `model_tools.py`
## 7. Local Browser Control via CDP 🌐
---
Support both local Chrome (via CDP, free) and Browserbase (cloud, paid) as browser backends. Local gives persistent login sessions but lacks CAPTCHA solving.
## 5. Collaborative Problem Solving 🤝
## 8. Signal Integration 📡
**Problem:** Interaction is command/response. Complex problems benefit from dialogue.
New platform adapter using signal-cli daemon (JSON-RPC HTTP + SSE). Requires Java runtime and phone number registration.
**Ideas:**
- [ ] **Assumption surfacing** - Make implicit assumptions explicit:
- "I'm assuming you want Python 3.11+. Correct?"
- "This solution assumes you have sudo access..."
- Let user correct before going down wrong path
## 9. Session Transcript Search 🔍
- [ ] **Checkpoint & confirm** - For high-stakes operations:
- "About to delete 47 files. Here's the list - proceed?"
- "This will modify your database. Want a backup first?"
- Configurable threshold for when to ask
`hermes sessions search <query>` CLI command and `session_search` agent tool. Text-based first (ripgrep over JSONL), vector search later.
**Files to modify:** `run_agent.py`, system prompt configuration
## 10. Plugin/Extension System 🔌
---
Python plugin interface with `plugin.yaml` + `handler.py`. Discovery from `~/.hermes/plugins/`. Plugins can register tools, hooks, and CLI commands. *Inspired by ClawdBot's 36-plugin extension system.*
## 6. Project-Local Context 💾
## 11. Native Companion Apps 📱
**Problem:** Valuable context lost between sessions.
macOS (Swift/SwiftUI), iOS, Android apps connecting via WebSocket. Prerequisite: WS API on gateway. MVP: web UI with Flask/FastAPI. *Inspired by ClawdBot's companion apps.*
**Ideas:**
- [ ] **Project awareness** - Remember project-specific context:
- Store `.hermes/context.md` in project directory
- "This is a Django project using PostgreSQL"
- Coding style preferences, deployment setup, etc.
- Load automatically when working in that directory
## 12. Evaluation System 📏
- [ ] **Handoff notes** - Leave notes for future sessions:
- Write to `.hermes/notes.md` in project
- "TODO for next session: finish implementing X"
- "Known issues: Y doesn't work on Windows"
LLM grader mode for batch_runner, action comparison against expected tool calls, string matching baselines.
**Files to modify:** New `project_context.py`, auto-load in `run_agent.py`
## 13. Layered Context Architecture 📊
## 6. Tools & Skills Wishlist 🧰
Structured hierarchy: project context > skills > user profile > learnings > external knowledge > runtime introspection.
*Things that would need new tool implementations (can't do well with current tools):*
## 14. Tools Wishlist 🧰
### High-Impact
- [ ] **Audio/Video Transcription** 🎬 *(See also: Section 16 for detailed spec)*
- Transcribe audio files, podcasts, YouTube videos
- Extract key moments from video
- Voice memo transcription for messaging integrations
- *Provider options: Whisper API, Deepgram, local Whisper*
- [ ] **Diagram Rendering** 📊
- Render Mermaid/PlantUML to actual images
- Can generate the code, but rendering requires external service or tool
- "Show me how these components connect" → actual visual diagram
### Medium-Impact
- [ ] **Canvas / Visual Workspace** 🖼️
- Agent-controlled visual panel for rendering interactive UI
- Inspired by OpenClaw's Canvas feature
- **Capabilities:**
- `present` / `hide` - Show/hide the canvas panel
- `navigate` - Load HTML files or URLs into the canvas
- `eval` - Execute JavaScript in the canvas context
- `snapshot` - Capture the rendered UI as an image
- **Use cases:**
- Display generated HTML/CSS/JS previews
- Show interactive data visualizations (charts, graphs)
- Render diagrams (Mermaid → rendered output)
- Present structured information in rich format
- A2UI-style component system for structured agent UI
- **Implementation options:**
- Electron-based panel for CLI
- WebSocket-connected web app
- VS Code webview extension
- *Would let agent "show" things rather than just describe them*
- [ ] **Document Generation** 📄
- Create styled PDFs, Word docs, presentations
- *Can do basic PDF via terminal tools, but limited*
- [ ] **Diff/Patch Tool** 📝
- Surgical code modifications with preview
- "Change line 45-50 to X" without rewriting whole file
- Show diffs before applying
- *Can use `diff`/`patch` but a native tool would be safer*
### Skills to Create
- [ ] **Domain-specific skill packs:**
- DevOps/Infrastructure (Terraform, K8s, AWS)
- Data Science workflows (EDA, model training)
- Security/pentesting procedures
- [ ] **Framework-specific skills:**
- React/Vue/Angular patterns
- Django/Rails/Express conventions
- Database optimization playbooks
- [ ] **Troubleshooting flowcharts:**
- "Docker container won't start" → decision tree
- "Production is slow" → systematic diagnosis
---
## 7. Messaging Platform Integrations 💬 ✅ COMPLETE
**Problem:** Agent currently only works via `cli.py` which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.
**Architecture:**
- `run_agent.py` already accepts `conversation_history` parameter and returns updated messages ✅
- Need: persistent session storage, platform monitors, session key resolution
**Implementation approach:**
```
┌─────────────────────────────────────────────────────────────┐
│ Platform Monitor (e.g., telegram_monitor.py) │
│ ├─ Long-running daemon connecting to messaging platform │
│ ├─ On message: resolve session key → load history from disk│
│ ├─ Call run_agent.py with loaded history │
│ ├─ Save updated history back to disk (JSONL) │
│ └─ Send response back to platform │
└─────────────────────────────────────────────────────────────┘
```
**Platform support (each user sets up their own credentials):**
- [x] **Telegram** - via `python-telegram-bot`
- Bot token from @BotFather
- Easiest to set up, good for personal use
- [x] **Discord** - via `discord.py`
- Bot token from Discord Developer Portal
- Can work in servers (group sessions) or DMs
- [x] **WhatsApp** - via Node.js bridge (whatsapp-web.js/baileys)
- Requires Node.js bridge setup
- More complex, but reaches most people
**Session management:**
- [x] **Session store** - JSONL persistence per session key
- `~/.hermes/sessions/{session_id}.jsonl`
- Session keys: `agent:main:telegram:dm`, `agent:main:discord:group:123`, etc.
- [x] **Session expiry** - Configurable reset policies
- Daily reset (default 4am) OR idle timeout (default 2 hours)
- Manual reset via `/reset` or `/new` command in chat
- Per-platform and per-type overrides
- [x] **Session continuity** - Conversations persist across messages until reset
**Files created:** `gateway/`, `gateway/platforms/`, `gateway/config.py`, `gateway/session.py`, `gateway/delivery.py`, `gateway/run.py`
**Configuration:**
- Environment variables: `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, etc.
- Config file: `~/.hermes/gateway.json`
- CLI commands: `/platforms` to check status, `--gateway` to start
**Dynamic context injection:**
- Agent knows its source platform and chat
- Agent knows connected platforms and home channels
- Agent can deliver cron outputs to specific platforms
---
## 8. Text-to-Speech (TTS) 🔊
**Problem:** Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).
**Ideas:**
- [ ] **TTS tool** - Generate audio files from text
```python
tts_generate(text="Here's your summary...", voice="nova", output="summary.mp3")
```
- Returns path to generated audio file
- For messaging integrations: can send as voice message
- [ ] **Provider options:**
- Edge TTS (free, good quality, many voices)
- OpenAI TTS (paid, excellent quality)
- ElevenLabs (paid, best quality, voice cloning)
- Local options (Coqui TTS, Bark)
- [ ] **Modes:**
- On-demand: User explicitly asks "read this to me"
- Auto-TTS: Configurable to always generate audio for responses
- Long-text handling: Summarize or chunk very long responses
- [ ] **Integration with messaging:**
- When enabled, can send voice notes instead of/alongside text
- User preference per channel
**Files to create:** `tools/tts_tool.py`, config in `cli-config.yaml`
---
## 13. Speech-to-Text / Audio Transcription 🎤
**Problem:** Users may want to send voice memos instead of typing. Agent is blind to audio content.
**Ideas:**
- [ ] **Voice memo transcription** - For messaging integrations
- User sends voice message → transcribe → process as text
- Seamless: user speaks, agent responds
- [ ] **Audio/video file transcription** - Existing idea, expanded:
- Transcribe local audio files (mp3, wav, m4a)
- Transcribe YouTube videos (download audio → transcribe)
- Extract key moments with timestamps
- [ ] **Provider options:**
- OpenAI Whisper API (good quality, cheap)
- Deepgram (fast, good for real-time)
- Local Whisper (free, runs on GPU)
- Groq Whisper (fast, free tier available)
- [ ] **Tool interface:**
```python
transcribe(source="audio.mp3") # Local file
transcribe(source="https://youtube.com/...") # YouTube
transcribe(source="voice_message", data=bytes) # Voice memo
```
**Files to create:** `tools/transcribe_tool.py`, integrate with messaging monitors
### Plugin/Extension System 🔌
**Concept:** Allow users to add custom tools/skills without modifying core code.
**Why interesting:**
- Community contributions
- Organization-specific tools
- Clean separation of core vs. extensions
**Open questions:**
- Security implications of loading arbitrary code
- Versioning and compatibility
- Discovery and installation UX
---
## Recently Completed ✅
### Dangerous Command Approval System
**Implemented:** Dangerous command detection and approval for terminal tool.
**Features:**
- Pattern-based detection of dangerous commands (rm -rf, DROP TABLE, chmod 777, etc.)
- CLI prompt with options: `[o]nce | [s]ession | [a]lways | [d]eny`
- Session caching (approved patterns don't re-prompt)
- Permanent allowlist in `~/.hermes/config.yaml`
- Force flag for agent to bypass after user confirmation
- Skip check for isolated backends (Docker, Singularity, Modal)
- Helpful sudo failure messages for messaging platforms
**Files:** `tools/terminal_tool.py`, `model_tools.py`, `hermes_cli/config.py`
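A rough sketch of how pattern-based detection with session caching and the isolated-backend skip could fit together. The patterns, function names, and return values here are illustrative assumptions; the actual list and logic in `tools/terminal_tool.py` may differ.

```python
import re

# Illustrative patterns only (assumption): (regex, human-readable reason).
DANGEROUS_PATTERNS = [
    (r"\brm\s+-[a-z]*(?:r[a-z]*f|f[a-z]*r)", "recursive force delete"),
    (r"\bDROP\s+TABLE\b", "SQL table drop"),
    (r"\bchmod\s+(?:-R\s+)?777\b", "world-writable permissions"),
]

# Backends the feature list describes as isolated, so the check is skipped.
ISOLATED_BACKENDS = {"docker", "singularity", "modal"}


def detect_dangerous(command: str):
    """Return the reason a command looks dangerous, or None if it looks safe."""
    for pattern, reason in DANGEROUS_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return reason
    return None


def needs_approval(command: str, session_approved: set, backend: str = "local") -> bool:
    """Decide whether to prompt the user (hypothetical helper)."""
    if backend in ISOLATED_BACKENDS:
        return False
    reason = detect_dangerous(command)
    # Reasons already approved for this session do not re-prompt.
    return reason is not None and reason not in session_approved
```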
---
## 14. Learning Machine / Dynamic Memory System 🧠
*Inspired by [Dash](~/agent-codebases/dash) - a self-learning data agent.*
**Problem:** Agent starts fresh every session. Valuable learnings from debugging, error patterns, successful approaches, and user preferences are lost.
**Dash's Key Insight:** Separate **Knowledge** (static, curated) from **Learnings** (dynamic, discovered):
| System | What It Stores | How It Evolves |
|--------|---------------|----------------|
| **Knowledge** (Skills) | Validated approaches, templates, best practices | Curated by user |
| **Learnings** | Error patterns, gotchas, discovered fixes | Managed automatically |
**Tools to implement:**
- [ ] `save_learning(topic, learning, context?)` - Record a discovered pattern
```python
save_learning(
    topic="python-ssl",
    learning="On Ubuntu 22.04, SSL certificate errors often fixed by: apt install ca-certificates",
    context="Debugging requests SSL failure"
)
```
- [ ] `search_learnings(query)` - Find relevant past learnings
```python
search_learnings("SSL certificate error Python")
# Returns: "On Ubuntu 22.04, SSL certificate errors often fixed by..."
```
**User Profile & Memory:**
- [ ] `user_profile` - Structured facts about user preferences
```yaml
# ~/.hermes/user_profile.yaml
coding_style:
  python_formatter: black
  type_hints: always
  test_framework: pytest
preferences:
  verbosity: detailed
  confirm_destructive: true
environment:
  os: linux
  shell: bash
  default_python: 3.11
```
- [ ] `user_memory` - Unstructured observations the agent learns
```yaml
# ~/.hermes/user_memory.yaml
- "User prefers tabs over spaces despite black's defaults"
- "User's main project is ~/work/myapp - a Django app"
- "User often works late - don't ask about timezone"
```
**When to learn:**
- After fixing an error that took multiple attempts
- When user corrects the agent's approach
- When a workaround is discovered for a tool limitation
- When user expresses a preference
**Storage:** Vector database (ChromaDB) or simple YAML with embedding search.
**Files to create:** `tools/learning_tools.py`, `learning/store.py`, `~/.hermes/learnings/`
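To make the two tool calls concrete, here is a minimal in-memory sketch. It uses plain keyword overlap as a stand-in for the embedding search mentioned above, and the `LearningStore` class is a hypothetical name, so treat it as a placeholder for whatever `learning/store.py` ends up doing.

```python
import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens, used for naive matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Learning:
    topic: str
    learning: str
    context: str = ""


@dataclass
class LearningStore:
    """Hypothetical in-memory store; persistence is left out on purpose."""

    items: list = field(default_factory=list)

    def save_learning(self, topic: str, learning: str, context: str = "") -> None:
        self.items.append(Learning(topic, learning, context))

    def search_learnings(self, query: str, limit: int = 3) -> list:
        query_tokens = _tokens(query)
        scored = []
        for item in self.items:
            haystack = _tokens(f"{item.topic} {item.learning} {item.context}")
            overlap = len(query_tokens & haystack)
            if overlap:
                scored.append((overlap, item.learning))
        # Highest keyword overlap first; ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:limit]]
```

Swapping `_tokens` overlap for vector similarity would leave the `save_learning` / `search_learnings` interface unchanged.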
---
## 15. Layered Context Architecture 📊
*Inspired by Dash's "Six Layers of Context" - grounding responses in multiple sources.*
**Problem:** Context sources are ad-hoc. No clear hierarchy or strategy for what context to include when.
**Proposed Layers for Hermes:**
| Layer | Source | When Loaded | Example |
|-------|--------|-------------|---------|
| 1. **Project Context** | `.hermes/context.md` | Auto on cwd | "This is a FastAPI project using PostgreSQL" |
| 2. **Skills** | `skills/*.md` | On request | "How to set up React project" |
| 3. **User Profile** | `~/.hermes/user_profile.yaml` | Always | "User prefers pytest, uses black" |
| 4. **Learnings** | `~/.hermes/learnings/` | Semantic search | "SSL fix for Ubuntu" |
| 5. **External Knowledge** | Web search, docs | On demand | Current API docs, Stack Overflow |
| 6. **Runtime Introspection** | Tool calls | Real-time | File contents, terminal output |
**Benefits:**
- Clear mental model for what context is available
- Prioritization: local > learned > external
- Debugging: "Why did agent do X?" → check which layers contributed
**Files to modify:** `run_agent.py` (context loading), new `context/layers.py`
---
## 16. Evaluation System with LLM Grading 📏
*Inspired by Dash's evaluation framework.*
**Problem:** `batch_runner.py` runs test cases but lacks quality assessment.
**Dash's Approach:**
- **String matching** (default) - Check if expected strings appear
- **LLM grader** (-g flag) - GPT evaluates response quality
- **Result comparison** (-r flag) - Compare against golden output
**Implementation for Hermes:**
- [ ] **Test case format:**
```python
TestCase(
    name="create_python_project",
    prompt="Create a new Python project with FastAPI and tests",
    expected_strings=["requirements.txt", "main.py", "test_"],  # Basic check
    golden_actions=["write:main.py", "write:requirements.txt", "terminal:pip install"],
    grader_criteria="Should create complete project structure with working code"
)
```
- [ ] **LLM grader mode:**
```python
def grade_response(response: str, criteria: str) -> Grade:
    """Use GPT to evaluate response quality."""
    prompt = f"""
    Evaluate this agent response against the criteria.
    Criteria: {criteria}
    Response: {response}
    Score (1-5) and explain why.
    """
    # Returns: Grade(score=4, explanation="Created all files but tests are minimal")
```
- [ ] **Action comparison mode:**
- Record tool calls made during test
- Compare against expected actions
- "Expected terminal call to pip install, got npm install"
- [ ] **CLI flags:**
```bash
python batch_runner.py eval test_cases.yaml # String matching
python batch_runner.py eval test_cases.yaml -g # + LLM grading
python batch_runner.py eval test_cases.yaml -r # + Result comparison
python batch_runner.py eval test_cases.yaml -v # Verbose (show responses)
```
**Files to modify:** `batch_runner.py`, new `evals/test_cases.py`, new `evals/grader.py`
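The string-matching and action-comparison modes could look roughly like the sketch below. It assumes actions are recorded as `"tool:detail"` strings, as in the `golden_actions` example, and that golden actions must appear in order. Both function names are hypothetical.

```python
def check_expected_strings(response: str, expected_strings: list[str]) -> list[str]:
    """Return the expected strings that do NOT appear in the response."""
    return [s for s in expected_strings if s not in response]


def compare_actions(actual: list[str], golden: list[str]) -> list[str]:
    """Return a message for each golden action with no in-order prefix match."""
    mismatches = []
    position = 0  # Only search after the previously matched action.
    for expected in golden:
        for index in range(position, len(actual)):
            if actual[index].startswith(expected):
                position = index + 1
                break
        else:
            mismatches.append(f"Expected {expected!r}, not found in remaining actions")
    return mismatches
```

An empty list from either function means the check passed, so a runner can report failures by printing the returned items.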
---
*Last updated: $(date +%Y-%m-%d)* 🤖
- Diagram rendering (Mermaid/PlantUML to images)
- Document generation (PDFs, Word, presentations)
- Canvas / visual workspace
- Coding agent skill (Codex, Claude Code orchestration via PTY)
- Domain skill packs (DevOps, data science, security)
@@ -276,6 +276,7 @@ def _process_single_prompt(
max_tokens=config.get("max_tokens"),
reasoning_config=config.get("reasoning_config"),
prefill_messages=config.get("prefill_messages"),
skip_context_files=True, # Don't pollute trajectories with SOUL.md/AGENTS.md
)
# Run the agent with task_id to ensure each task gets its own isolated VM
@@ -27,8 +27,8 @@ model:
# - CLI (`hermes` command): Uses "." (current directory where you run hermes)
# - Messaging (Telegram/Discord): Uses MESSAGING_CWD from .env (default: home)
terminal:
env_type: "local"
cwd: "." # CLI working directory - "." means current directory
backend: "local"
cwd: "." # For local backend: "." = current directory. Ignored for remote backends.
timeout: 180
lifetime_seconds: 300
# sudo_password: "" # Enable sudo commands (pipes via sudo -S) - SECURITY WARNING: plaintext!
@@ -39,8 +39,8 @@ terminal:
# Great for: keeping agent isolated from its own code, using powerful remote hardware
# -----------------------------------------------------------------------------
# terminal:
# env_type: "ssh"
# cwd: "/home/myuser/project"
# backend: "ssh"
# cwd: "/home/myuser/project" # Path on the REMOTE server
# timeout: 180
# lifetime_seconds: 300
# ssh_host: "my-server.example.com"
@@ -54,8 +54,8 @@ terminal:
# Great for: reproducible environments, testing, isolation
# -----------------------------------------------------------------------------
# terminal:
# env_type: "docker"
# cwd: "/workspace"
# backend: "docker"
# cwd: "/workspace" # Path INSIDE the container (default: /)
# timeout: 180
# lifetime_seconds: 300
# docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
@@ -66,8 +66,8 @@ terminal:
# Great for: HPC clusters, shared compute environments
# -----------------------------------------------------------------------------
# terminal:
# env_type: "singularity"
# cwd: "/workspace"
# backend: "singularity"
# cwd: "/workspace" # Path INSIDE the container (default: /root)
# timeout: 180
# lifetime_seconds: 300
# singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"
@@ -78,8 +78,8 @@ terminal:
# Great for: GPU access, scalable compute, serverless execution
# -----------------------------------------------------------------------------
# terminal:
# env_type: "modal"
# cwd: "/workspace"
# backend: "modal"
# cwd: "/workspace" # Path INSIDE the sandbox (default: /root)
# timeout: 180
# lifetime_seconds: 300
# modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"
@@ -244,6 +244,24 @@ toolsets:
# toolsets:
# - safe
# =============================================================================
# Voice Transcription (Speech-to-Text)
# =============================================================================
# Automatically transcribe voice messages on messaging platforms.
# Requires OPENAI_API_KEY in .env (uses OpenAI Whisper API directly).
stt:
enabled: true
model: "whisper-1" # whisper-1 (cheapest) | gpt-4o-mini-transcribe | gpt-4o-transcribe
# =============================================================================
# Response Pacing (Messaging Platforms)
# =============================================================================
# Add human-like delays between message chunks.
# human_delay:
# mode: "off" # "off" | "natural" | "custom"
# min_ms: 800 # Min delay (custom mode only)
# max_ms: 2500 # Max delay (custom mode only)
# =============================================================================
# Session Logging
# =============================================================================
@@ -28,18 +28,13 @@ os.environ["HERMES_QUIET"] = "1" # Our own modules
import yaml
# prompt_toolkit for fixed input area TUI
from prompt_toolkit import PromptSession
from prompt_toolkit.history import FileHistory
from prompt_toolkit.styles import Style as PTStyle
from prompt_toolkit.formatted_text import HTML
from prompt_toolkit.patch_stdout import patch_stdout
from prompt_toolkit.application import Application, get_app
from prompt_toolkit.buffer import Buffer
from prompt_toolkit.application import Application
from prompt_toolkit.layout import Layout, HSplit, Window, FormattedTextControl
from prompt_toolkit.layout.processors import BeforeInput
from prompt_toolkit.widgets import TextArea
from prompt_toolkit.key_binding import KeyBindings
import asyncio
import threading
import queue
@@ -130,12 +125,20 @@ def load_cli_config() -> Dict[str, Any]:
},
}
# Track whether the config file explicitly set terminal config.
# When using defaults (no config file / no terminal section), we should NOT
# overwrite env vars that were already set by .env -- only a user's config
# file should be authoritative.
_file_has_terminal_config = False
# Load from file if exists
if config_path.exists():
try:
with open(config_path, "r") as f:
file_config = yaml.safe_load(f) or {}
_file_has_terminal_config = "terminal" in file_config
# Handle model config - can be string (new format) or dict (old format)
if "model" in file_config:
if isinstance(file_config["model"], str):
@@ -162,13 +165,27 @@ def load_cli_config() -> Dict[str, Any]:
print(f"[Warning] Failed to load cli-config.yaml: {e}")
# Apply terminal config to environment variables (so terminal_tool picks them up)
# Only set if not already set in environment (env vars take precedence)
terminal_config = defaults.get("terminal", {})
# Handle special cwd values: "." or "auto" means use current working directory
# Normalize config key: the new config system (hermes_cli/config.py) and all
# documentation use "backend", the legacy cli-config.yaml uses "env_type".
# Accept both, with "backend" taking precedence (it's the documented key).
if "backend" in terminal_config:
terminal_config["env_type"] = terminal_config["backend"]
# Handle special cwd values: "." or "auto" means use current working directory.
# Only resolve to the host's CWD for the local backend where the host
# filesystem is directly accessible. For ALL remote/container backends
# (ssh, docker, modal, singularity), the host path doesn't exist on the
# target -- remove the key so terminal_tool.py uses its per-backend default.
if terminal_config.get("cwd") in (".", "auto", "cwd"):
terminal_config["cwd"] = os.getcwd()
defaults["terminal"]["cwd"] = terminal_config["cwd"]
effective_backend = terminal_config.get("env_type", "local")
if effective_backend == "local":
terminal_config["cwd"] = os.getcwd()
defaults["terminal"]["cwd"] = terminal_config["cwd"]
else:
# Remove so TERMINAL_CWD stays unset → tool picks backend default
terminal_config.pop("cwd", None)
env_mappings = {
"env_type": "TERMINAL_ENV",
@@ -187,10 +204,15 @@ def load_cli_config() -> Dict[str, Any]:
"sudo_password": "SUDO_PASSWORD",
}
# CLI config overrides .env for terminal settings
# Apply config values to env vars so terminal_tool picks them up.
# If the config file explicitly has a [terminal] section, those values are
# authoritative and override any .env settings. When using defaults only
# (no config file or no terminal section), don't overwrite env vars that
# were already set by .env -- the user's .env is the fallback source.
for config_key, env_var in env_mappings.items():
if config_key in terminal_config:
os.environ[env_var] = str(terminal_config[config_key])
if _file_has_terminal_config or env_var not in os.environ:
os.environ[env_var] = str(terminal_config[config_key])
# Apply browser config to environment variables
browser_config = defaults.get("browser", {})
@@ -242,6 +264,24 @@ from cron import create_job, list_jobs, remove_job, get_job, run_daemon as run_c
from tools.terminal_tool import cleanup_all_environments as _cleanup_all_terminals
from tools.browser_tool import _emergency_cleanup_all_sessions as _cleanup_all_browsers
# Guard to prevent cleanup from running multiple times on exit
_cleanup_done = False
def _run_cleanup():
"""Run resource cleanup exactly once."""
global _cleanup_done
if _cleanup_done:
return
_cleanup_done = True
try:
_cleanup_all_terminals()
except Exception:
pass
try:
_cleanup_all_browsers()
except Exception:
pass
# ============================================================================
# ASCII Art & Branding
# ============================================================================
@@ -498,6 +538,8 @@ COMMANDS = {
"/clear": "Clear screen and reset conversation (fresh start)",
"/history": "Show conversation history",
"/reset": "Reset conversation only (keep screen)",
"/retry": "Retry the last message (resend to agent)",
"/undo": "Remove the last user/assistant exchange",
"/save": "Save the current conversation",
"/config": "Show current configuration",
"/cron": "Manage scheduled tasks (list, add, remove)",
@@ -508,7 +550,11 @@ COMMANDS = {
def save_config_value(key_path: str, value: any) -> bool:
"""
Save a value to cli-config.yaml at the specified key path.
Save a value to the active config file at the specified key path.
Respects the same lookup order as load_cli_config():
1. ~/.hermes/config.yaml (user config - preferred, used if it exists)
2. ./cli-config.yaml (project config - fallback)
Args:
key_path: Dot-separated path like "agent.system_prompt"
@@ -517,9 +563,15 @@ def save_config_value(key_path: str, value: any) -> bool:
Returns:
True if successful, False otherwise
"""
config_path = Path(__file__).parent / 'cli-config.yaml'
# Use the same precedence as load_cli_config: user config first, then project config
user_config_path = Path.home() / '.hermes' / 'config.yaml'
project_config_path = Path(__file__).parent / 'cli-config.yaml'
config_path = user_config_path if user_config_path.exists() else project_config_path
try:
# Ensure parent directory exists (for ~/.hermes/config.yaml on first use)
config_path.parent.mkdir(parents=True, exist_ok=True)
# Load existing config
if config_path.exists():
with open(config_path, 'r') as f:
@@ -631,26 +683,8 @@ class HermesCLI:
short_uuid = uuid.uuid4().hex[:6]
self.session_id = f"{timestamp_str}_{short_uuid}"
# Setup prompt_toolkit session with history
self._setup_prompt_session()
def _setup_prompt_session(self):
"""Setup prompt_toolkit session with history and styling."""
history_file = Path.home() / ".hermes_history"
# Custom style for the prompt
self.prompt_style = PTStyle.from_dict({
'prompt': '#FFD700 bold',
'input': '#FFF8DC',
})
# Create prompt session with file history
# Note: multiline disabled - Enter submits, use \ at end of line for continuation
self.prompt_session = PromptSession(
history=FileHistory(str(history_file)),
style=self.prompt_style,
enable_history_search=True,
)
# History file for persistent input recall across sessions
self._history_file = Path.home() / ".hermes_history"
def _init_agent(self) -> bool:
"""
@@ -673,6 +707,7 @@ class HermesCLI:
quiet_mode=True, # Suppress verbose output for clean CLI
ephemeral_system_prompt=self.system_prompt if self.system_prompt else None,
session_id=self.session_id, # Pass CLI's session ID to agent
platform="cli", # CLI interface — agent uses terminal-friendly formatting
)
return True
except Exception as e:
@@ -931,6 +966,67 @@ class HermesCLI:
except Exception as e:
print(f"(x_x) Failed to save: {e}")
def retry_last(self):
"""Retry the last user message by removing the last exchange and re-sending.
Removes the last assistant response (and any tool-call messages) and
the last user message, then re-sends that user message to the agent.
Returns the message to re-send, or None if there's nothing to retry.
"""
if not self.conversation_history:
print("(._.) No messages to retry.")
return None
# Walk backwards to find the last user message
last_user_idx = None
for i in range(len(self.conversation_history) - 1, -1, -1):
if self.conversation_history[i].get("role") == "user":
last_user_idx = i
break
if last_user_idx is None:
print("(._.) No user message found to retry.")
return None
# Extract the message text and remove everything from that point forward
last_message = self.conversation_history[last_user_idx].get("content", "")
self.conversation_history = self.conversation_history[:last_user_idx]
print(f"(^_^)b Retrying: \"{last_message[:60]}{'...' if len(last_message) > 60 else ''}\"")
return last_message
def undo_last(self):
"""Remove the last user/assistant exchange from conversation history.
Walks backwards and removes all messages from the last user message
onward (including assistant responses, tool calls, etc.).
"""
if not self.conversation_history:
print("(._.) No messages to undo.")
return
# Walk backwards to find the last user message
last_user_idx = None
for i in range(len(self.conversation_history) - 1, -1, -1):
if self.conversation_history[i].get("role") == "user":
last_user_idx = i
break
if last_user_idx is None:
print("(._.) No user message found to undo.")
return
# Count how many messages we're removing
removed_count = len(self.conversation_history) - last_user_idx
removed_msg = self.conversation_history[last_user_idx].get("content", "")
# Truncate history to before the last user message
self.conversation_history = self.conversation_history[:last_user_idx]
print(f"(^_^)b Undid {removed_count} message(s). Removed: \"{removed_msg[:60]}{'...' if len(removed_msg) > 60 else ''}\"")
remaining = len(self.conversation_history)
print(f" {remaining} message(s) remaining in history.")
def _handle_prompt_command(self, cmd: str):
"""Handle the /prompt command to view or set system prompt."""
parts = cmd.split(maxsplit=1)
@@ -1268,6 +1364,13 @@ class HermesCLI:
elif cmd_lower.startswith("/personality"):
# Use original case (handler lowercases the personality name itself)
self._handle_personality_command(cmd_original)
elif cmd_lower == "/retry":
retry_msg = self.retry_last()
if retry_msg and hasattr(self, '_pending_input'):
# Re-queue the message so process_loop sends it to the agent
self._pending_input.put(retry_msg)
elif cmd_lower == "/undo":
self.undo_last()
elif cmd_lower == "/save":
self.save_conversation()
elif cmd_lower.startswith("/cron"):
@@ -1302,8 +1405,9 @@ class HermesCLI:
# Add user message to history
self.conversation_history.append({"role": "user", "content": message})
# Visual separator after user input
print("─" * 60, flush=True)
# Visual separator after user input (adapt to terminal width, capped for readability)
term_width = min(self.console.width, 120)
print("─" * term_width, flush=True)
try:
# Run the conversation with interrupt monitoring
@@ -1361,14 +1465,20 @@ class HermesCLI:
if response:
# Use simple print for compatibility with prompt_toolkit's patch_stdout
# Adapt box width to terminal (cap at 120 for readability)
box_width = min(self.console.width, 120)
inner = box_width - 2 # account for border chars ╭/╰ and ╮/╯
label = "⚕ Hermes"
padding = inner - len(label) - 1 # -1 for the leading space
print()
print("╭" + "─" * 58 + "╮")
print("│ ⚕ Hermes" + " " * 49 + "│")
print("╰" + "─" * 58 + "╯")
print("╭" + "─" * inner + "╮")
print("│ " + label + " " * max(padding, 0) + "│")
print("╰" + "─" * inner + "╯")
print()
print(response)
print()
print("─" * 60)
print("─" * box_width)
# If we have a pending message from interrupt, re-queue it for process_loop
# instead of recursing (avoids unbounded recursion from rapid interrupts)
@@ -1382,37 +1492,6 @@ class HermesCLI:
print(f"Error: {e}")
return None
def get_input(self) -> Optional[str]:
"""
Get user input using prompt_toolkit.
Enter submits. For multiline, end line with \\ to continue.
Returns:
The user's input, or None if EOF/interrupt
"""
try:
# Get first line
line = self.prompt_session.prompt(
HTML('<prompt> </prompt>'),
style=self.prompt_style,
)
# Handle multi-line input (lines ending with \)
lines = [line]
while line.endswith("\\"):
lines[-1] = line[:-1] # Remove trailing backslash
line = self.prompt_session.prompt(
HTML('<prompt> </prompt>'), # Continuation prompt
style=self.prompt_style,
)
lines.append(line)
return "\n".join(lines).strip()
except (EOFError, KeyboardInterrupt):
return None
def run(self):
"""Run the interactive CLI loop with persistent input at bottom."""
self.show_banner()
@@ -1426,9 +1505,6 @@ class HermesCLI:
self._should_exit = False
self._last_ctrl_c_time = 0 # Track double Ctrl+C for force exit
# Create a persistent input area using prompt_toolkit Application
input_buffer = Buffer()
# Key bindings for the input area
kb = KeyBindings()
@@ -1486,13 +1562,14 @@ class HermesCLI:
self._should_exit = True
event.app.exit()
# Create the input area widget
# Create the input area widget with persistent history across sessions
input_area = TextArea(
height=1,
prompt=' ',
style='class:input-area',
multiline=False,
wrap_lines=False,
history=FileHistory(str(self._history_file)),
)
# Create a status line that shows when agent is working
@@ -1545,6 +1622,7 @@ class HermesCLI:
# Check for commands
if user_input.startswith("/"):
print(f"\n⚙️ {user_input}")
if not self.process_command(user_input):
self._should_exit = True
# Schedule app exit
@@ -1556,6 +1634,9 @@ class HermesCLI:
self._agent_running = True
app.invalidate() # Refresh status line
# Echo the user's input so it stays visible in scrollback
print(f"\n💬 You: {user_input}")
try:
self.chat(user_input)
finally:
@@ -1570,9 +1651,7 @@ class HermesCLI:
process_thread.start()
# Register atexit cleanup so resources are freed even on unexpected exit
# (terminal VMs, browser sessions, etc.)
atexit.register(_cleanup_all_browsers)
atexit.register(_cleanup_all_terminals)
atexit.register(_run_cleanup)
# Run the application with patch_stdout for proper output handling
try:
@@ -1582,15 +1661,7 @@ class HermesCLI:
pass
finally:
self._should_exit = True
# Explicitly clean up resources before exit
try:
_cleanup_all_terminals()
except Exception:
pass
try:
_cleanup_all_browsers()
except Exception:
pass
_run_cleanup()
print("\nGoodbye! ⚕")
@@ -1711,8 +1782,7 @@ def main(
sys.exit(0)
# Register cleanup for single-query mode (interactive mode registers in run())
atexit.register(_cleanup_all_browsers)
atexit.register(_cleanup_all_terminals)
atexit.register(_run_cleanup)
# Handle single query mode
if query:
@@ -307,6 +307,38 @@ This is intentional: CLI users are in a terminal and expect the agent to work in
If the agent hits the max iteration limit while working, instead of a generic error, it asks the model to summarize what it found so far. This gives you a useful response even when the task couldn't be fully completed.
## Voice Messages (TTS)
The `text_to_speech` tool generates audio that the gateway delivers as native voice messages on each platform:
| Platform | Delivery | Format |
|----------|----------|--------|
| Telegram | Voice bubble (plays inline) | Opus `.ogg` — native from OpenAI/ElevenLabs, converted via ffmpeg for Edge TTS |
| Discord | Audio file attachment | MP3 |
| WhatsApp | Audio file attachment | MP3 |
| CLI | Saved to `~/voice-memos/` | MP3 |
**Providers:**
- **Edge TTS** (default) — Free, no API key, 322 voices in 74 languages
- **ElevenLabs** — Premium quality, requires `ELEVENLABS_API_KEY`
- **OpenAI TTS** — Good quality, requires `OPENAI_API_KEY`
Voice and provider are configured by the user in `~/.hermes/config.yaml` under the `tts:` key. The model only sends text; it does not choose the voice.
The tool returns a `MEDIA:<path>` tag that the gateway send pipeline intercepts and delivers as a native audio message. If `[[audio_as_voice]]` is present (Opus format available), Telegram sends it as a voice bubble instead of an audio file.
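As a rough illustration of the interception step, the sketch below pulls `MEDIA:<path>` tags and the `[[audio_as_voice]]` marker out of a tool result. The exact tag grammar is an assumption here (a path is taken to run until the next whitespace), and `extract_media` is a hypothetical name rather than the gateway's real function.

```python
import re

MEDIA_TAG = re.compile(r"MEDIA:(\S+)")  # Assumed: path ends at whitespace.
VOICE_MARKER = "[[audio_as_voice]]"


def extract_media(text: str):
    """Split a tool result into (clean_text, media_paths, as_voice)."""
    as_voice = VOICE_MARKER in text
    paths = MEDIA_TAG.findall(text)
    # Strip the marker and tags so only human-readable text remains.
    clean = MEDIA_TAG.sub("", text.replace(VOICE_MARKER, "")).strip()
    return clean, paths, as_voice
```

A platform adapter could then send `clean` as text and each path as a voice bubble or file attachment depending on `as_voice`.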
**Telegram voice bubbles & ffmpeg:**
Telegram requires Opus/OGG format for native voice bubbles (the round, inline-playable kind). **OpenAI and ElevenLabs** produce Opus natively when on Telegram — no extra setup needed. **Edge TTS** (the default free provider) outputs MP3 and needs `ffmpeg` to convert:
```bash
sudo apt install ffmpeg # Ubuntu/Debian
brew install ffmpeg # macOS
sudo dnf install ffmpeg # Fedora
```
Without ffmpeg, Edge TTS audio is sent as a regular audio file (still playable, but shows as a rectangular music player instead of a voice bubble).
## Cron Job Delivery
When scheduling cron jobs, you can specify where the output should be delivered:
@@ -40,11 +40,15 @@ async def web_search(query: str) -> dict:
|----------|--------|-------|
| **Web** | `web_tools.py` | `web_search`, `web_extract`, `web_crawl` |
| **Terminal** | `terminal_tool.py` | `terminal` (local/docker/singularity/modal/ssh backends) |
| **File** | `file_tools.py` | `read_file`, `write_file`, `patch`, `search` |
| **Browser** | `browser_tool.py` | `browser_navigate`, `browser_click`, `browser_type`, etc. |
| **Vision** | `vision_tools.py` | `vision_analyze` |
| **Image Gen** | `image_generation_tool.py` | `image_generate` |
| **TTS** | `tts_tool.py` | `text_to_speech` (Edge TTS free / ElevenLabs / OpenAI) |
| **Reasoning** | `mixture_of_agents_tool.py` | `mixture_of_agents` |
| **Skills** | `skills_tool.py` | `skills_categories`, `skills_list`, `skill_view` |
| **Skills** | `skills_tool.py` | `skills_list`, `skill_view` |
| **Cronjob** | `cronjob_tools.py` | `schedule_cronjob`, `list_cronjobs`, `remove_cronjob` |
| **RL Training** | `rl_training_tool.py` | `rl_list_environments`, `rl_start_training`, `rl_check_status`, etc. |
## Tool Registration
@@ -0,0 +1,330 @@
# Hermes-Agent Atropos Environments
This directory contains the integration layer between **hermes-agent's** tool-calling capabilities and the **Atropos** RL training framework. It provides everything needed to run agentic LLMs through multi-turn tool-calling loops, score their output with arbitrary reward functions, and feed results into Atropos for training or evaluation.
## Architecture Overview
```
                  Atropos Framework
              ┌───────────────────────┐
              │ BaseEnv               │  (atroposlib)
              │ - Server management   │
              │ - Worker scheduling   │
              │ - Wandb logging       │
              │ - CLI (serve/process/ │
              │   evaluate)           │
              └───────────┬───────────┘
                          │ inherits
              ┌───────────┴───────────┐
              │ HermesAgentBaseEnv    │  hermes_base_env.py
              │ - Terminal backend    │
              │ - Tool resolution     │
              │ - Agent loop          │
              │ - ToolContext         │
              │ - Async patches       │
              └───────────┬───────────┘
                          │ inherits
        ┌─────────────────┼─────────────────┐
        │                 │                 │
 TerminalTestEnv    HermesSweEnv  TerminalBench2EvalEnv
 (stack testing)   (SWE training) (TB2 benchmark eval)
```
### Inheritance Chain
**BaseEnv** (from `atroposlib`) is the Atropos base class. It provides:
- Server management (OpenAI-compatible API servers, VLLM, SGLang)
- Worker scheduling for parallel rollouts
- Wandb integration for metrics and rollout logging
- CLI interface with three subcommands: `serve`, `process`, `evaluate`
- `evaluate_log()` for saving eval results to JSON + samples.jsonl
**HermesAgentBaseEnv** (`hermes_base_env.py`) extends BaseEnv with hermes-agent specifics:
- Sets `os.environ["TERMINAL_ENV"]` to configure the terminal backend (local, docker, modal, ssh, singularity)
- Resolves hermes-agent toolsets via `_resolve_tools_for_group()` (calls `get_tool_definitions()` from `model_tools.py`)
- Implements `collect_trajectory()` which runs the full agent loop and computes rewards
- Supports two-phase operation (Phase 1: OpenAI server, Phase 2: VLLM ManagedServer)
- Applies monkey patches for async-safe tool operation at import time
Concrete environments inherit from `HermesAgentBaseEnv` and implement:
- `setup()` -- Load dataset, initialize state
- `get_next_item()` -- Return the next item for rollout
- `format_prompt()` -- Convert a dataset item into the user message
- `compute_reward()` -- Score the rollout using ToolContext
- `evaluate()` -- Periodic evaluation logic
## Core Components
### Agent Loop (`agent_loop.py`)
`HermesAgentLoop` is the reusable multi-turn agent engine. It runs the same pattern as hermes-agent's `run_agent.py`:
1. Send messages + tools to the API via `server.chat_completion()`
2. If the response contains `tool_calls`, execute each one via `handle_function_call()` from `model_tools.py`
3. Append tool results to the conversation and go back to step 1
4. If the response has no tool_calls, the agent is done
Tool calls are executed in a thread pool (`run_in_executor`) so that backends which call `asyncio.run()` internally (Modal, Docker) do not collide with Atropos's already-running event loop.
Returns an `AgentResult` containing the full conversation history, turn count, reasoning content per turn, tool errors, and optional ManagedServer state (for Phase 2).
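The four steps above can be sketched as follows. This is a simplified stand-in, not the real `HermesAgentLoop`: `run_agent_loop`, `chat_completion`, and the scripted demo are hypothetical, and messages are plain dicts in the OpenAI chat format.

```python
import asyncio
import json


async def run_agent_loop(chat_completion, handle_function_call, messages, max_turns=10):
    """Minimal multi-turn loop (sketch).

    chat_completion: async callable(messages) -> assistant message dict.
    handle_function_call: sync callable(name, args) -> str.
    Returns (messages, turns_used).
    """
    loop = asyncio.get_running_loop()
    for turn in range(1, max_turns + 1):
        reply = await chat_completion(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return messages, turn  # No tool calls: the agent is done.
        for call in tool_calls:
            args = json.loads(call["function"]["arguments"])
            # Run the sync tool in the default thread pool so a tool that
            # calls asyncio.run() internally does not hit this running loop.
            result = await loop.run_in_executor(
                None, handle_function_call, call["function"]["name"], args
            )
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    return messages, max_turns


async def demo():
    """Drive the loop with a scripted fake server and a fake tool."""
    scripted = iter([
        {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_1", "function": {"name": "echo", "arguments": '{"text": "hi"}'}}
        ]},
        {"role": "assistant", "content": "done"},
    ])

    async def fake_chat(messages):
        return next(scripted)

    def fake_tool(name, args):
        # Uses asyncio.run() on purpose, mimicking the backends described above.
        return asyncio.run(asyncio.sleep(0, result=f"{name}:{args['text']}"))

    return await run_agent_loop(fake_chat, fake_tool, [{"role": "user", "content": "say hi"}])
```

Calling `asyncio.run(demo())` runs one tool turn followed by a final answer.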
### Tool Context (`tool_context.py`)
`ToolContext` is a per-rollout handle that gives reward/verification functions direct access to **all** hermes-agent tools, scoped to the rollout's `task_id`. The same `task_id` means the terminal/browser session is the SAME one the model used during its rollout -- all state (files, processes, browser tabs) is preserved.
```python
async def compute_reward(self, item, result, ctx: ToolContext):
    # Run tests in the model's terminal sandbox
    test = ctx.terminal("pytest -v")
    if test["exit_code"] == 0:
        return 1.0
    # Check if a file was created
    content = ctx.read_file("/workspace/solution.py")
    if content.get("content"):
        return 0.5
    # Download files locally for verification (binary-safe)
    ctx.download_file("/remote/output.bin", "/local/output.bin")
    return 0.0
```
```
Available methods:
- **Terminal**: `terminal(command, timeout)` -- run shell commands
- **Files**: `read_file(path)`, `write_file(path, content)`, `search(query, path)`
- **Transfers**: `upload_file()`, `upload_dir()`, `download_file()`, `download_dir()` -- binary-safe file transfers between host and sandbox
- **Web**: `web_search(query)`, `web_extract(urls)`
- **Browser**: `browser_navigate(url)`, `browser_snapshot()`
- **Generic**: `call_tool(name, args)` -- call any hermes-agent tool by name
- **Cleanup**: `cleanup()` -- release all resources (called automatically after `compute_reward`)
### Patches (`patches.py`)
**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., mini-swe-agent's Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
**Solution**: `patches.py` monkey-patches `SwerexModalEnvironment` to use a dedicated background thread (`_AsyncWorker`) with its own event loop. The calling code sees the same sync interface, but internally the async work happens on a separate thread that doesn't conflict with Atropos's loop.
What gets patched:
- `SwerexModalEnvironment.__init__` -- creates Modal deployment on a background thread
- `SwerexModalEnvironment.execute` -- runs commands on the same background thread
- `SwerexModalEnvironment.stop` -- stops deployment on the background thread
The patches are:
- **Idempotent** -- calling `apply_patches()` multiple times is safe
- **Transparent** -- same interface and behavior, only the internal async execution changes
- **Universal** -- works identically in normal CLI use (no running event loop)
Applied automatically at import time by `hermes_base_env.py`.
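
The background-thread technique can be sketched as follows. This is a minimal illustration under assumptions, not the contents of `patches.py`: the `AsyncWorker` class, `remote_execute()`, and `execute()` are hypothetical names, and the real `_AsyncWorker` may differ in structure and error handling.

```python
import asyncio
import threading


class AsyncWorker:
    """Illustrative worker: one daemon thread that owns a private event loop."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        # Schedule the coroutine on the worker's loop and block for its result.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def stop(self):
        # Ask the worker loop to stop, wait for the thread, then release the loop.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def remote_execute(cmd):
    """Hypothetical async backend call."""
    await asyncio.sleep(0.01)
    return f"ran: {cmd}"


def execute(worker, cmd):
    """Sync facade: callers never see the coroutine."""
    return worker.run(remote_execute(cmd))


async def main():
    worker = AsyncWorker()
    try:
        # Called from inside a running loop, where asyncio.run() would raise
        # RuntimeError. This blocks the calling thread until the result arrives.
        return execute(worker, "echo hi")
    finally:
        worker.stop()


output = asyncio.run(main())
```

Because the coroutine runs on the worker's loop, the same sync call works whether or not the caller already has an event loop running.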
### Tool Call Parsers (`tool_call_parsers/`)
Client-side parsers that extract structured `tool_calls` from raw model output text. Used in **Phase 2** (VLLM server type) where ManagedServer's `/generate` endpoint returns raw text without tool call parsing.
Each parser is a standalone reimplementation of the corresponding VLLM parser's `extract_tool_calls()` logic. No VLLM dependency -- only standard library (`re`, `json`, `uuid`) and `openai` types.
Available parsers:
- `hermes` -- Hermes/ChatML `<tool_call>` XML format
- `mistral` -- Mistral `[TOOL_CALLS]` format
- `llama3_json` -- Llama 3 JSON tool calling
- `qwen` -- Qwen tool calling format
- `qwen3_coder` -- Qwen3 Coder format
- `deepseek_v3` -- DeepSeek V3 format
- `deepseek_v3_1` -- DeepSeek V3.1 format
- `kimi_k2` -- Kimi K2 format
- `longcat` -- Longcat format
- `glm45` / `glm47` -- GLM model formats
Usage:
```python
from environments.tool_call_parsers import get_parser
parser = get_parser("hermes")
content, tool_calls = parser.parse(raw_model_output)
```
In Phase 1 (OpenAI server type), these parsers are not needed -- the server handles tool call parsing natively.
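
As a rough illustration of what a client-side parser does, the sketch below extracts Hermes-style `<tool_call>` blocks using only the standard library. It is not the repository's `hermes` parser: `parse_hermes_tool_calls` is a hypothetical name, it returns plain dicts rather than `openai` types, and it ignores edge cases such as unclosed tags.

```python
import json
import re
import uuid

_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_hermes_tool_calls(text):
    """Return (content, tool_calls) extracted from raw model output."""
    tool_calls = []
    for raw in _TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw)
            name = payload["name"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # skip malformed blocks instead of failing the rollout
        tool_calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": name,
                # OpenAI-style tool calls carry arguments as a JSON string
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    # Whatever text remains outside the tool_call blocks is the visible content.
    content = _TOOL_CALL_RE.sub("", text).strip() or None
    return content, tool_calls


raw_output = (
    "Let me check the directory.\n"
    '<tool_call>\n{"name": "terminal", "arguments": {"command": "ls"}}\n</tool_call>'
)
content, tool_calls = parse_hermes_tool_calls(raw_output)
```

The text outside the tags comes back as `content`, and each well-formed block becomes one structured tool call.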
## Two-Phase Operation
### Phase 1: OpenAI Server (Evaluation / SFT Data Generation)
Uses `server.chat_completion()` with `tools=` parameter. The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing natively. Returns `ChatCompletion` objects with structured `tool_calls`.
- Good for: evaluation, SFT data generation, testing
- Run with: `serve` (with `run-api`), `process`, or `evaluate` subcommands
- Placeholder tokens are created for the Atropos pipeline
### Phase 2: VLLM ManagedServer (Full RL Training)
Uses ManagedServer for exact token IDs + logprobs via `/generate`. Client-side tool call parser (from `tool_call_parsers/`) reconstructs structured `tool_calls` from raw output.
- Good for: full RL training with GRPO/PPO
- Run with: `serve` subcommand
- Real tokens, masks, and logprobs flow through the pipeline
## Directory Structure
```
environments/
├── README.md # This file
├── __init__.py # Package exports
├── hermes_base_env.py # Abstract base (HermesAgentBaseEnv)
├── agent_loop.py # Multi-turn agent engine (HermesAgentLoop)
├── tool_context.py # Per-rollout tool access for reward functions
├── patches.py # Async-safety patches for Modal backend
├── tool_call_parsers/ # Phase 2 client-side parsers
│ ├── __init__.py # Registry + base class
│ ├── hermes_parser.py
│ ├── mistral_parser.py
│ ├── llama_parser.py
│ ├── qwen_parser.py
│ ├── qwen3_coder_parser.py
│ ├── deepseek_v3_parser.py
│ ├── deepseek_v3_1_parser.py
│ ├── kimi_k2_parser.py
│ ├── longcat_parser.py
│ ├── glm45_parser.py
│ └── glm47_parser.py
├── terminal_test_env/ # Stack validation environment
│ └── terminal_test_env.py
├── hermes_swe_env/ # SWE-bench style training environment
│ └── hermes_swe_env.py
└── benchmarks/ # Evaluation benchmarks
    └── terminalbench_2/
        └── terminalbench2_env.py
```
## Concrete Environments
### TerminalTestEnv (`terminal_test_env/`)
A self-contained environment with inline tasks (no external dataset needed) for validating the full stack end-to-end. Each task asks the model to create a file at a known path, and the verifier checks the content matches.
```bash
# Serve mode (needs run-api)
run-api
python environments/terminal_test_env/terminal_test_env.py serve
# Process mode (no run-api, saves to JSONL)
python environments/terminal_test_env/terminal_test_env.py process \
--env.data_path_to_save_groups terminal_test_output.jsonl
```
### HermesSweEnv (`hermes_swe_env/`)
SWE-bench style training environment. The model gets a coding task, uses terminal + file + web tools to solve it, and the reward function runs tests in the same Modal sandbox.
```bash
python environments/hermes_swe_env/hermes_swe_env.py serve \
--openai.model_name YourModel \
--env.dataset_name bigcode/humanevalpack \
--env.terminal_backend modal
```
### TerminalBench2EvalEnv (`benchmarks/terminalbench_2/`)
**Eval-only** environment for the Terminal-Bench 2.0 benchmark (89 tasks). Each task gets a pre-built Docker Hub image, a natural language instruction, and a test suite. The agent uses terminal + file tools to solve the task, then the test suite verifies correctness.
Follows the standard Atropos eval pattern (like GPQA, MMLU, etc.):
- Run via `evaluate` subcommand (no `run-api` needed)
- `setup()` loads the dataset, `evaluate()` runs all tasks
- `rollout_and_score_eval()` handles per-task agent loop + test verification
- Downloads verifier output locally for reliable reward checking (Harbor pattern)
```bash
# Run full benchmark
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--openai.model_name anthropic/claude-opus-4.6
# Run subset of tasks
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--openai.model_name anthropic/claude-opus-4.6 \
--env.task_filter fix-git,git-multibranch
# Skip specific tasks
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--openai.model_name anthropic/claude-opus-4.6 \
--env.skip_tasks heavy-task,slow-task
```
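
The `--env.task_filter` and `--env.skip_tasks` flags both take comma-separated task names. The selection logic can be sketched roughly as below; `select_tasks` and its `default_skip` parameter are hypothetical names used for illustration, modelled on how the environment's `setup()` applies an allow list first and a skip list second.

```python
def select_tasks(tasks, task_filter=None, skip_tasks=None, default_skip=frozenset()):
    """Apply a comma-separated allow list, then a skip list, to task dicts."""
    if task_filter:
        allowed = {name.strip() for name in task_filter.split(",")}
        tasks = [t for t in tasks if t["task_name"] in allowed]

    # Start from the backend's default skip list, then add user-specified names.
    skip = set(default_skip)
    if skip_tasks:
        skip |= {name.strip() for name in skip_tasks.split(",")}

    return [t for t in tasks if t["task_name"] not in skip]


tasks = [
    {"task_name": name}
    for name in ("fix-git", "git-multibranch", "qemu-startup", "heavy-task")
]
selected = select_tasks(
    tasks,
    skip_tasks="heavy-task, slow-task",
    default_skip={"qemu-startup"},
)
names = [t["task_name"] for t in selected]
```

Names in the skip list that do not match any task are simply ignored.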
## Creating a New Environment
### Training Environment
1. Create a new directory under `environments/`
2. Create your env file inheriting from `HermesAgentBaseEnv`
3. Implement the four abstract methods + `evaluate()`
```python
from atroposlib.envs.server_handling.server_manager import APIServerConfig
from datasets import load_dataset

from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig


class MyEnvConfig(HermesAgentEnvConfig):
    pass  # Add custom fields as needed


class MyEnv(HermesAgentBaseEnv):
    name = "my-env"
    env_config_cls = MyEnvConfig

    @classmethod
    def config_init(cls):
        env_config = MyEnvConfig(
            enabled_toolsets=["terminal", "file"],
            terminal_backend="modal",
            # ... other config
        )
        server_configs = [APIServerConfig(...)]
        return env_config, server_configs

    async def setup(self):
        self.dataset = load_dataset(...)
        self.iter = 0

    async def get_next_item(self):
        item = self.dataset[self.iter % len(self.dataset)]
        self.iter += 1
        return item

    def format_prompt(self, item):
        return item["instruction"]

    async def compute_reward(self, item, result, ctx):
        # ctx gives you full tool access to the rollout's sandbox
        test = ctx.terminal("pytest -v")
        return 1.0 if test["exit_code"] == 0 else 0.0

    async def evaluate(self, *args, **kwargs):
        # Periodic evaluation logic
        ...


if __name__ == "__main__":
    MyEnv.cli()
```
### Eval-Only Environment (Benchmark)
For eval benchmarks, follow the pattern in `terminalbench2_env.py`:
1. Create under `environments/benchmarks/your-benchmark/`
2. Inherit from `HermesAgentBaseEnv`
3. Set eval-only config: `eval_handling=STOP_TRAIN`, `steps_per_eval=1`, `total_steps=1`
4. Stub the training methods (`collect_trajectories`, `score`)
5. Implement `rollout_and_score_eval()` and `evaluate()`
6. Run with `evaluate` subcommand
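
For step 5, `evaluate()` usually fans out over all tasks. The sketch below shows one way to bound concurrency with `asyncio.Semaphore` and score a task as failed when it exceeds a wall-clock budget. It is an assumption-laden illustration: `run_task`, `evaluate_all`, `max_concurrent`, and the result dict shape are hypothetical, and the actual benchmark environment may wire these pieces differently.

```python
import asyncio


async def run_task(name, delay):
    """Hypothetical per-task rollout; stands in for rollout_and_score_eval()."""
    await asyncio.sleep(delay)
    return {"task_name": name, "passed": True, "reward": 1.0}


async def evaluate_all(tasks, max_concurrent=2, task_timeout=0.2):
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(name, delay):
        async with sem:  # at most max_concurrent tasks in flight
            try:
                return await asyncio.wait_for(run_task(name, delay), timeout=task_timeout)
            except asyncio.TimeoutError:
                # Over the wall-clock budget: record the task as a failure.
                return {"task_name": name, "passed": False, "reward": 0.0, "error": "timeout"}

    results = await asyncio.gather(*(guarded(n, d) for n, d in tasks))
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return results, pass_rate


results, pass_rate = asyncio.run(
    evaluate_all([("fast-a", 0.01), ("fast-b", 0.01), ("slow", 1.0)])
)
```

The timeout clock starts only after a task acquires the semaphore, so time spent queued does not count against its budget in this sketch.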
## Key Config Fields
| Field | Description | Default |
|-------|-------------|---------|
| `enabled_toolsets` | Which hermes toolsets to enable | `None` (all) |
| `disabled_toolsets` | Toolsets to disable | `None` |
| `distribution` | Probabilistic toolset distribution name | `None` |
| `max_agent_turns` | Max LLM calls per rollout | `30` |
| `agent_temperature` | Sampling temperature | `1.0` |
| `terminal_backend` | `local`, `docker`, `modal`, `ssh`, `singularity` | `local` |
| `system_prompt` | System message for the agent | `None` |
| `tool_call_parser` | Parser name for Phase 2 | `hermes` |
| `eval_handling` | `STOP_TRAIN`, `LIMIT_TRAIN`, `NONE` | `STOP_TRAIN` |
+7 -3
@@ -4,15 +4,19 @@ Hermes-Agent Atropos Environments
Provides a layered integration between hermes-agent's tool-calling capabilities
and the Atropos RL training framework.
-Layers:
+Core layers:
- agent_loop: Reusable multi-turn agent loop with standard OpenAI-spec tool calling
- tool_context: Per-rollout tool access handle for reward/verification functions
- hermes_base_env: Abstract base environment (BaseEnv subclass) for Atropos
- tool_call_parsers: Client-side tool call parser registry for Phase 2 (VLLM /generate)
Concrete environments:
-- terminal_test_env: Simple file-creation tasks for testing the stack
-- hermes_swe_env: SWE-bench style tasks with Modal sandboxes
+- terminal_test_env/: Simple file-creation tasks for testing the stack
+- hermes_swe_env/: SWE-bench style tasks with Modal sandboxes
+- endless_terminals/: Terminal tasks from HuggingFace dataset with Apptainer containers
+Benchmarks (eval-only):
+- benchmarks/terminalbench_2/: Terminal-Bench 2.0 evaluation
"""
from environments.agent_loop import AgentResult, HermesAgentLoop
+60 -11
@@ -15,6 +15,7 @@ import asyncio
import concurrent.futures
import json
import logging
import os
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set
@@ -24,7 +25,22 @@ from model_tools import handle_function_call
# Thread pool for running sync tool calls that internally use asyncio.run()
# (e.g., mini-swe-agent's modal/docker backends). Running them in a separate
# thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
-_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)
+# Size must be large enough for concurrent eval tasks (e.g., 89 TB2 tasks all
+# making tool calls). Too small = thread pool starvation, tasks queue for minutes.
+# Resized at runtime by HermesAgentBaseEnv.__init__ via resize_tool_pool().
+_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=128)
def resize_tool_pool(max_workers: int):
"""
Replace the global tool executor with a new one of the given size.
Called by HermesAgentBaseEnv.__init__ based on config.tool_pool_size.
Safe to call before any tasks are submitted.
"""
global _tool_executor
_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
logger.info("Tool thread pool resized to %d workers", max_workers)
logger = logging.getLogger(__name__)
@@ -119,6 +135,7 @@ class HermesAgentLoop:
task_id: Optional[str] = None,
temperature: float = 1.0,
max_tokens: Optional[int] = None,
extra_body: Optional[Dict[str, Any]] = None,
):
"""
Initialize the agent loop.
@@ -132,6 +149,9 @@ class HermesAgentLoop:
task_id: Unique ID for terminal/browser session isolation
temperature: Sampling temperature for generation
max_tokens: Max tokens per generation (None for server default)
extra_body: Extra parameters passed to the OpenAI client's create() call.
Used for OpenRouter provider preferences, transforms, etc.
e.g. {"provider": {"ignore": ["DeepInfra"]}}
"""
self.server = server
self.tool_schemas = tool_schemas
@@ -140,6 +160,7 @@ class HermesAgentLoop:
self.task_id = task_id or str(uuid.uuid4())
self.temperature = temperature
self.max_tokens = max_tokens
self.extra_body = extra_body
async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
"""
@@ -155,7 +176,11 @@ class HermesAgentLoop:
reasoning_per_turn = []
tool_errors: List[ToolError] = []
import time as _time
for turn in range(self.max_turns):
turn_start = _time.monotonic()
# Build the chat_completion kwargs
chat_kwargs = {
"messages": messages,
@@ -171,11 +196,18 @@ class HermesAgentLoop:
if self.max_tokens is not None:
chat_kwargs["max_tokens"] = self.max_tokens
# Inject extra_body for provider-specific params (e.g., OpenRouter
# provider preferences like banned/preferred providers, transforms)
if self.extra_body:
chat_kwargs["extra_body"] = self.extra_body
# Make the API call -- standard OpenAI spec
api_start = _time.monotonic()
try:
response = await self.server.chat_completion(**chat_kwargs)
except Exception as e:
-logger.error("API call failed on turn %d: %s", turn + 1, e)
+api_elapsed = _time.monotonic() - api_start
+logger.error("API call failed on turn %d (%.1fs): %s", turn + 1, api_elapsed, e)
return AgentResult(
messages=messages,
managed_state=self._get_managed_state(),
@@ -185,8 +217,10 @@ class HermesAgentLoop:
tool_errors=tool_errors,
)
api_elapsed = _time.monotonic() - api_start
if not response or not response.choices:
-logger.warning("Empty response on turn %d", turn + 1)
+logger.warning("Empty response on turn %d (api=%.1fs)", turn + 1, api_elapsed)
return AgentResult(
messages=messages,
managed_state=self._get_managed_state(),
@@ -265,14 +299,16 @@ class HermesAgentLoop:
try:
if tool_name == "terminal":
-import os
backend = os.getenv("TERMINAL_ENV", "local")
cmd_preview = args.get("command", "")[:80]
-print(f" 🖥️ [{backend}] $ {cmd_preview}")
+logger.info(
+"[%s] $ %s", self.task_id[:8], cmd_preview,
+)
# Run tool calls in a thread pool so backends that use
# asyncio.run() internally (modal, docker) get a clean
# event loop instead of deadlocking inside Atropos's loop.
tool_submit_time = _time.monotonic()
loop = asyncio.get_event_loop()
tool_result = await loop.run_in_executor(
_tool_executor,
@@ -280,6 +316,16 @@ class HermesAgentLoop:
tool_name, args, task_id=self.task_id
),
)
tool_elapsed = _time.monotonic() - tool_submit_time
# Log slow tools and thread pool stats for debugging
pool_active = _tool_executor._work_queue.qsize()
if tool_elapsed > 30:
logger.warning(
"[%s] turn %d: %s took %.1fs (pool queue=%d)",
self.task_id[:8], turn + 1, tool_name,
tool_elapsed, pool_active,
)
except Exception as e:
tool_result = json.dumps(
{"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
@@ -320,10 +366,11 @@ class HermesAgentLoop:
}
)
-logger.debug(
-"Turn %d: %d tool calls executed",
-turn + 1,
-len(assistant_msg.tool_calls),
+turn_elapsed = _time.monotonic() - turn_start
+logger.info(
+"[%s] turn %d: api=%.1fs, %d tools, turn_total=%.1fs",
+self.task_id[:8], turn + 1, api_elapsed,
+len(assistant_msg.tool_calls), turn_elapsed,
)
else:
@@ -336,8 +383,10 @@ class HermesAgentLoop:
msg_dict["reasoning_content"] = reasoning
messages.append(msg_dict)
-logger.debug(
-"Turn %d: model finished naturally (no tool calls)", turn + 1
+turn_elapsed = _time.monotonic() - turn_start
+logger.info(
+"[%s] turn %d: api=%.1fs, no tools (finished), turn_total=%.1fs",
+self.task_id[:8], turn + 1, api_elapsed, turn_elapsed,
)
return AgentResult(
@@ -0,0 +1,38 @@
# Terminal-Bench 2.0 Evaluation -- Default Configuration
#
# Eval-only environment for the TB2 benchmark (89 terminal tasks).
# Uses Modal terminal backend for per-task cloud-isolated sandboxes
# and OpenRouter for inference.
#
# Usage:
# python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
# --config environments/benchmarks/terminalbench_2/default.yaml
#
# # Override model:
# python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
# --config environments/benchmarks/terminalbench_2/default.yaml \
# --openai.model_name anthropic/claude-sonnet-4
env:
enabled_toolsets: ["terminal", "file"]
max_agent_turns: 60
max_token_length: 32000
agent_temperature: 0.8
terminal_backend: "modal"
terminal_timeout: 300 # 5 min per command (builds, pip install)
tool_pool_size: 128 # thread pool for 89 parallel tasks
dataset_name: "NousResearch/terminal-bench-2"
test_timeout: 600
task_timeout: 1800 # 30 min wall-clock per task, auto-FAIL if exceeded
tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
use_wandb: true
wandb_name: "terminal-bench-2"
ensure_scores_are_not_same: false
data_dir_to_save_evals: "environments/benchmarks/evals/terminal-bench-2"
openai:
base_url: "https://openrouter.ai/api/v1"
model_name: "anthropic/claude-opus-4.6"
server_type: "openai"
health_check: false
# api_key loaded from OPENROUTER_API_KEY in .env
+32
@@ -0,0 +1,32 @@
#!/bin/bash
# Terminal-Bench 2.0 Evaluation
#
# Run from repo root:
# bash environments/benchmarks/terminalbench_2/run_eval.sh
#
# Override model:
# bash environments/benchmarks/terminalbench_2/run_eval.sh \
# --openai.model_name anthropic/claude-sonnet-4
#
# Run a subset:
# bash environments/benchmarks/terminalbench_2/run_eval.sh \
# --env.task_filter fix-git,git-multibranch
mkdir -p logs evals/terminal-bench-2
LOG_FILE="logs/terminalbench2_$(date +%Y%m%d_%H%M%S).log"
echo "Terminal-Bench 2.0 Evaluation"
echo "Log: $LOG_FILE"
echo ""
export TERMINAL_ENV=modal
export TERMINAL_TIMEOUT=300
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--config environments/benchmarks/terminalbench_2/default.yaml \
"$@" \
2>&1 | tee "$LOG_FILE"
echo ""
echo "Log saved to: $LOG_FILE"
@@ -0,0 +1,904 @@
"""
TerminalBench2Env -- Terminal-Bench 2.0 Evaluation Environment
Evaluates agentic LLMs on challenging terminal tasks from Terminal-Bench 2.0.
Each task provides a unique Docker environment (pre-built on Docker Hub), a natural
language instruction, and a test suite for verification. The agent uses terminal +
file tools to complete the task, then the test suite runs inside the same sandbox.
This is an eval-only environment (not a training environment). It is designed to
be run via the `evaluate` subcommand:
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \\
--env.dataset_name NousResearch/terminal-bench-2
The evaluate flow:
1. setup() -- Loads the TB2 dataset from HuggingFace
2. evaluate() -- Iterates over all tasks, running each through:
a. rollout_and_score_eval() -- Per-task agent loop + test verification
- Resolves Docker image (pre-built Hub image or Dockerfile fallback)
- Registers per-task Modal sandbox via register_task_env_overrides()
- Runs the HermesAgentLoop (terminal + file tools)
- Uploads test suite and runs test.sh in the same sandbox
- Returns binary pass/fail result
b. Aggregates per-task, per-category, and overall pass rates
c. Logs results via evaluate_log() and wandb
Key features:
- Per-task Modal sandboxes using pre-built Docker Hub images
- Binary reward: 1.0 if all tests pass, 0.0 otherwise
- Concurrency-controlled parallel evaluation via asyncio.Semaphore
- Per-task, per-category, and aggregate pass rate tracking
"""
import asyncio
import base64
import io
import json
import logging
import os
import shutil
import sys
import tarfile
import tempfile
import time
import uuid
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union
# Ensure repo root is on sys.path for imports
_repo_root = Path(__file__).resolve().parent.parent.parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
from pydantic import Field
from atroposlib.envs.base import EvalHandlingEnum
from atroposlib.envs.server_handling.server_manager import APIServerConfig
from environments.agent_loop import AgentResult, HermesAgentLoop
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
from environments.tool_context import ToolContext
from tools.terminal_tool import (
register_task_env_overrides,
clear_task_env_overrides,
cleanup_vm,
)
logger = logging.getLogger(__name__)
# =============================================================================
# Configuration
# =============================================================================
class TerminalBench2EvalConfig(HermesAgentEnvConfig):
"""
Configuration for the Terminal-Bench 2.0 evaluation environment.
Extends HermesAgentEnvConfig with TB2-specific settings for dataset loading,
test execution, task filtering, and eval concurrency.
"""
# --- Dataset ---
dataset_name: str = Field(
default="NousResearch/terminal-bench-2",
description="HuggingFace dataset containing TB2 tasks.",
)
# --- Test execution ---
test_timeout: int = Field(
default=180,
description="Timeout in seconds for running the test suite after agent completes.",
)
# --- Image strategy ---
force_build: bool = Field(
default=False,
description="If True, always build from Dockerfile (ignore docker_image). "
"Useful for testing custom Dockerfiles.",
)
# --- Task filtering (comma-separated from CLI) ---
task_filter: Optional[str] = Field(
default=None,
description="Comma-separated task names to run (e.g., 'fix-git,git-multibranch'). "
"If not set, all tasks are run.",
)
skip_tasks: Optional[str] = Field(
default=None,
description="Comma-separated task names to skip on top of the default skip list.",
)
# --- Per-task wall-clock timeout ---
task_timeout: int = Field(
default=1800,
description="Maximum wall-clock seconds per task (agent loop + verification). "
"Tasks exceeding this are scored as FAIL. Default 30 minutes.",
)
# Tasks that cannot run properly on Modal and are excluded from scoring.
MODAL_INCOMPATIBLE_TASKS = {
"qemu-startup", # Needs KVM/hardware virtualization
"qemu-alpine-ssh", # Needs KVM/hardware virtualization
"crack-7z-hash", # Password brute-force -- too slow for cloud sandbox timeouts
}
# =============================================================================
# Tar extraction helper
# =============================================================================
def _extract_base64_tar(b64_data: str, target_dir: Path):
"""Extract a base64-encoded tar.gz archive into target_dir."""
if not b64_data:
return
raw = base64.b64decode(b64_data)
buf = io.BytesIO(raw)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
tar.extractall(path=str(target_dir))
# =============================================================================
# Main Environment
# =============================================================================
class TerminalBench2EvalEnv(HermesAgentBaseEnv):
"""
Terminal-Bench 2.0 evaluation environment (eval-only, no training).
Inherits from HermesAgentBaseEnv for:
- Terminal backend setup (os.environ["TERMINAL_ENV"])
- Tool resolution via _resolve_tools_for_group()
- Monkey patches for async-safe tool operation
- Wandb trajectory formatting
The evaluate flow (triggered by the `evaluate` subcommand):
1. setup() -- Load dataset from HuggingFace
2. evaluate() -- Run all tasks through rollout_and_score_eval()
Each task in rollout_and_score_eval():
1. Resolve Docker image (pre-built Hub image or Dockerfile fallback)
2. Register per-task Modal sandbox override
3. Run HermesAgentLoop with terminal + file tools
4. Upload test suite and execute test.sh in the same sandbox
5. Check /logs/verifier/reward.txt for pass/fail
6. Clean up sandbox, overrides, and temp files
"""
name = "terminal-bench-2"
env_config_cls = TerminalBench2EvalConfig
@classmethod
def config_init(cls) -> Tuple[TerminalBench2EvalConfig, List[APIServerConfig]]:
"""
Default configuration for Terminal-Bench 2.0 evaluation.
Uses eval-only settings:
- eval_handling=STOP_TRAIN so the eval flow runs cleanly
- steps_per_eval=1, total_steps=1 so eval triggers immediately
- group_size=1 (one rollout per group, each task is expensive)
Uses Modal terminal backend (cloud-isolated sandbox per task) and
OpenRouter with Claude for inference.
"""
env_config = TerminalBench2EvalConfig(
# Terminal + file tools only (the agent interacts via shell commands)
enabled_toolsets=["terminal", "file"],
disabled_toolsets=None,
distribution=None,
# Agent settings -- TB2 tasks are complex, need many turns
max_agent_turns=60,
max_token_length=16000,
agent_temperature=0.6,
system_prompt=None,
# Modal backend for per-task cloud-isolated sandboxes
terminal_backend="modal",
terminal_timeout=300, # 5 min per command (builds, pip install, etc.)
# Test execution timeout (TB2 test scripts can install deps like pytest)
test_timeout=180,
# 89 tasks run in parallel, each needs a thread for tool calls
tool_pool_size=128,
# --- Eval-only Atropos settings ---
# These settings make the env work as an eval-only environment:
# - STOP_TRAIN: pauses training during eval (standard for eval envs)
# - steps_per_eval=1, total_steps=1: eval triggers immediately
# - group_size=1: one rollout per group (each task is expensive)
eval_handling=EvalHandlingEnum.STOP_TRAIN,
group_size=1,
steps_per_eval=1,
total_steps=1,
tokenizer_name="NousResearch/Hermes-3-Llama-3.1-8B",
use_wandb=True,
wandb_name="terminal-bench-2",
ensure_scores_are_not_same=False, # Binary rewards may all be 0 or 1
)
# OpenRouter with Claude -- API key loaded from .env
server_configs = [
APIServerConfig(
base_url="https://openrouter.ai/api/v1",
model_name="anthropic/claude-sonnet-4",
server_type="openai",
api_key=os.getenv("OPENROUTER_API_KEY", ""),
health_check=False,
)
]
return env_config, server_configs
# =========================================================================
# Setup -- load dataset
# =========================================================================
async def setup(self):
"""Load the Terminal-Bench 2.0 dataset from HuggingFace."""
from datasets import load_dataset
# Auto-set terminal_lifetime to task_timeout + 120s so sandboxes
# never get killed during an active task, but still get cleaned up
# promptly after the task times out.
lifetime = self.config.task_timeout + 120
self.config.terminal_lifetime = lifetime
os.environ["TERMINAL_LIFETIME_SECONDS"] = str(lifetime)
print(f" Terminal lifetime auto-set to {lifetime}s (task_timeout + 120s)")
print(f"Loading TB2 dataset from: {self.config.dataset_name}")
ds = load_dataset(self.config.dataset_name, split="train")
# Apply task filters (comma-separated strings from CLI)
tasks = list(ds)
if self.config.task_filter:
allowed = {name.strip() for name in self.config.task_filter.split(",")}
tasks = [t for t in tasks if t["task_name"] in allowed]
print(f" Filtered to {len(tasks)} tasks: {sorted(allowed)}")
# Skip tasks incompatible with the current backend (e.g., QEMU on Modal)
# plus any user-specified skip_tasks
skip = set(MODAL_INCOMPATIBLE_TASKS) if self.config.terminal_backend == "modal" else set()
if self.config.skip_tasks:
skip |= {name.strip() for name in self.config.skip_tasks.split(",")}
if skip:
before = len(tasks)
tasks = [t for t in tasks if t["task_name"] not in skip]
skipped = before - len(tasks)
if skipped > 0:
print(f" Skipped {skipped} incompatible tasks: {sorted(skip & {t['task_name'] for t in ds})}")
self.all_eval_items = tasks
self.iter = 0
# Build category index for per-category metrics
self.category_index: Dict[str, List[int]] = defaultdict(list)
for i, task in enumerate(self.all_eval_items):
self.category_index[task.get("category", "unknown")].append(i)
# Reward tracking for wandb logging
self.eval_metrics: List[Tuple[str, float]] = []
# Streaming JSONL writer -- saves each task's full conversation
# immediately on completion so data is preserved even on Ctrl+C.
# Timestamped filename so each run produces a unique file.
import datetime
log_dir = os.path.join(os.path.dirname(__file__), "logs")
os.makedirs(log_dir, exist_ok=True)
run_ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
self._streaming_path = os.path.join(log_dir, f"samples_{run_ts}.jsonl")
self._streaming_file = open(self._streaming_path, "w")
self._streaming_lock = __import__("threading").Lock()
print(f" Streaming results to: {self._streaming_path}")
print(f"TB2 ready: {len(self.all_eval_items)} tasks across {len(self.category_index)} categories")
for cat, indices in sorted(self.category_index.items()):
print(f" {cat}: {len(indices)} tasks")
def _save_result(self, result: Dict[str, Any]):
"""Write a single task result to the streaming JSONL file immediately."""
if not hasattr(self, "_streaming_file") or self._streaming_file.closed:
return
with self._streaming_lock:
self._streaming_file.write(json.dumps(result, ensure_ascii=False, default=str) + "\n")
self._streaming_file.flush()
# =========================================================================
# Training pipeline stubs -- NOT used in eval-only mode
# =========================================================================
# These satisfy the abstract method requirements from HermesAgentBaseEnv.
# The evaluate subcommand calls setup() -> evaluate() directly, bypassing
# the training pipeline entirely.
async def get_next_item(self):
"""Return next item (stub -- not used in eval-only mode)."""
item = self.all_eval_items[self.iter % len(self.all_eval_items)]
self.iter += 1
return item
def format_prompt(self, item: Dict[str, Any]) -> str:
"""Return the task's instruction as the user prompt."""
return item["instruction"]
async def compute_reward(self, item, result, ctx) -> float:
"""Compute reward (stub -- actual verification is in rollout_and_score_eval)."""
return 0.0
async def collect_trajectories(self, item):
"""Collect trajectories (stub -- not used in eval-only mode)."""
return None, []
async def score(self, rollout_group_data):
"""Score rollouts (stub -- not used in eval-only mode)."""
return None
# =========================================================================
# Docker image resolution
# =========================================================================
def _resolve_task_image(
self, item: Dict[str, Any], task_name: str
) -> Tuple[str, Optional[Path]]:
"""
Resolve the Docker image for a task, with fallback to Dockerfile.
Strategy (mirrors Harbor's approach):
1. If force_build=True, always build from Dockerfile in environment_tar
2. If docker_image is available, use the pre-built Docker Hub image (fast)
3. Otherwise, extract Dockerfile from environment_tar and build (slow)
Returns:
(modal_image, temp_dir) -- modal_image is a Docker Hub name or a
Dockerfile path. temp_dir is set if we extracted files that need
cleanup later.
"""
docker_image = item.get("docker_image", "")
environment_tar = item.get("environment_tar", "")
# Fast path: use pre-built Docker Hub image
if docker_image and not self.config.force_build:
logger.info("Task %s: using pre-built image %s", task_name, docker_image)
return docker_image, None
# Slow path: extract Dockerfile from environment_tar and build
if environment_tar:
task_dir = Path(tempfile.mkdtemp(prefix=f"tb2-{task_name}-"))
_extract_base64_tar(environment_tar, task_dir)
dockerfile_path = task_dir / "Dockerfile"
if dockerfile_path.exists():
logger.info(
"Task %s: building from Dockerfile (force_build=%s, docker_image=%s)",
task_name, self.config.force_build, bool(docker_image),
)
return str(dockerfile_path), task_dir
# Neither available -- fall back to Hub image if force_build was True
if docker_image:
logger.warning(
"Task %s: force_build=True but no environment_tar, "
"falling back to docker_image %s", task_name, docker_image,
)
return docker_image, None
return "", None
# =========================================================================
# Per-task evaluation -- agent loop + test verification
# =========================================================================
async def rollout_and_score_eval(self, eval_item: Dict[str, Any]) -> Dict:
"""
Evaluate a single TB2 task: run the agent loop, then verify with tests.
This is the core evaluation method. For each task it:
1. Resolves the Docker image and registers the Modal sandbox override
2. Runs HermesAgentLoop with terminal + file tools
3. Uploads the test suite into the sandbox
4. Executes test.sh and checks the result
5. Cleans up the sandbox and temp files
Args:
eval_item: A single TB2 task dict from the dataset
Returns:
Dict with 'passed' (bool), 'reward' (float), 'task_name' (str),
'category' (str), and optional debug info
"""
task_name = eval_item.get("task_name", "unknown")
category = eval_item.get("category", "unknown")
task_id = str(uuid.uuid4())
task_dir = None # Set if we extract a Dockerfile (needs cleanup)
from tqdm import tqdm
tqdm.write(f" [START] {task_name} (task_id={task_id[:8]})")
task_start = time.time()
try:
# --- 1. Resolve Docker image ---
modal_image, task_dir = self._resolve_task_image(eval_item, task_name)
if not modal_image:
logger.error("Task %s: no docker_image or environment_tar, skipping", task_name)
return {
"passed": False, "reward": 0.0,
"task_name": task_name, "category": category,
"error": "no_image",
}
# --- 2. Register per-task Modal image override ---
register_task_env_overrides(task_id, {"modal_image": modal_image})
logger.info(
"Task %s: registered image override for task_id %s",
task_name, task_id[:8],
)
# --- 3. Resolve tools and build messages ---
tools, valid_names = self._resolve_tools_for_group()
messages: List[Dict[str, Any]] = []
if self.config.system_prompt:
messages.append({"role": "system", "content": self.config.system_prompt})
messages.append({"role": "user", "content": self.format_prompt(eval_item)})
# --- 4. Run agent loop ---
agent = HermesAgentLoop(
server=self.server,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
# --- 5. Verify -- run test suite in the agent's sandbox ---
# Skip verification if the agent produced no meaningful output
only_system_and_user = all(
msg.get("role") in ("system", "user") for msg in result.messages
)
if result.turns_used == 0 or only_system_and_user:
logger.warning(
"Task %s: agent produced no output (turns=%d). Reward=0.",
task_name, result.turns_used,
)
reward = 0.0
else:
# Run tests in a thread so the blocking ctx.terminal() calls
# don't freeze the entire event loop (which would stall all
# other tasks, tqdm updates, and timeout timers).
ctx = ToolContext(task_id)
try:
loop = asyncio.get_running_loop()
reward = await loop.run_in_executor(
None, # default thread pool
self._run_tests, eval_item, ctx, task_name,
)
except Exception as e:
logger.error("Task %s: test verification failed: %s", task_name, e)
reward = 0.0
finally:
ctx.cleanup()
passed = reward == 1.0
status = "PASS" if passed else "FAIL"
elapsed = time.time() - task_start
tqdm.write(f" [{status}] {task_name} (turns={result.turns_used}, {elapsed:.0f}s)")
logger.info(
"Task %s: reward=%.1f, turns=%d, finished=%s",
task_name, reward, result.turns_used, result.finished_naturally,
)
out = {
"passed": passed,
"reward": reward,
"task_name": task_name,
"category": category,
"turns_used": result.turns_used,
"finished_naturally": result.finished_naturally,
"messages": result.messages,
}
self._save_result(out)
return out
except Exception as e:
elapsed = time.time() - task_start
logger.error("Task %s: rollout failed: %s", task_name, e, exc_info=True)
tqdm.write(f" [ERROR] {task_name}: {e} ({elapsed:.0f}s)")
out = {
"passed": False, "reward": 0.0,
"task_name": task_name, "category": category,
"error": str(e),
}
self._save_result(out)
return out
finally:
# --- Cleanup: clear overrides, sandbox, and temp files ---
clear_task_env_overrides(task_id)
try:
cleanup_vm(task_id)
except Exception as e:
logger.debug("VM cleanup for %s: %s", task_id[:8], e)
if task_dir and task_dir.exists():
shutil.rmtree(task_dir, ignore_errors=True)
def _run_tests(
self, item: Dict[str, Any], ctx: ToolContext, task_name: str
) -> float:
"""
Upload and execute the test suite in the agent's sandbox, then
download the verifier output locally to read the reward.
Follows Harbor's verification pattern:
1. Upload tests/ directory into the sandbox
2. Execute test.sh inside the sandbox
3. Download /logs/verifier/ directory to a local temp dir
4. Read reward.txt locally with native Python I/O
Downloading locally avoids issues with the file_read tool on
the Modal VM and matches how Harbor handles verification.
TB2 test scripts (test.sh) typically:
1. Install pytest via uv/pip
2. Run pytest against the test files in /tests/
3. Write results to /logs/verifier/reward.txt
Args:
item: The TB2 task dict (contains tests_tar, test_sh)
ctx: ToolContext scoped to this task's sandbox
task_name: For logging
Returns:
1.0 if tests pass, 0.0 otherwise. If reward.txt holds another numeric
value, that float is returned as-is.
"""
tests_tar = item.get("tests_tar", "")
test_sh = item.get("test_sh", "")
if not test_sh:
logger.warning("Task %s: no test_sh content, reward=0", task_name)
return 0.0
# Create required directories in the sandbox
ctx.terminal("mkdir -p /tests /logs/verifier")
# Upload test files into the sandbox (binary-safe via base64)
if tests_tar:
tests_temp = Path(tempfile.mkdtemp(prefix=f"tb2-tests-{task_name}-"))
try:
_extract_base64_tar(tests_tar, tests_temp)
ctx.upload_dir(str(tests_temp), "/tests")
except Exception as e:
logger.warning("Task %s: failed to upload test files: %s", task_name, e)
finally:
shutil.rmtree(tests_temp, ignore_errors=True)
# Write the test runner script (test.sh)
ctx.write_file("/tests/test.sh", test_sh)
ctx.terminal("chmod +x /tests/test.sh")
# Execute the test suite
logger.info(
"Task %s: running test suite (timeout=%ds)",
task_name, self.config.test_timeout,
)
test_result = ctx.terminal(
"bash /tests/test.sh",
timeout=self.config.test_timeout,
)
exit_code = test_result.get("exit_code", -1)
output = test_result.get("output", "")
# Download the verifier output directory locally, then read reward.txt
# with native Python I/O. This avoids issues with file_read on the
# Modal VM and matches Harbor's verification pattern.
reward = 0.0
local_verifier_dir = Path(tempfile.mkdtemp(prefix=f"tb2-verifier-{task_name}-"))
try:
ctx.download_dir("/logs/verifier", str(local_verifier_dir))
reward_file = local_verifier_dir / "reward.txt"
if reward_file.exists() and reward_file.stat().st_size > 0:
content = reward_file.read_text().strip()
if content == "1":
reward = 1.0
elif content == "0":
reward = 0.0
else:
# Unexpected content -- try parsing as float
try:
reward = float(content)
except (ValueError, TypeError):
logger.warning(
"Task %s: reward.txt content unexpected (%r), "
"falling back to exit_code=%d",
task_name, content, exit_code,
)
reward = 1.0 if exit_code == 0 else 0.0
else:
# reward.txt not written -- fall back to exit code
logger.warning(
"Task %s: reward.txt not found after download, "
"falling back to exit_code=%d",
task_name, exit_code,
)
reward = 1.0 if exit_code == 0 else 0.0
except Exception as e:
logger.warning(
"Task %s: failed to download verifier dir: %s, "
"falling back to exit_code=%d",
task_name, e, exit_code,
)
reward = 1.0 if exit_code == 0 else 0.0
finally:
shutil.rmtree(local_verifier_dir, ignore_errors=True)
# Log test output for debugging failures
if reward == 0.0:
output_preview = output[-500:] if output else "(no output)"
logger.info(
"Task %s: FAIL (exit_code=%d)\n%s",
task_name, exit_code, output_preview,
)
return reward
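# The reward decision in _run_tests can be read as a small pure function.
# The sketch below restates that logic for clarity; the function name
# reward_from_verifier is hypothetical and not part of this diff, and
# content=None stands in for a missing or empty reward.txt.

```python
from typing import Optional


def reward_from_verifier(content: Optional[str], exit_code: int) -> float:
    """Map reward.txt content (None if missing) and an exit code to a reward."""
    # Fallback used whenever reward.txt cannot be trusted.
    fallback = 1.0 if exit_code == 0 else 0.0
    if content is None:
        return fallback
    text = content.strip()
    if not text:
        return fallback
    try:
        # "1" and "0" parse to 1.0 and 0.0; other numeric text is kept as-is.
        return float(text)
    except ValueError:
        return fallback
```

# Keeping the decision separate from the sandbox I/O would make the
# fallback rules easy to unit test without a running sandbox.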
# =========================================================================
# Evaluate -- main entry point for the eval subcommand
# =========================================================================
async def _eval_with_timeout(self, item: Dict[str, Any]) -> Dict:
"""
Wrap rollout_and_score_eval with a per-task wall-clock timeout.
If the task exceeds task_timeout seconds, it's automatically scored
as FAIL. This prevents any single task from hanging indefinitely.
"""
task_name = item.get("task_name", "unknown")
category = item.get("category", "unknown")
try:
return await asyncio.wait_for(
self.rollout_and_score_eval(item),
timeout=self.config.task_timeout,
)
except asyncio.TimeoutError:
from tqdm import tqdm
elapsed = self.config.task_timeout
tqdm.write(f" [TIMEOUT] {task_name} (exceeded {elapsed}s wall-clock limit)")
logger.error("Task %s: wall-clock timeout after %ds", task_name, elapsed)
out = {
"passed": False, "reward": 0.0,
"task_name": task_name, "category": category,
"error": f"timeout ({elapsed}s)",
}
self._save_result(out)
return out
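# A minimal, standalone sketch of the wall-clock timeout pattern used by
# _eval_with_timeout. The names eval_with_timeout_sketch, _fast and _slow
# are hypothetical stand-ins; only asyncio.wait_for and the shape of the
# failure dict are taken from the code above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def eval_with_timeout_sketch(
    run_task: Callable[[], Awaitable[Dict[str, Any]]],
    task_name: str,
    timeout_s: float,
) -> Dict[str, Any]:
    """Run one task, scoring it as a failure if it exceeds timeout_s."""
    try:
        return await asyncio.wait_for(run_task(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising.
        return {
            "passed": False,
            "reward": 0.0,
            "task_name": task_name,
            "error": f"timeout ({timeout_s}s)",
        }


async def _fast() -> Dict[str, Any]:
    return {"passed": True, "reward": 1.0, "task_name": "fast"}


async def _slow() -> Dict[str, Any]:
    await asyncio.sleep(1.0)
    return {"passed": True, "reward": 1.0, "task_name": "slow"}


fast_result = asyncio.run(eval_with_timeout_sketch(_fast, "fast", 0.5))
slow_result = asyncio.run(eval_with_timeout_sketch(_slow, "slow", 0.05))
```

# Note that cancellation only reaches awaitable code. Work already handed
# to a thread pool keeps running, which is why evaluate() cleans up
# sandboxes explicitly at the end.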
async def evaluate(self, *args, **kwargs) -> None:
"""
Run Terminal-Bench 2.0 evaluation over all tasks.
This is the main entry point when invoked via:
python environments/terminalbench2_env.py evaluate
Runs all tasks concurrently through _eval_with_timeout() and consumes
results with asyncio.as_completed() so the progress bar can show live
accuracy (a fan-out pattern similar to GPQA and other Atropos eval
envs). Each task is wrapped with a wall-clock timeout so hung tasks
auto-fail.
Suppresses noisy Modal/terminal output (HERMES_QUIET) so the tqdm
bar stays visible.
"""
start_time = time.time()
# Route all logging through tqdm.write() so the progress bar stays
# pinned at the bottom while log lines scroll above it.
from tqdm import tqdm
class _TqdmHandler(logging.Handler):
def emit(self, record):
try:
tqdm.write(self.format(record))
except Exception:
self.handleError(record)
handler = _TqdmHandler()
handler.setFormatter(logging.Formatter(
"%(asctime)s [%(name)s] %(levelname)s: %(message)s",
datefmt="%H:%M:%S",
))
root = logging.getLogger()
root.handlers = [handler] # Replace any existing handlers
root.setLevel(logging.INFO)
# Silence noisy third-party loggers that flood the output
logging.getLogger("httpx").setLevel(logging.WARNING) # Every HTTP request
logging.getLogger("openai").setLevel(logging.WARNING) # OpenAI client retries
logging.getLogger("rex-deploy").setLevel(logging.WARNING) # Swerex deployment
logging.getLogger("rex_image_builder").setLevel(logging.WARNING) # Image builds
print(f"\n{'='*60}")
print("Starting Terminal-Bench 2.0 Evaluation")
print(f"{'='*60}")
print(f" Dataset: {self.config.dataset_name}")
print(f" Total tasks: {len(self.all_eval_items)}")
print(f" Max agent turns: {self.config.max_agent_turns}")
print(f" Task timeout: {self.config.task_timeout}s")
print(f" Terminal backend: {self.config.terminal_backend}")
print(f" Tool thread pool: {self.config.tool_pool_size}")
print(f" Terminal timeout: {self.config.terminal_timeout}s/cmd")
print(f" Terminal lifetime: {self.config.terminal_lifetime}s (auto: task_timeout + 120)")
print(f"{'='*60}\n")
# Fire all tasks with wall-clock timeout, track live accuracy on the bar
total_tasks = len(self.all_eval_items)
eval_tasks = [
asyncio.create_task(self._eval_with_timeout(item))
for item in self.all_eval_items
]
results = []
passed_count = 0
pbar = tqdm(total=total_tasks, desc="Evaluating TB2", dynamic_ncols=True)
try:
for coro in asyncio.as_completed(eval_tasks):
result = await coro
results.append(result)
if result and result.get("passed"):
passed_count += 1
done = len(results)
pct = (passed_count / done * 100) if done else 0
pbar.set_postfix_str(f"pass={passed_count}/{done} ({pct:.1f}%)")
pbar.update(1)
except (KeyboardInterrupt, asyncio.CancelledError):
pbar.close()
print(f"\n\nInterrupted! Cleaning up {len(eval_tasks)} tasks...")
# Cancel all pending tasks
for task in eval_tasks:
task.cancel()
# Let cancellations propagate (finally blocks run cleanup_vm)
await asyncio.gather(*eval_tasks, return_exceptions=True)
# Belt-and-suspenders: clean up any remaining sandboxes
from tools.terminal_tool import cleanup_all_environments
cleanup_all_environments()
print("All sandboxes cleaned up.")
return
finally:
pbar.close()
end_time = time.time()
# Filter out None results (shouldn't happen, but be safe)
valid_results = [r for r in results if r is not None]
if not valid_results:
print("Warning: No valid evaluation results obtained")
return
# ---- Compute metrics ----
total = len(valid_results)
passed = sum(1 for r in valid_results if r.get("passed"))
overall_pass_rate = passed / total if total > 0 else 0.0
# Per-category breakdown
cat_results: Dict[str, List[Dict]] = defaultdict(list)
for r in valid_results:
cat_results[r.get("category", "unknown")].append(r)
# Build metrics dict
eval_metrics = {
"eval/pass_rate": overall_pass_rate,
"eval/total_tasks": total,
"eval/passed_tasks": passed,
"eval/evaluation_time_seconds": end_time - start_time,
}
# Per-category metrics
for category, cat_items in sorted(cat_results.items()):
cat_passed = sum(1 for r in cat_items if r.get("passed"))
cat_total = len(cat_items)
cat_pass_rate = cat_passed / cat_total if cat_total > 0 else 0.0
cat_key = category.replace(" ", "_").replace("-", "_").lower()
eval_metrics[f"eval/pass_rate_{cat_key}"] = cat_pass_rate
# Store metrics for wandb_log
self.eval_metrics = [(k, v) for k, v in eval_metrics.items()]
# ---- Print summary ----
print(f"\n{'='*60}")
print("Terminal-Bench 2.0 Evaluation Results")
print(f"{'='*60}")
print(f"Overall Pass Rate: {overall_pass_rate:.4f} ({passed}/{total})")
print(f"Evaluation Time: {end_time - start_time:.1f} seconds")
print("\nCategory Breakdown:")
for category, cat_items in sorted(cat_results.items()):
cat_passed = sum(1 for r in cat_items if r.get("passed"))
cat_total = len(cat_items)
cat_rate = cat_passed / cat_total if cat_total > 0 else 0.0
print(f" {category}: {cat_rate:.1%} ({cat_passed}/{cat_total})")
# Print individual task results
print("\nTask Results:")
for r in sorted(valid_results, key=lambda x: x.get("task_name", "")):
status = "PASS" if r.get("passed") else "FAIL"
turns = r.get("turns_used", "?")
error = r.get("error", "")
extra = f" (error: {error})" if error else ""
print(f" [{status}] {r['task_name']} (turns={turns}){extra}")
print(f"{'='*60}\n")
# Build sample records for evaluate_log (includes full conversations)
samples = [
{
"task_name": r.get("task_name"),
"category": r.get("category"),
"passed": r.get("passed"),
"reward": r.get("reward"),
"turns_used": r.get("turns_used"),
"error": r.get("error"),
"messages": r.get("messages"),
}
for r in valid_results
]
# Log evaluation results
try:
await self.evaluate_log(
metrics=eval_metrics,
samples=samples,
start_time=start_time,
end_time=end_time,
generation_parameters={
"temperature": self.config.agent_temperature,
"max_tokens": self.config.max_token_length,
"max_agent_turns": self.config.max_agent_turns,
"terminal_backend": self.config.terminal_backend,
},
)
except Exception as e:
print(f"Error logging evaluation results: {e}")
# Close streaming file
if hasattr(self, "_streaming_file") and not self._streaming_file.closed:
self._streaming_file.close()
print(f" Live results saved to: {self._streaming_path}")
# Kill all remaining sandboxes. Timed-out tasks leave orphaned thread
# pool workers still executing commands -- cleanup_all stops them.
from tools.terminal_tool import cleanup_all_environments
print("\nCleaning up all sandboxes...")
cleanup_all_environments()
# Shut down the tool thread pool and cancel any queued work, so leftover
# jobs from timed-out tasks stop retrying against dead sandboxes and
# spamming the console with TimeoutError warnings. cancel_futures only
# drops pending futures; workers that are already running finish on their own.
from environments.agent_loop import _tool_executor
_tool_executor.shutdown(wait=False, cancel_futures=True)
print("Done.")
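# The pass-rate bookkeeping in evaluate() can be sketched as one helper.
# pass_rate_metrics is a hypothetical name; the key normalization (spaces
# and hyphens to underscores, lowercased) follows the code above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def pass_rate_metrics(results: List[Dict[str, Any]]) -> Dict[str, float]:
    """Compute overall and per-category pass rates from result dicts."""
    by_category: Dict[str, List[bool]] = defaultdict(list)
    for r in results:
        by_category[r.get("category", "unknown")].append(bool(r.get("passed")))

    total = len(results)
    passed = sum(1 for r in results if r.get("passed"))
    metrics = {"eval/pass_rate": passed / total if total else 0.0}

    for category, flags in sorted(by_category.items()):
        # Normalize the category name into a metric-friendly key.
        key = category.replace(" ", "_").replace("-", "_").lower()
        metrics[f"eval/pass_rate_{key}"] = sum(flags) / len(flags)
    return metrics


demo_metrics = pass_rate_metrics([
    {"category": "File Ops", "passed": True},
    {"category": "File Ops", "passed": False},
    {"category": "net-tools", "passed": True},
])
```

# Two categories that differ only in case, spaces, or hyphens would share
# one key under this normalization, and the later one would overwrite.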
# =========================================================================
# Wandb logging
# =========================================================================
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
"""Log TB2-specific metrics to wandb."""
if wandb_metrics is None:
wandb_metrics = {}
# Add stored eval metrics
for metric_name, metric_value in self.eval_metrics:
wandb_metrics[metric_name] = metric_value
self.eval_metrics = []
await super().wandb_log(wandb_metrics)
if __name__ == "__main__":
TerminalBench2EvalEnv.cli()
@@ -0,0 +1,5 @@
"""Endless Terminals Environment - Terminal task training from HuggingFace dataset."""
from .endless_terminals_env import EndlessTerminalsEnv, EndlessTerminalsEnvConfig
__all__ = ["EndlessTerminalsEnv", "EndlessTerminalsEnvConfig"]
@@ -0,0 +1,69 @@
# Endless Terminals Environment -- Default Configuration
#
# Trains agents on terminal tasks from the Endless Terminals HuggingFace dataset.
# Uses hermes-agent backends (modal/docker/local) with per-task Docker images.
# Tests run in the same sandbox the agent used (no separate containers needed).
#
# Dataset: https://huggingface.co/datasets/obiwan96/endless-terminals-train
#
# Prerequisites:
# 1. Download dataset: huggingface-cli download obiwan96/endless-terminals-train \
# --repo-type dataset --local-dir ~/endless-terminals-data \
# --local-dir-use-symlinks False
# 2. Set TASKS_BASE_DIR environment variable or configure tasks_base_dir below
# 3. For modal backend: Configure Modal CLI (modal token set)
# 4. For docker backend: Install Docker
#
# Usage:
# python environments/endless_terminals/endless_terminals_env.py process \
# --config environments/endless_terminals/default.yaml
env:
# Toolsets
enabled_toolsets: ["terminal", "file"]
# Agent configuration
max_agent_turns: 32
max_token_length: 4096
agent_temperature: 1.0
# Terminal backend
terminal_backend: "local" # Change to "modal" or "docker" for sandboxed isolation
# Dataset settings
use_dataset: true
dataset_name: "obiwan96/endless-terminals-train"
dataset_split: "train"
dataset_cache_dir: "~/.cache/huggingface/datasets"
tasks_base_dir: "" # Set to directory containing task_* folders (e.g., ~/endless-terminals-data)
# Test execution
test_timeout_s: 60
# Training configuration
group_size: 8
total_steps: 10000
steps_per_eval: 500
num_eval_tasks: 10
eval_split_ratio: 0.1
# Logging
use_wandb: true
wandb_name: "endless-terminals"
# System prompt
system_prompt: >
You are a skilled Linux system administrator and programmer.
You have access to a terminal and file tools to complete system administration
and programming tasks. Use the tools effectively to solve the given task,
and verify your solution works correctly before finishing.
openai:
base_url: "https://openrouter.ai/api/v1"
model_name: "anthropic/claude-sonnet-4.5"
server_type: "openai"
api_key: "" # Loaded from OPENROUTER_API_KEY env var
health_check: false
timeout: 30 # 30 second timeout per request
max_retries: 2 # Only retry twice
@@ -0,0 +1,921 @@
"""
Endless Terminals Environment for Hermes-Agent + Atropos RL.
Loads pre-generated terminal tasks from HuggingFace dataset and scores
agent performance using test execution in the agent's sandbox.
Uses hermes-agent backends (modal, docker, local) with per-task Docker images
extracted from container.def files. Tests run in the same sandbox the agent
used, following the Terminal Bench 2 pattern.
Dataset: https://huggingface.co/datasets/obiwan96/endless-terminals-train
Run:
python environments/endless_terminals/endless_terminals_env.py process \
--config environments/endless_terminals/default.yaml
"""
import asyncio
import logging
import os
import random
import re
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from pydantic import Field
# Ensure hermes-agent root is on path
_repo_root = Path(__file__).resolve().parent.parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
from atroposlib.envs.base import ScoredDataGroup, ScoredDataItem
from atroposlib.type_definitions import Item
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
from environments.agent_loop import AgentResult
from environments.tool_context import ToolContext
from tools.terminal_tool import (
register_task_env_overrides,
clear_task_env_overrides,
cleanup_vm,
)
logger = logging.getLogger(__name__)
# Add endless-terminals to path for imports
ENDLESS_TERMINALS_PATH = os.getenv(
"ENDLESS_TERMINALS_PATH",
str(Path.home() / "Desktop" / "Projects" / "endless-terminals")
)
sys.path.insert(0, ENDLESS_TERMINALS_PATH)
class EndlessTerminalsEnvConfig(HermesAgentEnvConfig):
"""Configuration for Endless Terminals environment."""
# Dataset settings
use_dataset: bool = Field(
default=True,
description="Load tasks from HuggingFace dataset (recommended). If False, procedural generation is selected, which is not implemented yet."
)
dataset_name: str = Field(
default="obiwan96/endless-terminals-train",
description="HuggingFace dataset name"
)
dataset_split: str = Field(
default="train",
description="Dataset split to use"
)
dataset_cache_dir: str = Field(
default="~/.cache/huggingface/datasets",
description="HuggingFace datasets cache directory"
)
tasks_base_dir: str = Field(
default="",
description="Base directory containing task_* folders. If empty, uses paths from dataset."
)
# Test execution
test_timeout_s: int = Field(default=60, description="Test execution timeout (seconds)")
# Docker image fallback
default_docker_image: str = Field(
default="ubuntu:22.04",
description="Default Docker image if container.def parsing fails"
)
# Agent defaults
max_agent_turns: int = Field(default=32, description="Max turns for agent (increased for long traces)")
# Evaluation settings
num_eval_tasks: int = Field(
default=10,
description="Number of tasks to run during periodic evaluation"
)
eval_split_ratio: float = Field(
default=0.1,
description="Fraction of dataset to hold out for evaluation (0.0-1.0)"
)
class EndlessTerminalsEnv(HermesAgentBaseEnv):
"""
Endless Terminals environment using pre-generated HuggingFace dataset.
Loads terminal tasks from dataset, runs agent with terminal tools,
and scores by executing tests in the agent's sandbox using ToolContext.
"""
name = "endless_terminals_env"
env_config_cls = EndlessTerminalsEnvConfig
@classmethod
def config_init(cls) -> Tuple[EndlessTerminalsEnvConfig, List["APIServerConfig"]]:
"""
Default configuration for Endless Terminals environment.
Used when no config file is provided. With --config, the YAML is loaded
through a different path, so this method may not be called.
"""
from atroposlib.envs.server_handling.server_manager import APIServerConfig
env_config = EndlessTerminalsEnvConfig(
enabled_toolsets=["terminal", "file"],
max_agent_turns=32,
terminal_backend="local",
use_dataset=True,
tasks_base_dir="",
group_size=1,
total_steps=1,
use_wandb=False,
)
server_configs = [
APIServerConfig(
base_url="https://openrouter.ai/api/v1",
model_name="anthropic/claude-sonnet-4.5",
server_type="openai",
api_key=os.getenv("OPENROUTER_API_KEY", ""),
health_check=False,
)
]
return env_config, server_configs
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._dataset = None
self._train_dataset = None
self._eval_dataset = None
self._dataset_indices = []
self._current_index = 0
# Metrics tracking for wandb - single buffer with dicts
self._metrics_buffer = []
# Debug: check server config
if hasattr(self, 'server') and hasattr(self.server, 'servers'):
for i, srv in enumerate(self.server.servers):
logger.debug(f"Server {i}: model_name={getattr(srv.config, 'model_name', 'NONE')}")
async def setup(self):
"""Load dataset from HuggingFace or local directory."""
if not self.config.use_dataset:
logger.info("Using procedural task generation (not implemented yet)")
return
# If tasks_base_dir is set, load from local directory instead of HuggingFace
if self.config.tasks_base_dir:
tasks_base = Path(os.path.expanduser(self.config.tasks_base_dir))
# Resolve to absolute path if relative
if not tasks_base.is_absolute():
tasks_base = Path.cwd() / tasks_base
tasks_base = tasks_base.resolve()
if not tasks_base.exists():
raise RuntimeError(f"tasks_base_dir not found: {tasks_base}")
logger.info(f"Loading tasks from local directory: {tasks_base}")
# Find all task_* directories
task_dirs = sorted(tasks_base.glob("task_*"))
logger.info(f"Found {len(task_dirs)} task directories")
if not task_dirs:
# Debug: show what's actually in the directory
all_items = list(tasks_base.iterdir())
logger.warning(f"Directory contains {len(all_items)} items:")
for item in all_items[:10]:
logger.warning(f" - {item.name} ({'dir' if item.is_dir() else 'file'})")
raise RuntimeError(f"No task_* directories found in {tasks_base}")
# Create fake dataset items (just the directory paths)
self._dataset = [
{
"description": f"Task from {task_dir.name}",
"extra_info": {"task_dir": str(task_dir)},
}
for task_dir in task_dirs
]
logger.info(f"Loaded {len(self._dataset)} tasks from local directory")
self._split_dataset()
return
# Otherwise, load from HuggingFace
logger.info(f"Loading dataset from HuggingFace: {self.config.dataset_name}")
try:
from datasets import load_dataset
self._dataset = await asyncio.get_running_loop().run_in_executor(
None,
lambda: load_dataset(
self.config.dataset_name,
split=self.config.dataset_split,
cache_dir=os.path.expanduser(self.config.dataset_cache_dir)
)
)
logger.info(f"Loaded {len(self._dataset)} tasks from HuggingFace")
self._split_dataset()
except Exception as e:
logger.error(f"Failed to load dataset {self.config.dataset_name}: {e}")
raise
def _split_dataset(self):
"""Split dataset into train and eval sets based on eval_split_ratio."""
if self._dataset is None or len(self._dataset) == 0:
raise RuntimeError("Cannot split empty dataset")
total_size = len(self._dataset)
eval_size = int(total_size * self.config.eval_split_ratio)
train_size = total_size - eval_size
all_indices = list(range(total_size))
random.shuffle(all_indices)
train_indices = all_indices[:train_size]
eval_indices = all_indices[train_size:]
if isinstance(self._dataset, list):
self._train_dataset = [self._dataset[i] for i in train_indices]
self._eval_dataset = [self._dataset[i] for i in eval_indices]
else:
self._train_dataset = self._dataset.select(train_indices)
self._eval_dataset = self._dataset.select(eval_indices)
self._dataset_indices = list(range(len(self._train_dataset)))
random.shuffle(self._dataset_indices)
self._current_index = 0
logger.info(
f"Split dataset: {len(self._train_dataset)} train, "
f"{len(self._eval_dataset)} eval "
f"(ratio={self.config.eval_split_ratio:.1%})"
)
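# As far as this hunk shows, _split_dataset shuffles with the module-level
# random state, so the holdout set can change between runs. A seeded
# variant is sketched below as one possible way to keep it stable. The
# seed parameter and the name split_indices are hypothetical and do not
# exist in the config above.

```python
import random
from typing import List, Tuple


def split_indices(
    total: int, eval_ratio: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, eval_indices) using a private, seeded RNG."""
    eval_size = int(total * eval_ratio)
    train_size = total - eval_size
    indices = list(range(total))
    # A dedicated Random instance leaves the global random state untouched.
    random.Random(seed).shuffle(indices)
    return indices[:train_size], indices[train_size:]


train_idx, eval_idx = split_indices(100, 0.1, seed=42)
```

# Calling split_indices again with the same arguments yields the same
# partition, so held-out tasks stay out of training across restarts.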
async def get_next_item(self) -> Item:
"""Sample next task from training dataset."""
if self._train_dataset is None:
raise RuntimeError("Dataset not loaded. Call setup() first.")
# Get next task (with wraparound)
idx = self._dataset_indices[self._current_index]
task = self._train_dataset[idx]
# Advance to next task
self._current_index += 1
if self._current_index >= len(self._dataset_indices):
# Reshuffle for next epoch
random.shuffle(self._dataset_indices)
self._current_index = 0
logger.info("Reshuffled dataset (completed one epoch)")
# Extract task directory path
task_dir = task.get("extra_info", {}).get("task_dir")
if not task_dir:
task_dir = task.get("reward_spec", {}).get("ground_truth")
# Resolve task directory path
if task_dir:
task_dir_path = Path(task_dir)
# If tasks_base_dir is configured and path doesn't exist, reconstruct it
if self.config.tasks_base_dir and not task_dir_path.exists():
# Keep only the task folder name and re-root it under tasks_base_dir
task_dir_path = Path(os.path.expanduser(self.config.tasks_base_dir)) / task_dir_path.name
else:
logger.error("No task directory path found in dataset item")
return await self.get_next_item()
# Verify directory exists
if not task_dir_path.exists():
logger.warning(f"Task dir not found: {task_dir_path}")
logger.warning("Hint: Set tasks_base_dir to directory containing task_* folders")
return await self.get_next_item() # Try next task
# Look for test file in tests/ subdirectory first, then at root
final_test = task_dir_path / "tests" / "test_final_state.py"
if not final_test.exists():
final_test = task_dir_path / "test_final_state.py"
# Verify test file exists
if not final_test.exists():
logger.warning(f"Missing test file in {task_dir_path} (checked tests/ and root)")
return await self.get_next_item()
# Parse container.def to extract Docker image
# Check environment/ subdirectory first, then root
container_def = task_dir_path / "environment" / "container.def"
if not container_def.exists():
container_def = task_dir_path / "container.def"
docker_image = self._parse_docker_image_from_def(container_def)
# Try to load description from instruction.md or task.json
description = task.get("description", "")
# First try instruction.md
instruction_md = task_dir_path / "instruction.md"
if not description and instruction_md.exists():
try:
description = instruction_md.read_text().strip()
except Exception as e:
logger.warning(f"Failed to load instruction.md for {task_dir_path.name}: {e}")
# Fallback to task.json in environment/
if not description:
task_json = task_dir_path / "environment" / "task.json"
if task_json.exists():
try:
import json
task_data = json.loads(task_json.read_text())
description = task_data.get("description", "") or task_data.get("instruction", "")
except Exception as e:
logger.warning(f"Failed to load task.json for {task_dir_path.name}: {e}")
if not description:
description = f"Complete the task in {task_dir_path.name}"
return {
"task_id": task_dir_path.name,
"task_name": task_dir_path.name,
"description": description,
"task_dir": str(task_dir_path),
"final_test": str(final_test),
"docker_image": docker_image,
"dataset_index": idx,
}
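# The cursor-and-reshuffle sampling at the top of get_next_item can be
# isolated as a tiny class. This is a sketch under stated assumptions:
# EpochSampler and its seed argument are hypothetical and not part of the
# environment.

```python
import random
from typing import List


class EpochSampler:
    """Yield every index once per epoch, reshuffling between epochs."""

    def __init__(self, size: int, seed: int = 0) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self._rng = random.Random(seed)
        self._order: List[int] = list(range(size))
        self._rng.shuffle(self._order)
        self._cursor = 0

    def next_index(self) -> int:
        idx = self._order[self._cursor]
        self._cursor += 1
        if self._cursor >= len(self._order):
            # Epoch finished: reshuffle and start over.
            self._rng.shuffle(self._order)
            self._cursor = 0
        return idx


sampler = EpochSampler(5, seed=1)
first_epoch = [sampler.next_index() for _ in range(5)]
second_epoch = [sampler.next_index() for _ in range(5)]
```

# Rejecting an empty index list up front avoids the IndexError that the
# cursor lookup would otherwise raise when the training split is empty.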
def format_prompt(self, item: Item) -> str:
"""Return the task description for the agent."""
return str(item.get("description", ""))
def _parse_docker_image_from_def(self, container_def_path: Path) -> str:
"""
Parse container.def file to extract the Docker base image.
Apptainer definition files typically look like:
Bootstrap: docker
From: ubuntu:22.04
Returns the image from the "From:" line, or falls back to default.
"""
if not container_def_path.exists():
logger.warning(f"container.def not found at {container_def_path}, using default image")
return self.config.default_docker_image
try:
content = container_def_path.read_text()
# Look for "From: <image>" line (case-insensitive)
match = re.search(r'^From:\s*(.+)$', content, re.MULTILINE | re.IGNORECASE)
if match:
image = match.group(1).strip()
logger.info(f"Extracted Docker image from container.def: {image}")
return image
except Exception as e:
logger.warning(f"Failed to parse {container_def_path}: {e}")
logger.warning(f"Could not extract image from {container_def_path}, using default")
return self.config.default_docker_image
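# A standalone sketch of the "From:" extraction above, using the same
# regular expression on in-memory text. image_from_def and the sample
# definition are hypothetical; the default mirrors default_docker_image.

```python
import re

# Same pattern as _parse_docker_image_from_def: first "From:" line wins.
_FROM_RE = re.compile(r"^From:\s*(.+)$", re.MULTILINE | re.IGNORECASE)


def image_from_def(content: str, default: str = "ubuntu:22.04") -> str:
    """Return the image named on the From: line, or default if absent."""
    match = _FROM_RE.search(content)
    return match.group(1).strip() if match else default


sample_def = "Bootstrap: docker\nFrom: python:3.11-slim\n\n%post\n    apt-get update\n"
parsed_image = image_from_def(sample_def)
```

# The pattern does not look at the Bootstrap: line, so a definition that
# bootstraps from something other than a Docker registry would still have
# its From: value treated as an image name.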
async def collect_trajectory(
self, item: Item
) -> Tuple[Optional[ScoredDataItem], List[Item]]:
"""
Override to register per-task Docker image before running the agent.
Follows Terminal Bench 2 pattern: register_task_env_overrides() tells
the hermes-agent terminal backend to use a specific Docker image for
this task_id.
This is a copy of HermesAgentBaseEnv.collect_trajectory with Docker
image registration added after task_id generation.
"""
import uuid
from environments.agent_loop import HermesAgentLoop
task_id = str(uuid.uuid4())
task_name = item.get("task_name", "unknown")
docker_image = item.get("docker_image", self.config.default_docker_image)
logger.debug(f"collect_trajectory START for {task_name}")
# Register Docker image override for this task_id
logger.debug(f"Registering Docker image: {docker_image}")
register_task_env_overrides(task_id, {"modal_image": docker_image})
logger.info(
f"Task {task_name}: registered Docker image {docker_image} for task_id {task_id[:8]}"
)
logger.debug("Docker image registered")
try:
# Get group-level tools (resolved once in collect_trajectories)
logger.debug("Resolving tools...")
if self._current_group_tools is None:
tools, valid_names = self._resolve_tools_for_group()
else:
tools, valid_names = self._current_group_tools
logger.debug(f"Tools resolved: {len(tools)} tools")
# Build initial messages
logger.debug("Building initial messages...")
messages: List[Dict[str, Any]] = []
if self.config.system_prompt:
messages.append({"role": "system", "content": self.config.system_prompt})
messages.append({"role": "user", "content": self.format_prompt(item)})
logger.debug("Messages built, starting agent loop...")
# Run the agent loop
result: AgentResult
managed_state: Optional[Dict[str, Any]] = None
if self._use_managed_server():
# Phase 2: ManagedServer with parser
from environments.tool_call_parsers import get_parser
try:
tc_parser = get_parser(self.config.tool_call_parser)
except KeyError:
logger.warning(
"Tool call parser '%s' not found, falling back to 'hermes'",
self.config.tool_call_parser,
)
tc_parser = get_parser("hermes")
try:
async with self.server.managed_server(
tokenizer=self.tokenizer,
tool_call_parser=tc_parser,
) as managed:
agent = HermesAgentLoop(
server=managed,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
# Get state directly from managed server while still in context
managed_state = managed.get_state()
except NotImplementedError:
# DummyManagedServer not allowed
logger.warning("ManagedServer not available. Falling back to direct server mode.")
agent = HermesAgentLoop(
server=self.server,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
else:
# Phase 1: OpenAI server
agent = HermesAgentLoop(
server=self.server,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
# Skip reward computation if agent produced no output
only_system_and_user = all(
msg.get("role") in ("system", "user") for msg in result.messages
)
if result.turns_used == 0 or only_system_and_user:
logger.warning(
"Agent loop produced no output (turns=%d). Skipping trajectory.",
result.turns_used,
)
# Return None to skip this trajectory (likely an API failure)
return None, []
else:
# Compute reward using ToolContext
ctx = ToolContext(task_id)
try:
reward = await self.compute_reward(item, result, ctx)
except Exception as e:
logger.error("compute_reward failed: %s", e)
reward = 0.0
finally:
ctx.cleanup()
# Track metrics for wandb logging
task_metrics = {
"test_passed": 1.0 if reward > 0.5 else 0.0,
"reward": reward,
"turns_used": result.turns_used,
"finished_naturally": result.finished_naturally,
"docker_image": docker_image,
"num_tool_errors": len(result.tool_errors),
}
# Include detailed tool errors if any occurred
if result.tool_errors:
task_metrics["tool_errors"] = [
{
"turn": err.turn,
"tool": err.tool_name,
"error": err.error[:200],
}
for err in result.tool_errors
]
self._metrics_buffer.append(task_metrics)
# ============================================================================
# Build ScoredDataGroup from ManagedServer state
# ============================================================================
# Phase 2: Extract pre-computed data from SequenceNodes
# A single agent run can leave more than one SequenceNode in the managed
# state, so iterate through all nodes and emit one sequence per node.
#
# Each SequenceNode contains:
# - tokens: Full unmasked token sequence [1, 2, 3, ..., N]
# - masked_tokens: Training format [-100, -100, ..., -100, actual, actual, ...]
# - logprobs: Training format [1.0, 1.0, ..., 1.0, -0.5, -0.3, ...]
# - full_text: Complete text (prompt + all completions)
#
# Phase 1: Create placeholder tokens for OpenAI-style servers
# ============================================================================
nodes = managed_state.get("nodes", []) if managed_state else []
# Create ScoredDataGroup with lists for multiple trajectories
scored_group = ScoredDataGroup()
scored_group["tokens"] = []
scored_group["masks"] = []
scored_group["scores"] = []
scored_group["messages"] = []
scored_group["inference_logprobs"] = []
if nodes:
# Phase 2: iterate through all nodes (may have multiple trajectories)
for i, node in enumerate(nodes):
scored_group["tokens"].append(node.tokens)
scored_group["masks"].append(node.masked_tokens)
scored_group["scores"].append(reward)
scored_group["messages"].append(result.messages)
if hasattr(node, "logprobs") and node.logprobs:
scored_group["inference_logprobs"].append(node.logprobs)
else:
# Placeholder logprobs if not available
scored_group["inference_logprobs"].append([1.0] * len(node.tokens))
logger.debug(f"Added trajectory {i+1}/{len(nodes)} with {len(node.tokens)} tokens")
else:
# Phase 1: create placeholder tokens for OpenAI-style servers
full_text = "\n".join(
msg.get("content", "") for msg in result.messages if msg.get("content")
)
if self.tokenizer:
tokens = self.tokenizer.encode(full_text, add_special_tokens=True)
else:
tokens = list(range(min(len(full_text) // 4, 128)))
scored_group["tokens"].append(tokens)
scored_group["masks"].append([-100] + tokens[1:])
scored_group["scores"].append(reward)
scored_group["messages"].append(result.messages)
scored_group["inference_logprobs"].append([1.0] * len(tokens))
# Return None if no trajectories collected
if len(scored_group["tokens"]) == 0:
return None, []
logger.debug(f"Returning ScoredDataGroup with {len(scored_group['tokens'])} trajectories")
return scored_group, []
finally:
# Clean up task overrides and sandbox
clear_task_env_overrides(task_id)
try:
cleanup_vm(task_id)
except Exception as e:
logger.debug(f"VM cleanup for {task_id[:8]}: {e}")
async def compute_reward(
self,
item: Item,
result: AgentResult,
ctx: ToolContext
) -> float:
"""
Run final tests in the agent's sandbox and return binary reward.
Uses ToolContext to execute pytest in the SAME sandbox the agent used,
following the Terminal Bench 2 verification pattern. No separate
Apptainer execution needed.
Returns 1.0 if tests pass, 0.0 otherwise.
"""
task_name = item.get("task_name", "unknown")
final_test_path = Path(item.get("final_test", ""))
if not final_test_path.exists():
logger.error(f"Task {task_name}: test file not found at {final_test_path}")
return 0.0
logger.info(f"Task {task_name}: running tests in sandbox...")
try:
# Run tests in a thread to avoid blocking the event loop
loop = asyncio.get_running_loop()
reward = await loop.run_in_executor(
None,
self._run_tests_in_sandbox,
final_test_path,
ctx,
task_name,
)
status = "PASS" if reward == 1.0 else "FAIL"
logger.info(f"Task {task_name}: {status} (reward={reward})")
return reward
except Exception as e:
logger.error(f"Task {task_name}: test execution failed: {e}", exc_info=True)
return 0.0
def _run_tests_in_sandbox(
self,
test_file_path: Path,
ctx: ToolContext,
task_name: str,
) -> float:
"""
Upload test file to sandbox and execute pytest.
Runs in thread pool (via run_in_executor) to avoid blocking the event loop
with synchronous ToolContext calls.
Args:
test_file_path: Local path to test_final_state.py
ctx: ToolContext scoped to the agent's sandbox
task_name: For logging
Returns:
1.0 if tests pass, 0.0 otherwise
"""
try:
# Upload test file to sandbox
test_content = test_file_path.read_text()
ctx.write_file("/workspace/test_final_state.py", test_content)
logger.debug(f"Task {task_name}: uploaded test file to /workspace/test_final_state.py")
# Run pytest in the sandbox
result = ctx.terminal(
"cd /workspace && python -m pytest -q test_final_state.py",
timeout=self.config.test_timeout_s,
)
exit_code = result.get("exit_code", -1)
output = result.get("output", "")
if exit_code == 0:
logger.debug(f"Task {task_name}: tests passed")
return 1.0
else:
# Log failure output (last 500 chars for debugging)
output_preview = output[-500:] if output else "(no output)"
logger.info(
f"Task {task_name}: tests failed (exit_code={exit_code})\n{output_preview}"
)
return 0.0
except Exception as e:
logger.error(f"Task {task_name}: error running tests: {e}")
return 0.0
async def evaluate(self):
"""
Periodic evaluation on holdout eval set.
Runs the agent on num_eval_tasks from the held-out eval set
(never seen during training). Returns metrics for wandb logging.
"""
if self._eval_dataset is None:
logger.warning("Cannot evaluate: eval dataset not loaded")
return {}
if len(self._eval_dataset) == 0:
logger.warning("Eval dataset is empty")
return {}
# Use min of num_eval_tasks and actual eval set size
num_tasks = min(self.config.num_eval_tasks, len(self._eval_dataset))
logger.info(f"Starting evaluation on {num_tasks} held-out tasks...")
eval_metrics = {
"rewards": [],
"passes": [],
"turns": [],
"natural_finishes": [],
}
# Sample from eval set (holdout)
import random
eval_indices = random.sample(range(len(self._eval_dataset)), num_tasks)
for idx in eval_indices:
task = self._eval_dataset[idx]
# Build item using same logic as get_next_item
task_dir = task.get("extra_info", {}).get("task_dir")
if not task_dir:
task_dir = task.get("reward_spec", {}).get("ground_truth")
if not task_dir:
continue
task_dir_path = Path(task_dir)
if self.config.tasks_base_dir and not task_dir_path.exists():
original_path = Path(task_dir)
task_name = original_path.name
task_dir_path = Path(os.path.expanduser(self.config.tasks_base_dir)) / task_name
if not task_dir_path.exists():
continue
# Find test file
final_test = task_dir_path / "tests" / "test_final_state.py"
if not final_test.exists():
final_test = task_dir_path / "test_final_state.py"
if not final_test.exists():
continue
# Parse Docker image
container_def = task_dir_path / "environment" / "container.def"
if not container_def.exists():
container_def = task_dir_path / "container.def"
docker_image = self._parse_docker_image_from_def(container_def)
# Load description
description = task.get("description", "")
instruction_md = task_dir_path / "instruction.md"
if not description and instruction_md.exists():
try:
description = instruction_md.read_text().strip()
except Exception:
pass
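item = {
# compute_reward() reads task_name for logging; without it every eval task logs as "unknown"
"task_name": task_dir_path.name,
"description": description,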
"final_test": str(final_test),
"docker_image": docker_image,
}
# Run agent on this task
try:
import uuid
task_id = str(uuid.uuid4())
# Register task environment
from model_tools import register_task_env_overrides
register_task_env_overrides(task_id, {"modal_image": docker_image})
# Build messages
messages = [
{"role": "system", "content": self.config.system_prompt},
{"role": "user", "content": description or "Complete the task."},
]
# Get tools
from model_tools import get_tool_definitions
tools = get_tool_definitions(self.config.enabled_toolsets)
valid_names = {t["function"]["name"] for t in tools}
# Run agent
from environments.agent_loop import HermesAgentLoop
agent = HermesAgentLoop(
server=self.server,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
# Compute reward
from environments.tool_context import ToolContext
ctx = ToolContext(task_id)
try:
reward = await self.compute_reward(item, result, ctx)
except Exception as e:
logger.warning(f"Eval reward computation failed: {e}")
reward = 0.0
finally:
ctx.cleanup()
# Track metrics
eval_metrics["rewards"].append(reward)
eval_metrics["passes"].append(1.0 if reward > 0.5 else 0.0)
eval_metrics["turns"].append(result.turns_used)
eval_metrics["natural_finishes"].append(1.0 if result.finished_naturally else 0.0)
except Exception as e:
logger.error(f"Eval task failed: {e}")
continue
finally:
# Cleanup
from model_tools import clear_task_env_overrides, cleanup_vm
clear_task_env_overrides(task_id)
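# Guard VM cleanup so one failed teardown does not abort the whole eval loop
try:
cleanup_vm(task_id)
except Exception as e:
logger.debug(f"VM cleanup for {task_id[:8]}: {e}")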
# Aggregate metrics
if not eval_metrics["rewards"]:
logger.warning("No eval tasks completed successfully")
return {}
aggregated = {
"eval/pass_rate": sum(eval_metrics["passes"]) / len(eval_metrics["passes"]),
"eval/avg_reward": sum(eval_metrics["rewards"]) / len(eval_metrics["rewards"]),
"eval/avg_turns": sum(eval_metrics["turns"]) / len(eval_metrics["turns"]),
"eval/natural_finish_rate": sum(eval_metrics["natural_finishes"]) / len(eval_metrics["natural_finishes"]),
"eval/num_tasks": len(eval_metrics["rewards"]),
}
logger.info(f"Evaluation complete: pass_rate={aggregated['eval/pass_rate']:.2%}, avg_turns={aggregated['eval/avg_turns']:.1f}")
return aggregated
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
"""Log Endless Terminals-specific metrics to wandb."""
if wandb_metrics is None:
wandb_metrics = {}
# Aggregate metrics from buffer
if self._metrics_buffer:
# Test pass rate
test_passes = [m["test_passed"] for m in self._metrics_buffer]
wandb_metrics["endless_terminals/test_pass_rate"] = sum(test_passes) / len(test_passes)
wandb_metrics["endless_terminals/num_tests_passed"] = sum(test_passes)
wandb_metrics["endless_terminals/num_tests_total"] = len(test_passes)
# Turns used statistics
turns = [m["turns_used"] for m in self._metrics_buffer]
wandb_metrics["endless_terminals/avg_turns_used"] = sum(turns) / len(turns)
wandb_metrics["endless_terminals/max_turns_used"] = max(turns)
wandb_metrics["endless_terminals/min_turns_used"] = min(turns)
# Natural finish rate (did model stop on its own vs hitting max turns)
natural_finishes = [1.0 if m["finished_naturally"] else 0.0 for m in self._metrics_buffer]
wandb_metrics["endless_terminals/natural_finish_rate"] = sum(natural_finishes) / len(natural_finishes)
# Tool error statistics
total_tool_errors = sum(m["num_tool_errors"] for m in self._metrics_buffer)
wandb_metrics["endless_terminals/total_tool_errors"] = total_tool_errors
wandb_metrics["endless_terminals/avg_tool_errors_per_task"] = total_tool_errors / len(self._metrics_buffer)
# Docker image distribution (count unique images used)
docker_images = [m["docker_image"] for m in self._metrics_buffer]
unique_images = set(docker_images)
wandb_metrics["endless_terminals/num_unique_docker_images"] = len(unique_images)
# Log most common errors if any
all_errors = []
for m in self._metrics_buffer:
if "tool_errors" in m:
all_errors.extend(m["tool_errors"])
if all_errors:
# Count error types
error_tools = {}
for err in all_errors:
tool = err["tool"]
error_tools[tool] = error_tools.get(tool, 0) + 1
# Log top 3 error-prone tools
for tool, count in sorted(error_tools.items(), key=lambda x: x[1], reverse=True)[:3]:
wandb_metrics[f"endless_terminals/errors_by_tool/{tool}"] = count
# Clear buffer after logging
self._metrics_buffer = []
await super().wandb_log(wandb_metrics)
if __name__ == "__main__":
EndlessTerminalsEnv.cli()
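The `From:` lookup used by `_parse_docker_image_from_def` above can be tried in isolation. The following is a minimal sketch, not the environment's actual method: `parse_image_from_def`, `DEFAULT_IMAGE`, and the sample definition text are hypothetical names and values chosen for illustration, with `DEFAULT_IMAGE` standing in for `config.default_docker_image`.

```python
import re

# Hypothetical fallback; stands in for self.config.default_docker_image.
DEFAULT_IMAGE = "ubuntu:22.04"


def parse_image_from_def(content: str, default: str = DEFAULT_IMAGE) -> str:
    """Return the image named on the first 'From:' line, or the default."""
    # Same pattern as the diff: anchored per line, case-insensitive.
    match = re.search(r"^From:\s*(.+)$", content, re.MULTILINE | re.IGNORECASE)
    return match.group(1).strip() if match else default


# Made-up container definition text, for illustration only.
sample_def = "Bootstrap: docker\nFrom: python:3.11-slim\n\n%post\n    pip install pytest\n"

print(parse_image_from_def(sample_def))
print(parse_image_from_def("Bootstrap: docker\n"))
```

Returning the default when no `From:` line is present mirrors the fallback path in the method above, which logs a warning and uses the configured default image.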
+59 -2
@@ -117,6 +117,18 @@ class HermesAgentEnvConfig(BaseEnvConfig):
description="Terminal backend: 'local', 'docker', 'modal', 'ssh', 'singularity'. "
"Modal recommended for production RL (cloud isolation per rollout).",
)
terminal_timeout: int = Field(
default=120,
description="Per-command timeout in seconds for terminal tool calls. "
"Commands exceeding this are killed. Increase for tasks with long-running "
"commands (compilation, pip install, etc.).",
)
terminal_lifetime: int = Field(
default=3600,
description="Sandbox inactivity lifetime in seconds. The cleanup thread kills "
"sandboxes that have been idle longer than this. Must be longer than "
"the longest gap between tool calls (e.g., waiting for LLM response).",
)
# --- Dataset ---
dataset_name: Optional[str] = Field(
@@ -132,6 +144,14 @@ class HermesAgentEnvConfig(BaseEnvConfig):
description="Which field in the dataset contains the prompt.",
)
# --- Thread pool ---
tool_pool_size: int = Field(
default=128,
description="Thread pool size for tool execution. Each concurrent task needs a "
"thread for tool calls. Must be large enough for parallel evaluation. "
"Too small = thread pool starvation.",
)
# --- Phase 2: Tool call parsing ---
tool_call_parser: str = Field(
default="hermes",
@@ -140,6 +160,22 @@ class HermesAgentEnvConfig(BaseEnvConfig):
"Options: hermes, mistral, llama3_json, qwen, deepseek_v3, etc.",
)
# --- Provider-specific parameters ---
# Passed as extra_body to the OpenAI client's chat.completions.create() call.
# Useful for OpenRouter provider preferences, transforms, route settings, etc.
# Example YAML:
# extra_body:
# provider:
# ignore: ["DeepInfra", "Fireworks"]
# order: ["Together"]
# transforms: ["middle-out"]
extra_body: Optional[Dict[str, Any]] = Field(
default=None,
description="Extra body parameters passed to the OpenAI client's "
"chat.completions.create(). Used for OpenRouter provider preferences, "
"transforms, and other provider-specific settings.",
)
class HermesAgentBaseEnv(BaseEnv):
"""
@@ -175,10 +211,23 @@ class HermesAgentBaseEnv(BaseEnv):
):
super().__init__(config, server_configs, slurm, testing)
# Set terminal backend environment variable so hermes tools pick it up
# Set terminal environment variables so hermes tools pick them up.
# These can all be overridden per-environment via config fields instead
# of requiring users to set shell env vars.
if config.terminal_backend:
os.environ["TERMINAL_ENV"] = config.terminal_backend
print(f"🖥️ Terminal backend: {config.terminal_backend}")
os.environ["TERMINAL_TIMEOUT"] = str(config.terminal_timeout)
os.environ["TERMINAL_LIFETIME_SECONDS"] = str(config.terminal_lifetime)
print(
f"🖥️ Terminal: backend={config.terminal_backend}, "
f"timeout={config.terminal_timeout}s, lifetime={config.terminal_lifetime}s"
)
# Resize the agent loop's thread pool for tool execution.
# This must be large enough for the number of concurrent tasks
# (e.g., 89 parallel TB2 eval tasks each need a thread for tool calls).
from environments.agent_loop import resize_tool_pool
resize_tool_pool(config.tool_pool_size)
# Current group's resolved tools (set in collect_trajectories)
self._current_group_tools: Optional[Tuple[List[Dict], Set[str]]] = None
@@ -209,6 +258,11 @@ class HermesAgentBaseEnv(BaseEnv):
logger.info("Sampled toolsets from '%s': %s", config.distribution, group_toolsets)
else:
group_toolsets = config.enabled_toolsets # None means "all available"
if group_toolsets is None:
logger.warning(
"enabled_toolsets is None -- loading ALL tools including messaging. "
"Set explicit enabled_toolsets for RL training."
)
tools = get_tool_definitions(
enabled_toolsets=group_toolsets,
@@ -437,6 +491,7 @@ class HermesAgentBaseEnv(BaseEnv):
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
except NotImplementedError:
@@ -453,6 +508,7 @@ class HermesAgentBaseEnv(BaseEnv):
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
else:
@@ -465,6 +521,7 @@ class HermesAgentBaseEnv(BaseEnv):
task_id=task_id,
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
@@ -4,7 +4,8 @@
# Uses terminal + file + web toolsets.
#
# Usage:
# python environments/hermes_swe_env.py serve --config environments/configs/swe_default.yaml
# python environments/hermes_swe_env/hermes_swe_env.py serve \
# --config environments/hermes_swe_env/default.yaml
env:
enabled_toolsets: ["terminal", "file", "web"]
@@ -36,7 +36,7 @@ from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union
# Ensure repo root is on sys.path for imports
_repo_root = Path(__file__).resolve().parent.parent
_repo_root = Path(__file__).resolve().parent.parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
@@ -6,9 +6,8 @@
#
# Usage:
# run-api
# python environments/terminal_test_env.py serve
# # Or with config file:
# python environments/terminal_test_env.py serve --config environments/configs/terminal_test_default.yaml
# python environments/terminal_test_env/terminal_test_env.py serve \
# --config environments/terminal_test_env/default.yaml
env:
enabled_toolsets: ["terminal", "file"]
@@ -36,7 +36,7 @@ from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple, Union
# Ensure repo root is on sys.path for imports
_repo_root = Path(__file__).resolve().parent.parent
_repo_root = Path(__file__).resolve().parent.parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
+187 -3
@@ -129,11 +129,14 @@ class ToolContext:
def write_file(self, path: str, content: str) -> Dict[str, Any]:
"""
Write a file in the rollout's filesystem.
Write a TEXT file in the rollout's filesystem.
Uses a shell heredoc under the hood, so this is only safe for text content.
For binary files (images, compiled artifacts, etc.), use upload_file() instead.
Args:
path: File path to write
content: Content to write
content: Text content to write
Returns:
Dict with success status or error
@@ -146,6 +149,177 @@ class ToolContext:
except json.JSONDecodeError:
return {"error": result}
def upload_file(self, local_path: str, remote_path: str) -> Dict[str, Any]:
"""
Upload a local file to the rollout's sandbox (binary-safe).
Unlike write_file() which passes content through a shell heredoc (text-only),
this method base64-encodes the file and decodes it inside the sandbox.
Safe for any file type: binaries, images, archives, etc.
For files whose base64 encoding exceeds 60,000 characters (about 45 KB of
raw data), the content is split into chunks to avoid hitting shell
command-length limits.
Args:
local_path: Path to a local file on the host
remote_path: Destination path inside the sandbox
Returns:
Dict with 'exit_code' and 'output'
"""
import base64
from pathlib import Path as _Path
local = _Path(local_path)
if not local.exists():
return {"exit_code": -1, "output": f"Local file not found: {local_path}"}
raw = local.read_bytes()
b64 = base64.b64encode(raw).decode("ascii")
# Ensure parent directory exists in the sandbox
parent = str(_Path(remote_path).parent)
if parent not in (".", "/"):
self.terminal(f"mkdir -p {parent}", timeout=10)
# For small files, single command is fine
chunk_size = 60_000 # ~60KB per chunk (well within shell limits)
if len(b64) <= chunk_size:
result = self.terminal(
f"printf '%s' '{b64}' | base64 -d > {remote_path}",
timeout=30,
)
else:
# For larger files, write base64 in chunks then decode
tmp_b64 = "/tmp/_hermes_upload.b64"
self.terminal(f": > {tmp_b64}", timeout=5) # truncate
for i in range(0, len(b64), chunk_size):
chunk = b64[i : i + chunk_size]
self.terminal(f"printf '%s' '{chunk}' >> {tmp_b64}", timeout=15)
result = self.terminal(
f"base64 -d {tmp_b64} > {remote_path} && rm -f {tmp_b64}",
timeout=30,
)
return result
def upload_dir(self, local_dir: str, remote_dir: str) -> List[Dict[str, Any]]:
"""
Upload an entire local directory to the rollout's sandbox (binary-safe).
Recursively uploads all files, preserving directory structure.
Args:
local_dir: Path to a local directory on the host
remote_dir: Destination directory inside the sandbox
Returns:
List of results, one per file uploaded
"""
from pathlib import Path as _Path
local = _Path(local_dir)
if not local.exists() or not local.is_dir():
return [{"exit_code": -1, "output": f"Local directory not found: {local_dir}"}]
results = []
for file_path in sorted(local.rglob("*")):
if file_path.is_file():
relative = file_path.relative_to(local)
target = f"{remote_dir}/{relative}"
results.append(self.upload_file(str(file_path), target))
return results
def download_file(self, remote_path: str, local_path: str) -> Dict[str, Any]:
"""
Download a file from the rollout's sandbox to the host (binary-safe).
The inverse of upload_file(). Base64-encodes the file inside the sandbox,
reads the encoded data through the terminal, and decodes it locally.
Safe for any file type.
Args:
remote_path: Path to the file inside the sandbox
local_path: Destination path on the host
Returns:
Dict with 'success' (bool) and 'bytes' (int) or 'error' (str)
"""
import base64
from pathlib import Path as _Path
# Base64-encode the file inside the sandbox and capture output
result = self.terminal(
f"base64 {remote_path} 2>/dev/null",
timeout=30,
)
if result.get("exit_code", -1) != 0:
return {
"success": False,
"error": f"Failed to read remote file: {result.get('output', '')}",
}
b64_data = result.get("output", "").strip()
if not b64_data:
return {"success": False, "error": f"Remote file is empty or missing: {remote_path}"}
try:
raw = base64.b64decode(b64_data)
except Exception as e:
return {"success": False, "error": f"Base64 decode failed: {e}"}
# Write to local host filesystem
local = _Path(local_path)
local.parent.mkdir(parents=True, exist_ok=True)
local.write_bytes(raw)
return {"success": True, "bytes": len(raw)}
def download_dir(self, remote_dir: str, local_dir: str) -> List[Dict[str, Any]]:
"""
Download a directory from the rollout's sandbox to the host (binary-safe).
Lists all files in the remote directory, then downloads each one.
Preserves directory structure.
Args:
remote_dir: Path to the directory inside the sandbox
local_dir: Destination directory on the host
Returns:
List of results, one per file downloaded
"""
from pathlib import Path as _Path
# List files in the remote directory
ls_result = self.terminal(
f"find {remote_dir} -type f 2>/dev/null",
timeout=15,
)
if ls_result.get("exit_code", -1) != 0:
return [{"success": False, "error": f"Failed to list remote dir: {remote_dir}"}]
file_list = ls_result.get("output", "").strip()
if not file_list:
return [{"success": False, "error": f"Remote directory is empty or missing: {remote_dir}"}]
results = []
for remote_file in file_list.splitlines():
remote_file = remote_file.strip()
if not remote_file:
continue
# Compute the relative path to preserve directory structure
if remote_file.startswith(remote_dir):
relative = remote_file[len(remote_dir):].lstrip("/")
else:
relative = _Path(remote_file).name
local_file = str(_Path(local_dir) / relative)
results.append(self.download_file(remote_file, local_file))
return results
def search(self, query: str, path: str = ".") -> Dict[str, Any]:
"""
Search for text in the rollout's filesystem.
@@ -264,11 +438,21 @@ class ToolContext:
def cleanup(self):
"""
Release all resources (terminal VMs, browser sessions) for this rollout.
Release all resources (terminal VMs, browser sessions, background processes)
for this rollout.
Called automatically by the base environment via try/finally after
compute_reward() completes. You generally don't need to call this yourself.
"""
# Kill any background processes from this rollout (safety net)
try:
from tools.process_registry import process_registry
killed = process_registry.kill_all(task_id=self.task_id)
if killed:
logger.debug("Process cleanup for task %s: killed %d process(es)", self.task_id, killed)
except Exception as e:
logger.debug("Process cleanup for task %s: %s", self.task_id, e)
try:
cleanup_vm(self.task_id)
except Exception as e:
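The chunked base64 transfer in `upload_file()` can be sketched without a sandbox by building the shell commands as strings. This is an illustrative sketch under stated assumptions, not the `ToolContext` implementation: `chunk_base64`, `build_upload_commands`, and the temp path are hypothetical names, and quoting paths with `shlex.quote` is an addition the diff itself does not make.

```python
import base64
import shlex
from typing import List

CHUNK_SIZE = 60_000  # same chunk size as upload_file() in the diff


def chunk_base64(raw: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Base64-encode raw bytes and split the text into fixed-size chunks."""
    b64 = base64.b64encode(raw).decode("ascii")
    return [b64[i : i + chunk_size] for i in range(0, len(b64), chunk_size)]


def build_upload_commands(
    raw: bytes, remote_path: str, tmp_b64: str = "/tmp/_upload.b64"
) -> List[str]:
    """Return the shell commands that would recreate raw at remote_path."""
    chunks = chunk_base64(raw)
    dest = shlex.quote(remote_path)  # assumption: quote paths with spaces
    if len(chunks) <= 1:
        payload = chunks[0] if chunks else ""
        return [f"printf '%s' '{payload}' | base64 -d > {dest}"]
    tmp = shlex.quote(tmp_b64)
    commands = [f": > {tmp}"]  # truncate the staging file
    for chunk in chunks:
        # The base64 alphabet contains no single quotes, so this is safe.
        commands.append(f"printf '%s' '{chunk}' >> {tmp}")
    commands.append(f"base64 -d {tmp} > {dest} && rm -f {tmp}")
    return commands


payload = bytes(range(256)) * 400  # 102,400 bytes of binary data
chunks = chunk_base64(payload)
commands = build_upload_commands(payload, "/workspace/my file.bin")
print(len(chunks), len(commands))
```

Joining the chunks and decoding them yields the original bytes, which is the property the sandbox-side `base64 -d` step relies on.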
+19 -2
@@ -22,6 +22,7 @@ class Platform(Enum):
TELEGRAM = "telegram"
DISCORD = "discord"
WHATSAPP = "whatsapp"
SLACK = "slack"
@dataclass
@@ -64,7 +65,7 @@ class SessionResetPolicy:
"""
mode: str = "both" # "daily", "idle", or "both"
at_hour: int = 4 # Hour for daily reset (0-23, local time)
idle_minutes: int = 120 # Minutes of inactivity before reset
idle_minutes: int = 1440 # Minutes of inactivity before reset (24 hours)
def to_dict(self) -> Dict[str, Any]:
return {
@@ -78,7 +79,7 @@ class SessionResetPolicy:
return cls(
mode=data.get("mode", "both"),
at_hour=data.get("at_hour", 4),
idle_minutes=data.get("idle_minutes", 120),
idle_minutes=data.get("idle_minutes", 1440),
)
@@ -308,6 +309,22 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.platforms[Platform.WHATSAPP] = PlatformConfig()
config.platforms[Platform.WHATSAPP].enabled = True
# Slack
slack_token = os.getenv("SLACK_BOT_TOKEN")
if slack_token:
if Platform.SLACK not in config.platforms:
config.platforms[Platform.SLACK] = PlatformConfig()
config.platforms[Platform.SLACK].enabled = True
config.platforms[Platform.SLACK].token = slack_token
# Home channel
slack_home = os.getenv("SLACK_HOME_CHANNEL")
if slack_home:
config.platforms[Platform.SLACK].home_channel = HomeChannel(
platform=Platform.SLACK,
chat_id=slack_home,
name=os.getenv("SLACK_HOME_CHANNEL_NAME", ""),
)
# Session settings
idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
if idle_minutes:
+150
@@ -0,0 +1,150 @@
"""
Event Hook System
A lightweight event-driven system that fires handlers at key lifecycle points.
Hooks are discovered from ~/.hermes/hooks/ directories, each containing:
- HOOK.yaml (metadata: name, description, events list)
- handler.py (Python handler with async def handle(event_type, context))
Events:
- gateway:startup -- Gateway process starts
- session:start -- New session created
- session:reset -- User ran /new or /reset
- agent:start -- Agent begins processing a message
- agent:step -- Each turn in the tool-calling loop
- agent:end -- Agent finishes processing
- command:* -- Any slash command executed (wildcard match)
Errors in hooks are caught and logged but never block the main pipeline.
"""
import asyncio
import importlib.util
import os
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional
import yaml
HOOKS_DIR = Path(os.path.expanduser("~/.hermes/hooks"))
class HookRegistry:
"""
Discovers, loads, and fires event hooks.
Usage:
registry = HookRegistry()
registry.discover_and_load()
await registry.emit("agent:start", {"platform": "telegram", ...})
"""
def __init__(self):
# event_type -> [handler_fn, ...]
self._handlers: Dict[str, List[Callable]] = {}
self._loaded_hooks: List[dict] = [] # metadata for listing
@property
def loaded_hooks(self) -> List[dict]:
"""Return metadata about all loaded hooks."""
return list(self._loaded_hooks)
def discover_and_load(self) -> None:
"""
Scan the hooks directory for hook directories and load their handlers.
Each hook directory must contain:
- HOOK.yaml with at least 'name' and 'events' keys
- handler.py with a top-level 'handle' function (sync or async)
"""
if not HOOKS_DIR.exists():
return
for hook_dir in sorted(HOOKS_DIR.iterdir()):
if not hook_dir.is_dir():
continue
manifest_path = hook_dir / "HOOK.yaml"
handler_path = hook_dir / "handler.py"
if not manifest_path.exists() or not handler_path.exists():
continue
try:
manifest = yaml.safe_load(manifest_path.read_text(encoding="utf-8"))
if not manifest or not isinstance(manifest, dict):
print(f"[hooks] Skipping {hook_dir.name}: invalid HOOK.yaml", flush=True)
continue
hook_name = manifest.get("name", hook_dir.name)
events = manifest.get("events", [])
if not events:
print(f"[hooks] Skipping {hook_name}: no events declared", flush=True)
continue
# Dynamically load the handler module
spec = importlib.util.spec_from_file_location(
f"hermes_hook_{hook_name}", handler_path
)
if spec is None or spec.loader is None:
print(f"[hooks] Skipping {hook_name}: could not load handler.py", flush=True)
continue
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
handle_fn = getattr(module, "handle", None)
if handle_fn is None:
print(f"[hooks] Skipping {hook_name}: no 'handle' function found", flush=True)
continue
# Register the handler for each declared event
for event in events:
self._handlers.setdefault(event, []).append(handle_fn)
self._loaded_hooks.append({
"name": hook_name,
"description": manifest.get("description", ""),
"events": events,
"path": str(hook_dir),
})
print(f"[hooks] Loaded hook '{hook_name}' for events: {events}", flush=True)
except Exception as e:
print(f"[hooks] Error loading hook {hook_dir.name}: {e}", flush=True)
async def emit(self, event_type: str, context: Optional[Dict[str, Any]] = None) -> None:
"""
Fire all handlers registered for an event.
Supports wildcard matching: handlers registered for "command:*" will
fire for any "command:..." event. Handlers registered for a base type
like "agent" won't fire for "agent:start" -- only exact matches and
explicit wildcards.
Args:
event_type: The event identifier (e.g. "agent:start").
context: Optional dict with event-specific data.
"""
if context is None:
context = {}
# Collect handlers: exact match + wildcard match
handlers = list(self._handlers.get(event_type, []))
# Check for wildcard patterns (e.g., "command:*" matches "command:reset")
if ":" in event_type:
base = event_type.split(":")[0]
wildcard_key = f"{base}:*"
handlers.extend(self._handlers.get(wildcard_key, []))
for fn in handlers:
try:
result = fn(event_type, context)
# Support both sync and async handlers
if asyncio.iscoroutine(result):
await result
except Exception as e:
print(f"[hooks] Error in handler for '{event_type}': {e}", flush=True)
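The wildcard lookup used by `emit` can be looked at in isolation. The following is a minimal sketch, assuming a plain dict as the registry; `collect_handlers` and the three sample handlers are hypothetical names used only for illustration and are not part of the hook module.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[str, Dict[str, Any]], Any]


def collect_handlers(registry: Dict[str, List[Handler]], event_type: str) -> List[Handler]:
    """Return exact-match handlers first, then "<base>:*" wildcard handlers."""
    handlers = list(registry.get(event_type, []))
    if ":" in event_type:
        base = event_type.split(":")[0]
        handlers.extend(registry.get(f"{base}:*", []))
    return handlers


# Hypothetical handlers, for illustration only.
def on_reset(event_type, context):
    return "reset"


def on_any_command(event_type, context):
    return "any-command"


def on_agent(event_type, context):
    return "agent"


registry = {
    "command:reset": [on_reset],
    "command:*": [on_any_command],
    "agent": [on_agent],
}

matched = collect_handlers(registry, "command:reset")
```

Under this lookup a handler registered for the bare key `"agent"` is not returned for `"agent:start"`, which matches the behaviour described in the docstring.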
+282
@@ -0,0 +1,282 @@
"""
DM Pairing System
Code-based approval flow for authorizing new users on messaging platforms.
Instead of static allowlists with user IDs, unknown users receive a one-time
pairing code that the bot owner approves via the CLI.
Security features (based on OWASP + NIST SP 800-63-4 guidance):
- 8-char codes from 32-char unambiguous alphabet (no 0/O/1/I)
- Cryptographic randomness via secrets.choice()
- 1-hour code expiry
- Max 3 pending codes per platform
- Rate limiting: 1 request per user per 10 minutes
- Lockout after 5 failed approval attempts (1 hour)
- File permissions: chmod 0600 on all data files
- Codes are never logged to stdout
Storage: ~/.hermes/pairing/
"""
import json
import os
import secrets
import time
from pathlib import Path
from typing import Optional
# Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
CODE_LENGTH = 8
# Timing constants
CODE_TTL_SECONDS = 3600 # Codes expire after 1 hour
RATE_LIMIT_SECONDS = 600 # 1 request per user per 10 minutes
LOCKOUT_SECONDS = 3600 # Lockout duration after too many failures
# Limits
MAX_PENDING_PER_PLATFORM = 3 # Max pending codes per platform
MAX_FAILED_ATTEMPTS = 5 # Failed approvals before lockout
PAIRING_DIR = Path(os.path.expanduser("~/.hermes/pairing"))
def _secure_write(path: Path, data: str) -> None:
"""Write data to file with restrictive permissions (owner read/write only)."""
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(data, encoding="utf-8")
try:
os.chmod(path, 0o600)
except OSError:
pass # Windows doesn't support chmod the same way
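`_secure_write` writes the file first and restricts permissions afterwards, so a newly created file briefly carries the default mode. One possible alternative, shown here only as a sketch and not as what this change implements, is to create the file with mode 0600 from the start via `os.open`. The name `secure_write_restricted` and the demo path are assumptions for illustration. The mode argument applies only when the file is created (an existing file keeps its mode), and it is largely ignored on Windows.

```python
import os
import tempfile
from pathlib import Path


def secure_write_restricted(path: Path, data: str) -> None:
    """Create the file as owner read/write only, then write the data."""
    path.parent.mkdir(parents=True, exist_ok=True)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode is combined with the process umask, so the result is at most 0600.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(data)


# Demo against a throwaway directory (hypothetical file name).
demo_dir = Path(tempfile.mkdtemp())
target = demo_dir / "pairing" / "example-approved.json"
secure_write_restricted(target, "{}")
```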
class PairingStore:
"""
Manages pairing codes and approved user lists.
Data files per platform:
- {platform}-pending.json : pending pairing requests
- {platform}-approved.json : approved (paired) users
- _rate_limits.json : rate limit tracking
"""
def __init__(self):
PAIRING_DIR.mkdir(parents=True, exist_ok=True)
def _pending_path(self, platform: str) -> Path:
return PAIRING_DIR / f"{platform}-pending.json"
def _approved_path(self, platform: str) -> Path:
return PAIRING_DIR / f"{platform}-approved.json"
def _rate_limit_path(self) -> Path:
return PAIRING_DIR / "_rate_limits.json"
def _load_json(self, path: Path) -> dict:
if path.exists():
try:
return json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return {}
return {}
def _save_json(self, path: Path, data: dict) -> None:
_secure_write(path, json.dumps(data, indent=2, ensure_ascii=False))
# ----- Approved users -----
def is_approved(self, platform: str, user_id: str) -> bool:
"""Check if a user is approved (paired) on a platform."""
approved = self._load_json(self._approved_path(platform))
return user_id in approved
def list_approved(self, platform: Optional[str] = None) -> list:
"""List approved users, optionally filtered by platform."""
results = []
platforms = [platform] if platform else self._all_platforms("approved")
for p in platforms:
approved = self._load_json(self._approved_path(p))
for uid, info in approved.items():
results.append({"platform": p, "user_id": uid, **info})
return results
def _approve_user(self, platform: str, user_id: str, user_name: str = "") -> None:
"""Add a user to the approved list."""
approved = self._load_json(self._approved_path(platform))
approved[user_id] = {
"user_name": user_name,
"approved_at": time.time(),
}
self._save_json(self._approved_path(platform), approved)
def revoke(self, platform: str, user_id: str) -> bool:
"""Remove a user from the approved list. Returns True if found."""
path = self._approved_path(platform)
approved = self._load_json(path)
if user_id in approved:
del approved[user_id]
self._save_json(path, approved)
return True
return False
# ----- Pending codes -----
def generate_code(
self, platform: str, user_id: str, user_name: str = ""
) -> Optional[str]:
"""
Generate a pairing code for a new user.
Returns the code string, or None if:
- User is rate-limited (too recent request)
- Max pending codes reached for this platform
- Platform is in lockout due to failed approval attempts
"""
self._cleanup_expired(platform)
# Check lockout
if self._is_locked_out(platform):
return None
# Check rate limit for this specific user
if self._is_rate_limited(platform, user_id):
return None
# Check max pending
pending = self._load_json(self._pending_path(platform))
if len(pending) >= MAX_PENDING_PER_PLATFORM:
return None
# Generate cryptographically random code
code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
# Store pending request
pending[code] = {
"user_id": user_id,
"user_name": user_name,
"created_at": time.time(),
}
self._save_json(self._pending_path(platform), pending)
# Record rate limit
self._record_rate_limit(platform, user_id)
return code
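The code-generation step and the size of the resulting keyspace can be checked with a few lines. This sketch reuses the `ALPHABET` and `CODE_LENGTH` values from this file; `new_code` is a hypothetical standalone helper, not a function of `PairingStore`.

```python
import math
import secrets

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
CODE_LENGTH = 8


def new_code() -> str:
    """Draw CODE_LENGTH characters independently from ALPHABET using the CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))


# 32 symbols per position and 8 positions give 32**8 possible codes.
keyspace = len(ALPHABET) ** CODE_LENGTH
entropy_bits = math.log2(keyspace)
code = new_code()
```

With 32 symbols each character carries 5 bits, so an 8-character code carries 40 bits in total.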
def approve_code(self, platform: str, code: str) -> Optional[dict]:
"""
Approve a pairing code. Adds the user to the approved list.
Returns {user_id, user_name} on success, None if code is invalid/expired.
"""
self._cleanup_expired(platform)
code = code.upper().strip()
pending = self._load_json(self._pending_path(platform))
if code not in pending:
self._record_failed_attempt(platform)
return None
entry = pending.pop(code)
self._save_json(self._pending_path(platform), pending)
# Add to approved list
self._approve_user(platform, entry["user_id"], entry.get("user_name", ""))
return {
"user_id": entry["user_id"],
"user_name": entry.get("user_name", ""),
}
def list_pending(self, platform: Optional[str] = None) -> list:
"""List pending pairing requests, optionally filtered by platform."""
results = []
platforms = [platform] if platform else self._all_platforms("pending")
for p in platforms:
self._cleanup_expired(p)
pending = self._load_json(self._pending_path(p))
for code, info in pending.items():
age_min = int((time.time() - info["created_at"]) / 60)
results.append({
"platform": p,
"code": code,
"user_id": info["user_id"],
"user_name": info.get("user_name", ""),
"age_minutes": age_min,
})
return results
def clear_pending(self, platform: Optional[str] = None) -> int:
"""Clear all pending requests. Returns count removed."""
count = 0
platforms = [platform] if platform else self._all_platforms("pending")
for p in platforms:
pending = self._load_json(self._pending_path(p))
count += len(pending)
self._save_json(self._pending_path(p), {})
return count
# ----- Rate limiting and lockout -----
def _is_rate_limited(self, platform: str, user_id: str) -> bool:
"""Check if a user has requested a code too recently."""
limits = self._load_json(self._rate_limit_path())
key = f"{platform}:{user_id}"
last_request = limits.get(key, 0)
return (time.time() - last_request) < RATE_LIMIT_SECONDS
def _record_rate_limit(self, platform: str, user_id: str) -> None:
"""Record the time of a pairing request for rate limiting."""
limits = self._load_json(self._rate_limit_path())
key = f"{platform}:{user_id}"
limits[key] = time.time()
self._save_json(self._rate_limit_path(), limits)
def _is_locked_out(self, platform: str) -> bool:
"""Check if a platform is in lockout due to failed approval attempts."""
limits = self._load_json(self._rate_limit_path())
lockout_key = f"_lockout:{platform}"
lockout_until = limits.get(lockout_key, 0)
return time.time() < lockout_until
def _record_failed_attempt(self, platform: str) -> None:
"""Record a failed approval attempt. Triggers lockout after MAX_FAILED_ATTEMPTS."""
limits = self._load_json(self._rate_limit_path())
fail_key = f"_failures:{platform}"
fails = limits.get(fail_key, 0) + 1
limits[fail_key] = fails
if fails >= MAX_FAILED_ATTEMPTS:
lockout_key = f"_lockout:{platform}"
limits[lockout_key] = time.time() + LOCKOUT_SECONDS
limits[fail_key] = 0 # Reset counter
print(f"[pairing] Platform {platform} locked out for {LOCKOUT_SECONDS}s "
f"after {MAX_FAILED_ATTEMPTS} failed attempts", flush=True)
self._save_json(self._rate_limit_path(), limits)
# ----- Cleanup -----
def _cleanup_expired(self, platform: str) -> None:
"""Remove expired pending codes."""
path = self._pending_path(platform)
pending = self._load_json(path)
now = time.time()
expired = [
code for code, info in pending.items()
if (now - info["created_at"]) > CODE_TTL_SECONDS
]
if expired:
for code in expired:
del pending[code]
self._save_json(path, pending)
def _all_platforms(self, suffix: str) -> list:
"""List all platforms that have data files of a given suffix."""
platforms = []
for f in PAIRING_DIR.iterdir():
if f.name.endswith(f"-{suffix}.json"):
platform = f.name.replace(f"-{suffix}.json", "")
if not platform.startswith("_"):
platforms.append(platform)
return platforms
+341 -15
@@ -6,10 +6,14 @@ and implement the required methods.
"""
import asyncio
import os
import re
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Any, Callable, Awaitable
from pathlib import Path
from typing import Dict, List, Optional, Any, Callable, Awaitable, Tuple
from enum import Enum
import sys
@@ -19,6 +23,150 @@ from gateway.config import Platform, PlatformConfig
from gateway.session import SessionSource
# ---------------------------------------------------------------------------
# Image cache utilities
#
# When users send images on messaging platforms, we download them to a local
# cache directory so they can be analyzed by the vision tool (which accepts
# local file paths). This avoids issues with ephemeral platform URLs
# (e.g. Telegram file URLs expire after ~1 hour).
# ---------------------------------------------------------------------------
# Default location: ~/.hermes/image_cache/
IMAGE_CACHE_DIR = Path(os.path.expanduser("~/.hermes/image_cache"))
def get_image_cache_dir() -> Path:
"""Return the image cache directory, creating it if it doesn't exist."""
IMAGE_CACHE_DIR.mkdir(parents=True, exist_ok=True)
return IMAGE_CACHE_DIR
def cache_image_from_bytes(data: bytes, ext: str = ".jpg") -> str:
"""
Save raw image bytes to the cache and return the absolute file path.
Args:
data: Raw image bytes.
ext: File extension including the dot (e.g. ".jpg", ".png").
Returns:
Absolute path to the cached image file as a string.
"""
cache_dir = get_image_cache_dir()
filename = f"img_{uuid.uuid4().hex[:12]}{ext}"
filepath = cache_dir / filename
filepath.write_bytes(data)
return str(filepath)
async def cache_image_from_url(url: str, ext: str = ".jpg") -> str:
"""
Download an image from a URL and save it to the local cache.
Uses httpx for async download with a reasonable timeout.
Args:
url: The HTTP/HTTPS URL to download from.
ext: File extension including the dot (e.g. ".jpg", ".png").
Returns:
Absolute path to the cached image file as a string.
"""
import httpx
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "image/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_image_from_bytes(response.content, ext)
def cleanup_image_cache(max_age_hours: int = 24) -> int:
"""
Delete cached images older than *max_age_hours*.
Returns the number of files removed.
"""
import time
cache_dir = get_image_cache_dir()
cutoff = time.time() - (max_age_hours * 3600)
removed = 0
for f in cache_dir.iterdir():
if f.is_file() and f.stat().st_mtime < cutoff:
try:
f.unlink()
removed += 1
except OSError:
pass
return removed
# ---------------------------------------------------------------------------
# Audio cache utilities
#
# Same pattern as image cache -- voice messages from platforms are downloaded
# here so the STT tool (OpenAI Whisper) can transcribe them from local files.
# ---------------------------------------------------------------------------
AUDIO_CACHE_DIR = Path(os.path.expanduser("~/.hermes/audio_cache"))
def get_audio_cache_dir() -> Path:
"""Return the audio cache directory, creating it if it doesn't exist."""
AUDIO_CACHE_DIR.mkdir(parents=True, exist_ok=True)
return AUDIO_CACHE_DIR
def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
"""
Save raw audio bytes to the cache and return the absolute file path.
Args:
data: Raw audio bytes.
ext: File extension including the dot (e.g. ".ogg", ".mp3").
Returns:
Absolute path to the cached audio file as a string.
"""
cache_dir = get_audio_cache_dir()
filename = f"audio_{uuid.uuid4().hex[:12]}{ext}"
filepath = cache_dir / filename
filepath.write_bytes(data)
return str(filepath)
async def cache_audio_from_url(url: str, ext: str = ".ogg") -> str:
"""
Download an audio file from a URL and save it to the local cache.
Args:
url: The HTTP/HTTPS URL to download from.
ext: File extension including the dot (e.g. ".ogg", ".mp3").
Returns:
Absolute path to the cached audio file as a string.
"""
import httpx
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "audio/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_audio_from_bytes(response.content, ext)
class MessageType(Enum):
"""Types of incoming messages."""
TEXT = "text"
@@ -177,6 +325,123 @@ class BasePlatformAdapter(ABC):
"""
pass
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""
Send an image natively via the platform API.
Override in subclasses to send images as proper attachments
instead of plain-text URLs. Default falls back to sending the
URL as a text message.
"""
# Fallback: send URL as text (subclasses override for native images)
text = f"{caption}\n{image_url}" if caption else image_url
return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
@staticmethod
def extract_images(content: str) -> Tuple[List[Tuple[str, str]], str]:
"""
Extract image URLs from markdown and HTML image tags in a response.
Finds patterns like:
- ![alt text](https://example.com/image.png)
- <img src="https://example.com/image.png">
- <img src="https://example.com/image.png"></img>
Args:
content: The response text to scan.
Returns:
Tuple of (list of (url, alt_text) pairs, cleaned content with image tags removed).
"""
images = []
cleaned = content
# Match markdown images: ![alt](url)
md_pattern = r'!\[([^\]]*)\]\((https?://[^\s\)]+)\)'
for match in re.finditer(md_pattern, content):
alt_text = match.group(1)
url = match.group(2)
# Only extract URLs that look like actual images
if any(url.lower().endswith(ext) or ext in url.lower() for ext in
['.png', '.jpg', '.jpeg', '.gif', '.webp', 'fal.media', 'fal-cdn', 'replicate.delivery']):
images.append((url, alt_text))
# Match HTML img tags: <img src="url"> or <img src="url"></img> or <img src="url"/>
html_pattern = r'<img\s+src=["\']?(https?://[^\s"\'<>]+)["\']?\s*/?>\s*(?:</img>)?'
for match in re.finditer(html_pattern, content):
url = match.group(1)
images.append((url, ""))
# Remove matched image tags from content if we found images
if images:
cleaned = re.sub(md_pattern, '', cleaned)
cleaned = re.sub(html_pattern, '', cleaned)
# Clean up leftover blank lines
cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
return images, cleaned
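To see what `extract_images` returns for a typical reply, here is a standalone sketch using the same two regular expressions. It is written as a module-level function for illustration. The `any(...)` check is simplified to a substring test, which should be equivalent because a URL that ends with a suffix also contains it. The sample URLs are made up.

```python
import re
from typing import List, Tuple

MD_PATTERN = r'!\[([^\]]*)\]\((https?://[^\s\)]+)\)'
HTML_PATTERN = r'<img\s+src=["\']?(https?://[^\s"\'<>]+)["\']?\s*/?>\s*(?:</img>)?'
IMAGE_HINTS = ['.png', '.jpg', '.jpeg', '.gif', '.webp',
               'fal.media', 'fal-cdn', 'replicate.delivery']


def extract_images(content: str) -> Tuple[List[Tuple[str, str]], str]:
    """Return ((url, alt_text) pairs, content with image tags removed)."""
    images = []
    for m in re.finditer(MD_PATTERN, content):
        alt_text, url = m.group(1), m.group(2)
        if any(hint in url.lower() for hint in IMAGE_HINTS):
            images.append((url, alt_text))
    for m in re.finditer(HTML_PATTERN, content):
        images.append((m.group(1), ""))
    cleaned = content
    if images:
        cleaned = re.sub(MD_PATTERN, '', cleaned)
        cleaned = re.sub(HTML_PATTERN, '', cleaned)
        cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
    return images, cleaned


reply = "Here is the chart:\n\n![chart](https://example.com/chart.png)\n\nDone."
images, text = extract_images(reply)

# A markdown image whose URL matches no hint is not returned, but it is
# still stripped from the text once at least one other image was found.
mixed = "![a](https://example.com/a.png) ![doc](https://example.com/page)"
mixed_images, mixed_text = extract_images(mixed)
```

The second call suggests one thing to keep in mind when reviewing: the removal step uses the unfiltered markdown pattern, so a non-image `![...](...)` link can disappear from the text without being sent as an attachment.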
async def send_voice(
self,
chat_id: str,
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""
Send an audio file as a native voice message via the platform API.
Override in subclasses to send audio as voice bubbles (Telegram)
or file attachments (Discord). Default falls back to sending the
file path as text.
"""
text = f"🔊 Audio: {audio_path}"
if caption:
text = f"{caption}\n{text}"
return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
@staticmethod
def extract_media(content: str) -> Tuple[List[Tuple[str, bool]], str]:
"""
Extract MEDIA:<path> tags and [[audio_as_voice]] directives from response text.
The TTS tool returns responses like:
[[audio_as_voice]]
MEDIA:/path/to/audio.ogg
Args:
content: The response text to scan.
Returns:
Tuple of (list of (path, is_voice) pairs, cleaned content with tags removed).
"""
media = []
cleaned = content
# Check for [[audio_as_voice]] directive
has_voice_tag = "[[audio_as_voice]]" in content
cleaned = cleaned.replace("[[audio_as_voice]]", "")
# Extract MEDIA:<path> tags (\S+ stops at the first whitespace, so paths containing spaces are truncated)
media_pattern = r'MEDIA:(\S+)'
for match in re.finditer(media_pattern, content):
path = match.group(1).strip()
if path:
media.append((path, has_voice_tag))
# Remove MEDIA tags from content
if media:
cleaned = re.sub(media_pattern, '', cleaned)
cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
return media, cleaned
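The behaviour of the `MEDIA:` pattern, including how it treats a path that contains a space, can be illustrated with a standalone sketch of the same logic. The function below is a module-level copy for illustration and the sample paths are made up.

```python
import re
from typing import List, Tuple

MEDIA_PATTERN = r'MEDIA:(\S+)'


def extract_media(content: str) -> Tuple[List[Tuple[str, bool]], str]:
    """Return ((path, is_voice) pairs, content with MEDIA tags removed)."""
    has_voice_tag = "[[audio_as_voice]]" in content
    cleaned = content.replace("[[audio_as_voice]]", "")
    media = [(m.group(1), has_voice_tag) for m in re.finditer(MEDIA_PATTERN, content)]
    if media:
        cleaned = re.sub(MEDIA_PATTERN, '', cleaned)
        cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
    return media, cleaned


media, text = extract_media("Here you go.\n[[audio_as_voice]]\nMEDIA:/tmp/reply.ogg")

# A path with a space is cut at the space; the remainder stays in the text.
spaced_media, spaced_text = extract_media("MEDIA:/tmp/my file.ogg")
```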
async def _keep_typing(self, chat_id: str, interval: float = 2.0) -> None:
"""
Continuously send typing indicator until cancelled.
@@ -216,6 +481,27 @@ class BasePlatformAdapter(ABC):
# Spawn background task to process this message
asyncio.create_task(self._process_message_background(event, session_key))
@staticmethod
def _get_human_delay() -> float:
"""
Return a random delay in seconds for human-like response pacing.
Reads from env vars:
HERMES_HUMAN_DELAY_MODE: "off" (default) | "natural" | "custom"
HERMES_HUMAN_DELAY_MIN_MS: minimum delay in ms (default 800, custom mode)
HERMES_HUMAN_DELAY_MAX_MS: maximum delay in ms (default 2500, custom mode)
"""
import random
mode = os.getenv("HERMES_HUMAN_DELAY_MODE", "off").lower()
if mode == "off":
return 0.0
min_ms = int(os.getenv("HERMES_HUMAN_DELAY_MIN_MS", "800"))
max_ms = int(os.getenv("HERMES_HUMAN_DELAY_MAX_MS", "2500"))
if mode == "natural":
min_ms, max_ms = 800, 2500
return random.uniform(min_ms / 1000.0, max_ms / 1000.0)
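The three delay modes can be exercised without touching the real environment. This is a sketch under the assumption that the environment is passed in as a mapping; `get_human_delay` and its `env` parameter are hypothetical and differ slightly from the method above, which reads `os.getenv` directly and parses the min/max variables even in "natural" mode.

```python
import os
import random
from typing import Mapping


def get_human_delay(env: Mapping[str, str] = os.environ) -> float:
    """Return a pacing delay in seconds based on HERMES_HUMAN_DELAY_* settings."""
    mode = env.get("HERMES_HUMAN_DELAY_MODE", "off").lower()
    if mode == "off":
        return 0.0
    if mode == "natural":
        min_ms, max_ms = 800, 2500
    else:
        # Any other mode value is treated like "custom", as in the original.
        min_ms = int(env.get("HERMES_HUMAN_DELAY_MIN_MS", "800"))
        max_ms = int(env.get("HERMES_HUMAN_DELAY_MAX_MS", "2500"))
    return random.uniform(min_ms / 1000.0, max_ms / 1000.0)


off_delay = get_human_delay({})
natural_delay = get_human_delay({"HERMES_HUMAN_DELAY_MODE": "natural"})
custom_delay = get_human_delay({
    "HERMES_HUMAN_DELAY_MODE": "custom",
    "HERMES_HUMAN_DELAY_MIN_MS": "100",
    "HERMES_HUMAN_DELAY_MAX_MS": "200",
})
```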
async def _process_message_background(self, event: MessageEvent, session_key: str) -> None:
"""Background task that actually processes the message."""
# Create interrupt event for this session
@@ -231,23 +517,63 @@ class BasePlatformAdapter(ABC):
# Send response if any
if response:
result = await self.send(
chat_id=event.source.chat_id,
content=response,
reply_to=event.message_id
)
# Extract MEDIA:<path> tags (from TTS tool) before other processing
media_files, response = self.extract_media(response)
# Log send failures (don't raise - user already saw tool progress)
if not result.success:
print(f"[{self.name}] Failed to send response: {result.error}")
# Try sending without markdown as fallback
fallback_result = await self.send(
# Extract image URLs and send them as native platform attachments
images, text_content = self.extract_images(response)
# Send the text portion first (if any remains after extractions)
if text_content:
result = await self.send(
chat_id=event.source.chat_id,
content=f"(Response formatting failed, plain text:)\n\n{response[:3500]}",
content=text_content,
reply_to=event.message_id
)
if not fallback_result.success:
print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
# Log send failures (don't raise - user already saw tool progress)
if not result.success:
print(f"[{self.name}] Failed to send response: {result.error}")
# Try sending without markdown as fallback
fallback_result = await self.send(
chat_id=event.source.chat_id,
content=f"(Response formatting failed, plain text:)\n\n{text_content[:3500]}",
reply_to=event.message_id
)
if not fallback_result.success:
print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
# Human-like pacing delay between text and media
human_delay = self._get_human_delay()
# Send extracted images as native attachments
for image_url, alt_text in images:
if human_delay > 0:
await asyncio.sleep(human_delay)
try:
img_result = await self.send_image(
chat_id=event.source.chat_id,
image_url=image_url,
caption=alt_text if alt_text else None,
)
if not img_result.success:
print(f"[{self.name}] Failed to send image: {img_result.error}")
except Exception as img_err:
print(f"[{self.name}] Error sending image: {img_err}")
# Send extracted audio/voice files as native attachments
for audio_path, is_voice in media_files:
if human_delay > 0:
await asyncio.sleep(human_delay)
try:
voice_result = await self.send_voice(
chat_id=event.source.chat_id,
audio_path=audio_path,
)
if not voice_result.success:
print(f"[{self.name}] Failed to send voice: {voice_result.error}")
except Exception as voice_err:
print(f"[{self.name}] Error sending voice: {voice_err}")
# Check if there's a pending message that was queued during our processing
if session_key in self._pending_messages:
@@ -286,7 +612,7 @@ class BasePlatformAdapter(ABC):
def get_pending_message(self, session_key: str) -> Optional[MessageEvent]:
"""Get and clear any pending message for a session."""
return self._pending_messages.get(session_key)
return self._pending_messages.pop(session_key, None)
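The switch from `get` to `pop` matters because the docstring promises "get and clear". A small sketch with a plain dict (the key and value are made up) shows the difference: `pop` with a default removes the entry on the first call and returns `None` afterwards, whereas `get` would keep returning the same queued message.

```python
pending = {"session-1": "queued message"}

# First call returns the queued message and removes it.
first = pending.pop("session-1", None)

# Second call finds nothing and falls back to the default.
second = pending.pop("session-1", None)
```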
def build_source(
self,
+386 -4
@@ -8,6 +8,7 @@ Uses discord.py library for:
"""
import asyncio
import os
from typing import Dict, List, Optional, Any
try:
@@ -31,6 +32,8 @@ from gateway.platforms.base import (
MessageEvent,
MessageType,
SendResult,
cache_image_from_url,
cache_audio_from_url,
)
@@ -47,7 +50,10 @@ class DiscordAdapter(BasePlatformAdapter):
- Receiving messages from servers and DMs
- Sending responses with Discord markdown
- Thread support
- Slash commands (future)
- Native slash commands (/ask, /reset, /status, /stop)
- Button-based exec approvals
- Auto-threading for long conversations
- Reaction-based feedback
"""
# Discord message limits
@@ -57,6 +63,7 @@ class DiscordAdapter(BasePlatformAdapter):
super().__init__(config, Platform.DISCORD)
self._client: Optional[commands.Bot] = None
self._ready_event = asyncio.Event()
self._allowed_user_ids: set = set() # For button approval authorization
async def connect(self) -> bool:
"""Connect to Discord and start receiving events."""
@@ -81,10 +88,23 @@ class DiscordAdapter(BasePlatformAdapter):
intents=intents,
)
# Parse allowed user IDs for button authorization
allowed_env = os.getenv("DISCORD_ALLOWED_USERS", "")
if allowed_env:
self._allowed_user_ids = {
uid.strip() for uid in allowed_env.split(",") if uid.strip()
}
# Register event handlers
@self._client.event
async def on_ready():
print(f"[{self.name}] Connected as {self._client.user}")
# Sync slash commands with Discord
try:
synced = await self._client.tree.sync()
print(f"[{self.name}] Synced {len(synced)} slash command(s)")
except Exception as e:
print(f"[{self.name}] Slash command sync failed: {e}")
self._ready_event.set()
@self._client.event
@@ -94,6 +114,9 @@ class DiscordAdapter(BasePlatformAdapter):
return
await self._handle_message(message)
# Register slash commands
self._register_slash_commands()
# Start the bot in background
asyncio.create_task(self._client.start(self.config.token))
@@ -173,6 +196,99 @@ class DiscordAdapter(BasePlatformAdapter):
except Exception as e:
return SendResult(success=False, error=str(e))
async def send_voice(
self,
chat_id: str,
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send audio as a Discord file attachment."""
if not self._client:
return SendResult(success=False, error="Not connected")
try:
import io
channel = self._client.get_channel(int(chat_id))
if not channel:
channel = await self._client.fetch_channel(int(chat_id))
if not channel:
return SendResult(success=False, error=f"Channel {chat_id} not found")
if not os.path.exists(audio_path):
return SendResult(success=False, error=f"Audio file not found: {audio_path}")
# Determine filename from path
filename = os.path.basename(audio_path)
with open(audio_path, "rb") as f:
file = discord.File(io.BytesIO(f.read()), filename=filename)
msg = await channel.send(
content=caption if caption else None,
file=file,
)
return SendResult(success=True, message_id=str(msg.id))
except Exception as e:
print(f"[{self.name}] Failed to send audio: {e}")
return await super().send_voice(chat_id, audio_path, caption, reply_to)
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send an image natively as a Discord file attachment."""
if not self._client:
return SendResult(success=False, error="Not connected")
try:
import aiohttp
channel = self._client.get_channel(int(chat_id))
if not channel:
channel = await self._client.fetch_channel(int(chat_id))
if not channel:
return SendResult(success=False, error=f"Channel {chat_id} not found")
# Download the image and send as a Discord file attachment
# (Discord renders attachments inline, unlike plain URLs)
async with aiohttp.ClientSession() as session:
async with session.get(image_url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
if resp.status != 200:
raise Exception(f"Failed to download image: HTTP {resp.status}")
image_data = await resp.read()
# Determine filename from URL or content type
content_type = resp.headers.get("content-type", "image/png")
ext = "png"
if "jpeg" in content_type or "jpg" in content_type:
ext = "jpg"
elif "gif" in content_type:
ext = "gif"
elif "webp" in content_type:
ext = "webp"
import io
file = discord.File(io.BytesIO(image_data), filename=f"image.{ext}")
msg = await channel.send(
content=caption if caption else None,
file=file,
)
return SendResult(success=True, message_id=str(msg.id))
except ImportError:
print(f"[{self.name}] aiohttp not installed, falling back to URL. Run: pip install aiohttp")
return await super().send_image(chat_id, image_url, caption, reply_to)
except Exception as e:
print(f"[{self.name}] Failed to send image attachment, falling back to URL: {e}")
return await super().send_image(chat_id, image_url, caption, reply_to)
async def send_typing(self, chat_id: str) -> None:
"""Send typing indicator."""
if self._client:
@@ -230,8 +346,148 @@ class DiscordAdapter(BasePlatformAdapter):
# Discord markdown is fairly standard, no special escaping needed
return content
def _register_slash_commands(self) -> None:
"""Register Discord slash commands on the command tree."""
if not self._client:
return
tree = self._client.tree
@tree.command(name="ask", description="Ask Hermes a question")
@discord.app_commands.describe(question="Your question for Hermes")
async def slash_ask(interaction: discord.Interaction, question: str):
await interaction.response.defer()
event = self._build_slash_event(interaction, question)
await self.handle_message(event)
# The response is sent via the normal send() flow
# Send a followup to close the interaction if needed
try:
await interaction.followup.send("Processing complete~", ephemeral=True)
except Exception:
pass
@tree.command(name="reset", description="Reset your Hermes session")
async def slash_reset(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, "/reset")
await self.handle_message(event)
try:
await interaction.followup.send("Session reset~", ephemeral=True)
except Exception:
pass
@tree.command(name="status", description="Show Hermes session status")
async def slash_status(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, "/status")
await self.handle_message(event)
try:
await interaction.followup.send("Status sent~", ephemeral=True)
except Exception:
pass
@tree.command(name="stop", description="Stop the running Hermes agent")
async def slash_stop(interaction: discord.Interaction):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, "/stop")
await self.handle_message(event)
try:
await interaction.followup.send("Stop requested~", ephemeral=True)
except Exception:
pass
def _build_slash_event(self, interaction: discord.Interaction, text: str) -> MessageEvent:
"""Build a MessageEvent from a Discord slash command interaction."""
is_dm = isinstance(interaction.channel, discord.DMChannel)
chat_type = "dm" if is_dm else "group"
chat_name = ""
if not is_dm and hasattr(interaction.channel, "name"):
chat_name = interaction.channel.name
if hasattr(interaction.channel, "guild") and interaction.channel.guild:
chat_name = f"{interaction.channel.guild.name} / #{chat_name}"
source = self.build_source(
chat_id=str(interaction.channel_id),
chat_name=chat_name,
chat_type=chat_type,
user_id=str(interaction.user.id),
user_name=interaction.user.display_name,
)
msg_type = MessageType.COMMAND if text.startswith("/") else MessageType.TEXT
return MessageEvent(
text=text,
message_type=msg_type,
source=source,
raw_message=interaction,
)
async def send_exec_approval(
self, chat_id: str, command: str, approval_id: str
) -> SendResult:
"""
Send a button-based exec approval prompt for a dangerous command.
Returns SendResult. The approval is resolved when a user clicks a button.
"""
if not self._client or not DISCORD_AVAILABLE:
return SendResult(success=False, error="Not connected")
try:
channel = self._client.get_channel(int(chat_id))
if not channel:
channel = await self._client.fetch_channel(int(chat_id))
embed = discord.Embed(
title="Command Approval Required",
description=f"```\n{command[:500]}\n```",
color=discord.Color.orange(),
)
embed.set_footer(text=f"Approval ID: {approval_id}")
view = ExecApprovalView(
approval_id=approval_id,
allowed_user_ids=self._allowed_user_ids,
)
msg = await channel.send(embed=embed, view=view)
return SendResult(success=True, message_id=str(msg.id))
except Exception as e:
return SendResult(success=False, error=str(e))
async def _handle_message(self, message: DiscordMessage) -> None:
"""Handle incoming Discord messages."""
# In server channels (not DMs), require the bot to be @mentioned
# UNLESS the channel is in the free-response list.
#
# Config:
# DISCORD_FREE_RESPONSE_CHANNELS: Comma-separated channel IDs where the
# bot responds to every message without needing a mention.
# DISCORD_REQUIRE_MENTION: Set to "false" to disable mention requirement
# globally (all channels become free-response). Default: "true".
if not isinstance(message.channel, discord.DMChannel):
# Check if this channel is in the free-response list
free_channels_raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
channel_id = str(message.channel.id)
# Global override: if DISCORD_REQUIRE_MENTION=false, all channels are free
require_mention = os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no")
is_free_channel = channel_id in free_channels
if require_mention and not is_free_channel:
# Must be @mentioned to respond
if self._client.user not in message.mentions:
return # Silently ignore messages that don't mention the bot
# Strip the bot mention from the message text so the agent sees clean input
if self._client.user and self._client.user in message.mentions:
message.content = message.content.replace(f"<@{self._client.user.id}>", "").strip()
message.content = message.content.replace(f"<@!{self._client.user.id}>", "").strip()
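The mention gate above reduces to a small decision function. The following sketch mirrors the parsing of `DISCORD_FREE_RESPONSE_CHANNELS` and `DISCORD_REQUIRE_MENTION`; the helper names (`parse_id_set`, `mention_required`, `should_respond`) and the channel IDs are hypothetical and exist only for this illustration.

```python
def parse_id_set(raw: str) -> set:
    """Split a comma-separated list of IDs, ignoring blanks and whitespace."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def mention_required(raw: str) -> bool:
    """Treat "false", "0" and "no" (any case) as disabling the requirement."""
    return raw.lower() not in ("false", "0", "no")


def should_respond(is_dm: bool, channel_id: str, mentioned: bool,
                   free_channels: set, require_mention: bool) -> bool:
    """DMs always pass; server channels need a mention unless exempted."""
    if is_dm:
        return True
    if not require_mention or channel_id in free_channels:
        return True
    return mentioned


free = parse_id_set("111, 222,,")
```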
# Determine message type
msg_type = MessageType.TEXT
if message.content.startswith("/"):
@@ -278,9 +534,44 @@ class DiscordAdapter(BasePlatformAdapter):
thread_id=thread_id,
)
# Build media URLs
media_urls = [att.url for att in message.attachments]
media_types = [att.content_type or "unknown" for att in message.attachments]
# Build media URLs -- download image attachments to local cache so the
# vision tool can access them reliably (Discord CDN URLs can expire).
media_urls = []
media_types = []
for att in message.attachments:
content_type = att.content_type or "unknown"
if content_type.startswith("image/"):
try:
# Determine extension from content type (image/png -> .png)
ext = "." + content_type.split("/")[-1].split(";")[0]
if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
ext = ".jpg"
cached_path = await cache_image_from_url(att.url, ext=ext)
media_urls.append(cached_path)
media_types.append(content_type)
print(f"[Discord] Cached user image: {cached_path}", flush=True)
except Exception as e:
print(f"[Discord] Failed to cache image attachment: {e}", flush=True)
# Fall back to the CDN URL if caching fails
media_urls.append(att.url)
media_types.append(content_type)
elif content_type.startswith("audio/"):
try:
ext = "." + content_type.split("/")[-1].split(";")[0]
if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
ext = ".ogg"
cached_path = await cache_audio_from_url(att.url, ext=ext)
media_urls.append(cached_path)
media_types.append(content_type)
print(f"[Discord] Cached user audio: {cached_path}", flush=True)
except Exception as e:
print(f"[Discord] Failed to cache audio attachment: {e}", flush=True)
media_urls.append(att.url)
media_types.append(content_type)
else:
# Other attachments: keep the original URL
media_urls.append(att.url)
media_types.append(content_type)
event = MessageEvent(
text=message.content,
@@ -295,3 +586,94 @@ class DiscordAdapter(BasePlatformAdapter):
)
await self.handle_message(event)
# ---------------------------------------------------------------------------
# Discord UI Components (outside the adapter class)
# ---------------------------------------------------------------------------
if DISCORD_AVAILABLE:
class ExecApprovalView(discord.ui.View):
"""
Interactive button view for exec approval of dangerous commands.
Shows three buttons: Allow Once (green), Always Allow (blue), Deny (red).
Only users in the allowed list can click. The view times out after 5 minutes.
"""
def __init__(self, approval_id: str, allowed_user_ids: set):
super().__init__(timeout=300) # 5-minute timeout
self.approval_id = approval_id
self.allowed_user_ids = allowed_user_ids
self.resolved = False
def _check_auth(self, interaction: discord.Interaction) -> bool:
"""Verify the user clicking is authorized."""
if not self.allowed_user_ids:
return True # No allowlist = anyone can approve
return str(interaction.user.id) in self.allowed_user_ids
async def _resolve(
self, interaction: discord.Interaction, action: str, color: discord.Color
):
"""Resolve the approval and update the message."""
if self.resolved:
await interaction.response.send_message(
"This approval has already been resolved~", ephemeral=True
)
return
if not self._check_auth(interaction):
await interaction.response.send_message(
"You're not authorized to approve commands~", ephemeral=True
)
return
self.resolved = True
# Update the embed with the decision
embed = interaction.message.embeds[0] if interaction.message.embeds else None
if embed:
embed.color = color
embed.set_footer(text=f"{action} by {interaction.user.display_name}")
# Disable all buttons
for child in self.children:
child.disabled = True
await interaction.response.edit_message(embed=embed, view=self)
# Store the approval decision for the gateway to pick up
try:
from tools.terminal_tool import _session_approved_patterns
if action == "allow_once":
pass # One-time approval handled by gateway
elif action == "allow_always":
_session_approved_patterns.add(self.approval_id)
except ImportError:
pass
@discord.ui.button(label="Allow Once", style=discord.ButtonStyle.green)
async def allow_once(
self, interaction: discord.Interaction, button: discord.ui.Button
):
await self._resolve(interaction, "allow_once", discord.Color.green())
@discord.ui.button(label="Always Allow", style=discord.ButtonStyle.blurple)
async def allow_always(
self, interaction: discord.Interaction, button: discord.ui.Button
):
await self._resolve(interaction, "allow_always", discord.Color.blue())
@discord.ui.button(label="Deny", style=discord.ButtonStyle.red)
async def deny(
self, interaction: discord.Interaction, button: discord.ui.Button
):
await self._resolve(interaction, "deny", discord.Color.red())
async def on_timeout(self):
"""Handle view timeout -- disable buttons and mark as expired."""
self.resolved = True
for child in self.children:
child.disabled = True
+374
@@ -0,0 +1,374 @@
"""
Slack platform adapter.
Uses slack-bolt (Python) with Socket Mode for:
- Receiving messages from channels and DMs
- Sending responses back
- Handling slash commands
- Thread support
"""
import asyncio
import os
from typing import Dict, List, Optional, Any
try:
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.socket_mode.async_handler import AsyncSocketModeHandler
from slack_sdk.web.async_client import AsyncWebClient
SLACK_AVAILABLE = True
except ImportError:
SLACK_AVAILABLE = False
AsyncApp = Any
AsyncSocketModeHandler = Any
AsyncWebClient = Any
import sys
sys.path.insert(0, str(__file__).rsplit("/", 3)[0])
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
cache_image_from_url,
cache_audio_from_url,
)
def check_slack_requirements() -> bool:
"""Check if Slack dependencies are available."""
return SLACK_AVAILABLE
class SlackAdapter(BasePlatformAdapter):
"""
Slack bot adapter using Socket Mode.
Requires two tokens:
- SLACK_BOT_TOKEN (xoxb-...) for API calls
- SLACK_APP_TOKEN (xapp-...) for Socket Mode connection
Features:
- DMs and channel messages (mention-gated in channels)
- Thread support
- File/image/audio attachments
- Slash commands (/hermes)
- Typing indicators are a no-op (Slack has no typing indicator API for bots)
"""
MAX_MESSAGE_LENGTH = 4000 # Slack's limit is higher but mrkdwn can inflate
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.SLACK)
self._app: Optional[AsyncApp] = None
self._handler: Optional[AsyncSocketModeHandler] = None
self._bot_user_id: Optional[str] = None
async def connect(self) -> bool:
"""Connect to Slack via Socket Mode."""
if not SLACK_AVAILABLE:
print("[Slack] slack-bolt not installed. Run: pip install slack-bolt")
return False
bot_token = self.config.token
app_token = os.getenv("SLACK_APP_TOKEN")
if not bot_token:
print("[Slack] SLACK_BOT_TOKEN not set")
return False
if not app_token:
print("[Slack] SLACK_APP_TOKEN not set")
return False
try:
self._app = AsyncApp(token=bot_token)
# Get our own bot user ID for mention detection
auth_response = await self._app.client.auth_test()
self._bot_user_id = auth_response.get("user_id")
bot_name = auth_response.get("user", "unknown")
# Register message event handler
@self._app.event("message")
async def handle_message_event(event, say):
await self._handle_slack_message(event)
# Register slash command handler
@self._app.command("/hermes")
async def handle_hermes_command(ack, command):
await ack()
await self._handle_slash_command(command)
# Start Socket Mode handler in background
self._handler = AsyncSocketModeHandler(self._app, app_token)
# Keep a reference so the background task is not garbage-collected mid-run
self._socket_task = asyncio.create_task(self._handler.start_async())
self._running = True
print(f"[Slack] Connected as @{bot_name} (Socket Mode)")
return True
except Exception as e:
print(f"[Slack] Connection failed: {e}")
return False
async def disconnect(self) -> None:
"""Disconnect from Slack."""
if self._handler:
await self._handler.close_async()
self._running = False
print("[Slack] Disconnected")
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a message to a Slack channel or DM."""
if not self._app:
return SendResult(success=False, error="Not connected")
try:
kwargs = {
"channel": chat_id,
"text": content,
}
# Reply in thread if thread_ts is available
if reply_to:
kwargs["thread_ts"] = reply_to
elif metadata and metadata.get("thread_ts"):
kwargs["thread_ts"] = metadata["thread_ts"]
result = await self._app.client.chat_postMessage(**kwargs)
return SendResult(
success=True,
message_id=result.get("ts"),
raw_response=result,
)
except Exception as e:
print(f"[Slack] Send error: {e}")
return SendResult(success=False, error=str(e))
async def send_typing(self, chat_id: str) -> None:
"""Slack doesn't have a direct typing indicator API for bots."""
pass
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send an image to Slack by uploading the URL as a file."""
if not self._app:
return SendResult(success=False, error="Not connected")
try:
import httpx
# Download the image first
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(image_url)
response.raise_for_status()
result = await self._app.client.files_upload_v2(
channel=chat_id,
content=response.content,
filename="image.png",
initial_comment=caption or "",
thread_ts=reply_to,
)
return SendResult(success=True, raw_response=result)
except Exception as e:
print(f"[Slack] Failed to upload image, falling back to URL: {e}")
# Fall back to sending the URL as text
text = f"{caption}\n{image_url}" if caption else image_url
return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
async def send_voice(
self,
chat_id: str,
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send an audio file to Slack."""
if not self._app:
return SendResult(success=False, error="Not connected")
try:
result = await self._app.client.files_upload_v2(
channel=chat_id,
file=audio_path,
filename=os.path.basename(audio_path),
initial_comment=caption or "",
thread_ts=reply_to,
)
return SendResult(success=True, raw_response=result)
except Exception as e:
return SendResult(success=False, error=str(e))
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Get information about a Slack channel."""
if not self._app:
return {"name": chat_id, "type": "unknown"}
try:
result = await self._app.client.conversations_info(channel=chat_id)
channel = result.get("channel", {})
is_dm = channel.get("is_im", False)
return {
"name": channel.get("name", chat_id),
"type": "dm" if is_dm else "group",
}
except Exception:
return {"name": chat_id, "type": "unknown"}
# ----- Internal handlers -----
async def _handle_slack_message(self, event: dict) -> None:
"""Handle an incoming Slack message event."""
# Ignore bot messages (including our own)
if event.get("bot_id") or event.get("subtype") == "bot_message":
return
# Ignore message edits and deletions
subtype = event.get("subtype")
if subtype in ("message_changed", "message_deleted"):
return
text = event.get("text", "")
user_id = event.get("user", "")
channel_id = event.get("channel", "")
thread_ts = event.get("thread_ts") or event.get("ts")
ts = event.get("ts", "")
# Determine if this is a DM or channel message
channel_type = event.get("channel_type", "")
is_dm = channel_type == "im"
# In channels, only respond if bot is mentioned
if not is_dm and self._bot_user_id:
if f"<@{self._bot_user_id}>" not in text:
return
# Strip the bot mention from the text
text = text.replace(f"<@{self._bot_user_id}>", "").strip()
# Determine message type
msg_type = MessageType.TEXT
if text.startswith("/"):
msg_type = MessageType.COMMAND
# Handle file attachments
media_urls = []
media_types = []
files = event.get("files", [])
for f in files:
mimetype = f.get("mimetype", "unknown")
url = f.get("url_private_download") or f.get("url_private", "")
if mimetype.startswith("image/") and url:
try:
ext = "." + mimetype.split("/")[-1].split(";")[0]
if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
ext = ".jpg"
# Slack private URLs require the bot token as auth header
cached = await self._download_slack_file(url, ext)
media_urls.append(cached)
media_types.append(mimetype)
msg_type = MessageType.PHOTO
except Exception as e:
print(f"[Slack] Failed to cache image: {e}", flush=True)
elif mimetype.startswith("audio/") and url:
try:
ext = "." + mimetype.split("/")[-1].split(";")[0]
if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
ext = ".ogg"
cached = await self._download_slack_file(url, ext, audio=True)
media_urls.append(cached)
media_types.append(mimetype)
msg_type = MessageType.VOICE
except Exception as e:
print(f"[Slack] Failed to cache audio: {e}", flush=True)
# Build source
source = self.build_source(
chat_id=channel_id,
chat_name=channel_id, # Will be resolved later if needed
chat_type="dm" if is_dm else "group",
user_id=user_id,
thread_id=thread_ts,
)
msg_event = MessageEvent(
text=text,
message_type=msg_type,
source=source,
raw_message=event,
message_id=ts,
media_urls=media_urls,
media_types=media_types,
reply_to_message_id=thread_ts if thread_ts != ts else None,
)
await self.handle_message(msg_event)
async def _handle_slash_command(self, command: dict) -> None:
"""Handle /hermes slash command."""
text = command.get("text", "").strip()
user_id = command.get("user_id", "")
channel_id = command.get("channel_id", "")
# Map common slash subcommands to gateway commands
if text in ("new", "reset"):
text = "/reset"
elif text == "status":
text = "/status"
elif text == "stop":
text = "/stop"
elif text:
pass # Treat as a regular question
else:
text = "/help"
source = self.build_source(
chat_id=channel_id,
chat_type="dm", # Slash commands are always in DM-like context
user_id=user_id,
)
event = MessageEvent(
text=text,
message_type=MessageType.COMMAND if text.startswith("/") else MessageType.TEXT,
source=source,
raw_message=command,
)
await self.handle_message(event)
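Review note: the subcommand mapping in `_handle_slash_command` could also be expressed as a lookup table. The sketch below is one possible shape under that assumption; `SUBCOMMANDS` and `map_slash_text` are hypothetical names and are not part of the adapter.

```python
# Hypothetical table form of the /hermes subcommand mapping shown above.
SUBCOMMANDS = {
    "new": "/reset",
    "reset": "/reset",
    "status": "/status",
    "stop": "/stop",
}


def map_slash_text(raw: str) -> str:
    """Translate the text after /hermes into a gateway command or question."""
    text = raw.strip()
    if not text:
        return "/help"  # bare "/hermes" shows help
    # Known subcommands become gateway commands; anything else passes
    # through unchanged and is treated as a regular question.
    return SUBCOMMANDS.get(text, text)
```

A table keeps new subcommands to a one-line change, at the cost of hiding the fall-through case inside `dict.get`.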
async def _download_slack_file(self, url: str, ext: str, audio: bool = False) -> str:
"""Download a Slack file using the bot token for auth."""
import httpx
bot_token = self.config.token
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
if audio:
from gateway.platforms.base import cache_audio_from_bytes
return cache_audio_from_bytes(response.content, ext)
else:
from gateway.platforms.base import cache_image_from_bytes
return cache_image_from_bytes(response.content, ext)
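Review note: the Discord and Slack adapters both derive a file extension from the MIME type with the same inline expression and whitelist. A shared helper might look like the sketch below; `ext_from_mime`, `IMAGE_EXTS`, and `AUDIO_EXTS` are hypothetical names, while the whitelists and defaults are copied from the code above.

```python
from typing import Sequence

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")
AUDIO_EXTS = (".ogg", ".mp3", ".wav", ".webm", ".m4a")


def ext_from_mime(mime_type: str, allowed: Sequence[str], default: str) -> str:
    """Derive a file extension from a MIME type, falling back to a default.

    "image/png" -> ".png"; parameters after ";" are dropped; any subtype
    whose extension is not in `allowed` yields `default`.
    """
    subtype = mime_type.split("/")[-1].split(";")[0]
    ext = "." + subtype
    return ext if ext in allowed else default
```

One consequence of this rule is worth noting: MP3 uploads reported as `audio/mpeg` produce `.mpeg`, which is not in the audio whitelist, so they are cached with the `.ogg` fallback extension.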
+189 -3
@@ -38,6 +38,8 @@ from gateway.platforms.base import (
MessageEvent,
MessageType,
SendResult,
cache_image_from_bytes,
cache_audio_from_bytes,
)
@@ -90,7 +92,7 @@ class TelegramAdapter(BasePlatformAdapter):
self._handle_command
))
self._app.add_handler(TelegramMessageHandler(
filters.PHOTO | filters.VIDEO | filters.AUDIO | filters.VOICE | filters.Document.ALL,
filters.PHOTO | filters.VIDEO | filters.AUDIO | filters.VOICE | filters.Document.ALL | filters.Sticker.ALL,
self._handle_media_message
))
@@ -174,6 +176,69 @@ class TelegramAdapter(BasePlatformAdapter):
except Exception as e:
return SendResult(success=False, error=str(e))
async def send_voice(
self,
chat_id: str,
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send audio as a native Telegram voice message or audio file."""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
import os
if not os.path.exists(audio_path):
return SendResult(success=False, error=f"Audio file not found: {audio_path}")
with open(audio_path, "rb") as audio_file:
# .ogg/.opus files -> send as voice (round playable bubble)
if audio_path.endswith(".ogg") or audio_path.endswith(".opus"):
msg = await self._bot.send_voice(
chat_id=int(chat_id),
voice=audio_file,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
)
else:
# .mp3 and others -> send as audio file
msg = await self._bot.send_audio(
chat_id=int(chat_id),
audio=audio_file,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send voice/audio: {e}")
return await super().send_voice(chat_id, audio_path, caption, reply_to)
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send an image natively as a Telegram photo."""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
# Telegram can send photos directly from URLs
msg = await self._bot.send_photo(
chat_id=int(chat_id),
photo=image_url,
caption=caption[:1024] if caption else None, # Telegram caption limit
reply_to_message_id=int(reply_to) if reply_to else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send photo, falling back to URL: {e}")
# Fallback: send as text link
return await super().send_image(chat_id, image_url, caption, reply_to)
async def send_typing(self, chat_id: str) -> None:
"""Send typing indicator."""
if self._bot:
@@ -240,14 +305,16 @@ class TelegramAdapter(BasePlatformAdapter):
await self.handle_message(event)
async def _handle_media_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Handle incoming media messages."""
"""Handle incoming media messages, downloading images to local cache."""
if not update.message:
return
msg = update.message
# Determine media type
if msg.photo:
if msg.sticker:
msg_type = MessageType.STICKER
elif msg.photo:
msg_type = MessageType.PHOTO
elif msg.video:
msg_type = MessageType.VIDEO
@@ -264,8 +331,127 @@ class TelegramAdapter(BasePlatformAdapter):
if msg.caption:
event.text = msg.caption
# Handle stickers: describe via vision tool with caching
if msg.sticker:
await self._handle_sticker(msg, event)
await self.handle_message(event)
return
# Download photo to local image cache so the vision tool can access it
# even after Telegram's ephemeral file URLs expire (~1 hour).
if msg.photo:
try:
# msg.photo is a list of PhotoSize sorted by size; take the largest
photo = msg.photo[-1]
file_obj = await photo.get_file()
# Download the image bytes directly into memory
image_bytes = await file_obj.download_as_bytearray()
# Determine extension from the file path if available
ext = ".jpg"
if file_obj.file_path:
for candidate in [".png", ".webp", ".gif", ".jpeg", ".jpg"]:
if file_obj.file_path.lower().endswith(candidate):
ext = candidate
break
# Save to cache and populate media_urls with the local path
cached_path = cache_image_from_bytes(bytes(image_bytes), ext=ext)
event.media_urls = [cached_path]
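# Map the extension to a registered MIME subtype (.jpg/.jpeg -> image/jpeg)
subtype = "jpeg" if ext in (".jpg", ".jpeg") else ext.lstrip(".")
event.media_types = [f"image/{subtype}"]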
print(f"[Telegram] Cached user photo: {cached_path}", flush=True)
except Exception as e:
print(f"[Telegram] Failed to cache photo: {e}", flush=True)
# Download voice/audio messages to cache for STT transcription
if msg.voice:
try:
file_obj = await msg.voice.get_file()
audio_bytes = await file_obj.download_as_bytearray()
cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".ogg")
event.media_urls = [cached_path]
event.media_types = ["audio/ogg"]
print(f"[Telegram] Cached user voice: {cached_path}", flush=True)
except Exception as e:
print(f"[Telegram] Failed to cache voice: {e}", flush=True)
elif msg.audio:
try:
file_obj = await msg.audio.get_file()
audio_bytes = await file_obj.download_as_bytearray()
cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".mp3")
event.media_urls = [cached_path]
event.media_types = ["audio/mpeg"]  # registered MIME type for MP3
print(f"[Telegram] Cached user audio: {cached_path}", flush=True)
except Exception as e:
print(f"[Telegram] Failed to cache audio: {e}", flush=True)
await self.handle_message(event)
async def _handle_sticker(self, msg: Message, event: "MessageEvent") -> None:
"""
Describe a Telegram sticker via vision analysis, with caching.
For static stickers (WEBP), we download, analyze with vision, and cache
the description by file_unique_id. For animated/video stickers, we inject
a placeholder noting the emoji.
"""
from gateway.sticker_cache import (
get_cached_description,
cache_sticker_description,
build_sticker_injection,
build_animated_sticker_injection,
STICKER_VISION_PROMPT,
)
sticker = msg.sticker
emoji = sticker.emoji or ""
set_name = sticker.set_name or ""
# Animated and video stickers can't be analyzed as static images
if sticker.is_animated or sticker.is_video:
event.text = build_animated_sticker_injection(emoji)
return
# Check the cache first
cached = get_cached_description(sticker.file_unique_id)
if cached:
event.text = build_sticker_injection(
cached["description"], cached.get("emoji", emoji), cached.get("set_name", set_name)
)
print(f"[Telegram] Sticker cache hit: {sticker.file_unique_id}", flush=True)
return
# Cache miss -- download and analyze
try:
file_obj = await sticker.get_file()
image_bytes = await file_obj.download_as_bytearray()
cached_path = cache_image_from_bytes(bytes(image_bytes), ext=".webp")
print(f"[Telegram] Analyzing sticker: {cached_path}", flush=True)
from tools.vision_tools import vision_analyze_tool
import json as _json
result_json = await vision_analyze_tool(
image_url=cached_path,
user_prompt=STICKER_VISION_PROMPT,
)
result = _json.loads(result_json)
if result.get("success"):
description = result.get("analysis", "a sticker")
cache_sticker_description(sticker.file_unique_id, description, emoji, set_name)
event.text = build_sticker_injection(description, emoji, set_name)
else:
# Vision failed -- use emoji as fallback
event.text = build_sticker_injection(
f"a sticker with emoji {emoji}" if emoji else "a sticker",
emoji, set_name,
)
except Exception as e:
print(f"[Telegram] Sticker analysis error: {e}", flush=True)
event.text = build_sticker_injection(
f"a sticker with emoji {emoji}" if emoji else "a sticker",
emoji, set_name,
)
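Review note: the control flow of `_handle_sticker` (skip animated stickers, check the cache, analyze on a miss, never cache a failure) can be illustrated with a small stand-alone sketch. It assumes an in-memory dict as the cache and an injected `analyze` coroutine in place of the vision tool; `describe_sticker` and its parameters are hypothetical and do not reflect the real `gateway.sticker_cache` API. Unlike the adapter, it only treats exceptions as failure.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def describe_sticker(
    file_unique_id: str,
    emoji: str,
    is_animated: bool,
    analyze: Callable[[], Awaitable[str]],
    cache: Dict[str, str],
) -> str:
    """Return a text description for a sticker, caching successful analyses."""
    if is_animated:
        # Animated/video stickers are not analyzed; only the emoji is noted.
        return f"an animated sticker {emoji}".strip()
    if file_unique_id in cache:
        return cache[file_unique_id]  # cache hit: no vision call
    try:
        description = await analyze()
    except Exception:
        # Failures are not cached, so a later message can retry the analysis.
        return f"a sticker with emoji {emoji}" if emoji else "a sticker"
    cache[file_unique_id] = description
    return description


if __name__ == "__main__":
    async def fake_vision() -> str:  # stand-in for the real vision call
        return "a cat waving"

    store: Dict[str, str] = {}
    print(asyncio.run(describe_sticker("abc123", ":)", False, fake_vision, store)))
```

Keying on `file_unique_id` means the same sticker sent by different users or in different chats is analyzed only once.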
def _build_message_event(self, message: Message, msg_type: MessageType) -> MessageEvent:
"""Build a MessageEvent from a Telegram message."""
chat = message.chat
+37 -4
@@ -30,6 +30,8 @@ from gateway.platforms.base import (
MessageEvent,
MessageType,
SendResult,
cache_image_from_url,
cache_audio_from_url,
)
@@ -267,7 +269,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
if resp.status == 200:
messages = await resp.json()
for msg_data in messages:
event = self._build_message_event(msg_data)
event = await self._build_message_event(msg_data)
if event:
await self.handle_message(event)
except asyncio.CancelledError:
@@ -278,8 +280,8 @@ class WhatsAppAdapter(BasePlatformAdapter):
await asyncio.sleep(1) # Poll interval
def _build_message_event(self, data: Dict[str, Any]) -> Optional[MessageEvent]:
"""Build a MessageEvent from bridge message data."""
async def _build_message_event(self, data: Dict[str, Any]) -> Optional[MessageEvent]:
"""Build a MessageEvent from bridge message data, downloading images to cache."""
try:
# Determine message type
msg_type = MessageType.TEXT
@@ -307,13 +309,44 @@ class WhatsAppAdapter(BasePlatformAdapter):
user_name=data.get("senderName"),
)
# Download image media URLs to the local cache so the vision tool
# can access them reliably regardless of URL expiration.
raw_urls = data.get("mediaUrls", [])
cached_urls = []
media_types = []
for url in raw_urls:
if msg_type == MessageType.PHOTO and url.startswith(("http://", "https://")):
try:
cached_path = await cache_image_from_url(url, ext=".jpg")
cached_urls.append(cached_path)
media_types.append("image/jpeg")
print(f"[{self.name}] Cached user image: {cached_path}", flush=True)
except Exception as e:
print(f"[{self.name}] Failed to cache image: {e}", flush=True)
cached_urls.append(url)
media_types.append("image/jpeg")
elif msg_type == MessageType.VOICE and url.startswith(("http://", "https://")):
try:
cached_path = await cache_audio_from_url(url, ext=".ogg")
cached_urls.append(cached_path)
media_types.append("audio/ogg")
print(f"[{self.name}] Cached user voice: {cached_path}", flush=True)
except Exception as e:
print(f"[{self.name}] Failed to cache voice: {e}", flush=True)
cached_urls.append(url)
media_types.append("audio/ogg")
else:
cached_urls.append(url)
media_types.append("unknown")
return MessageEvent(
text=data.get("body", ""),
message_type=msg_type,
source=source,
raw_message=data,
message_id=data.get("messageId"),
media_urls=data.get("mediaUrls", []),
media_urls=cached_urls,
media_types=media_types,
)
except Exception as e:
print(f"[{self.name}] Error building event: {e}")
+576 -47
@@ -15,6 +15,7 @@ Usage:
import asyncio
import os
import re
import sys
import signal
from pathlib import Path
@@ -35,6 +36,9 @@ load_dotenv()
# Gateway runs in quiet mode - suppress debug output and use cwd directly (no temp dirs)
os.environ["HERMES_QUIET"] = "1"
# Enable interactive exec approval for dangerous commands on messaging platforms
os.environ["HERMES_EXEC_ASK"] = "1"
# Set terminal working directory for messaging platforms
# Uses MESSAGING_CWD if set, otherwise defaults to home directory
# This is separate from CLI which uses the directory where `hermes` is run
@@ -54,7 +58,7 @@ from gateway.session import (
build_session_context_prompt,
)
from gateway.delivery import DeliveryRouter, DeliveryTarget
from gateway.platforms.base import BasePlatformAdapter, MessageEvent
from gateway.platforms.base import BasePlatformAdapter, MessageEvent, MessageType
class GatewayRunner:
@@ -68,7 +72,13 @@ class GatewayRunner:
def __init__(self, config: Optional[GatewayConfig] = None):
self.config = config or load_gateway_config()
self.adapters: Dict[Platform, BasePlatformAdapter] = {}
self.session_store = SessionStore(self.config.sessions_dir, self.config)
# Wire process registry into session store for reset protection
from tools.process_registry import process_registry
self.session_store = SessionStore(
self.config.sessions_dir, self.config,
has_active_processes_fn=lambda key: process_registry.has_active_for_session(key),
)
self.delivery_router = DeliveryRouter(self.config)
self._running = False
self._shutdown_event = asyncio.Event()
@@ -77,6 +87,18 @@ class GatewayRunner:
# Key: session_key, Value: AIAgent instance
self._running_agents: Dict[str, Any] = {}
self._pending_messages: Dict[str, str] = {} # Queued messages during interrupt
# Track pending exec approvals per session
# Key: session_key, Value: {"command": str, "pattern_key": str}
self._pending_approvals: Dict[str, Dict[str, str]] = {}
# DM pairing store for code-based user authorization
from gateway.pairing import PairingStore
self.pairing_store = PairingStore()
# Event hook system
from gateway.hooks import HookRegistry
self.hooks = HookRegistry()
async def start(self) -> bool:
"""
@@ -87,6 +109,18 @@ class GatewayRunner:
print("[gateway] Starting Hermes Gateway...")
print(f"[gateway] Session storage: {self.config.sessions_dir}")
# Discover and load event hooks
self.hooks.discover_and_load()
# Recover background processes from checkpoint (crash recovery)
try:
from tools.process_registry import process_registry
recovered = process_registry.recover_from_checkpoint()
if recovered:
print(f"[gateway] Recovered {recovered} background process(es) from previous run")
except Exception as e:
print(f"[gateway] Process checkpoint recovery: {e}")
connected_count = 0
# Initialize and connect each configured platform
@@ -123,6 +157,15 @@ class GatewayRunner:
self.delivery_router.adapters = self.adapters
self._running = True
# Emit gateway:startup hook
hook_count = len(self.hooks.loaded_hooks)
if hook_count:
print(f"[gateway] {hook_count} hook(s) loaded")
await self.hooks.emit("gateway:startup", {
"platforms": [p.value for p in self.adapters.keys()],
})
print(f"[gateway] Gateway running with {connected_count} platform(s)")
print("[gateway] Press Ctrl+C to stop")
@@ -175,18 +218,23 @@ class GatewayRunner:
return None
return WhatsAppAdapter(config)
elif platform == Platform.SLACK:
from gateway.platforms.slack import SlackAdapter, check_slack_requirements
if not check_slack_requirements():
print("[gateway] Slack: slack-bolt not installed. Run: pip install 'hermes-agent[slack]'")
return None
return SlackAdapter(config)
return None
def _is_user_authorized(self, source: SessionSource) -> bool:
"""
Check if a user is authorized to use the bot.
Authorization is checked via environment variables:
- GATEWAY_ALLOWED_USERS: Comma-separated list of user IDs (all platforms)
- TELEGRAM_ALLOWED_USERS: Telegram-specific user IDs
- DISCORD_ALLOWED_USERS: Discord-specific user IDs
If no allowlist is configured, all users are allowed (open access).
Checks in order:
1. DM pairing approved list (always checked)
2. If no allowlists are configured, allow all (open access)
3. Environment variable allowlists (TELEGRAM_ALLOWED_USERS, etc.)
"""
user_id = source.user_id
if not user_id:
@@ -197,12 +245,18 @@ class GatewayRunner:
Platform.TELEGRAM: "TELEGRAM_ALLOWED_USERS",
Platform.DISCORD: "DISCORD_ALLOWED_USERS",
Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
Platform.SLACK: "SLACK_ALLOWED_USERS",
}
platform_allowlist = os.getenv(platform_env_map.get(source.platform, ""))
global_allowlist = os.getenv("GATEWAY_ALLOWED_USERS", "")
# If no allowlists configured, allow all (backward compatible)
# Check pairing store (always checked, regardless of allowlists)
platform_name = source.platform.value if source.platform else ""
if self.pairing_store.is_approved(platform_name, user_id):
return True
# If no allowlists configured and no pairing approvals, allow all (backward compatible)
if not platform_allowlist and not global_allowlist:
return True
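Review note: the order of checks in this hunk can be summarized as a pure function. This is a sketch under assumptions: `is_authorized` and `split_ids` are hypothetical names, `paired` stands in for `pairing_store.is_approved(...)`, the caller is assumed to pass a non-empty `user_id`, and the final membership test is assumed, because the rest of the method is outside this hunk.

```python
from typing import Set


def split_ids(raw: str) -> Set[str]:
    """Parse a comma-separated allowlist into a set of trimmed user IDs."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_authorized(
    user_id: str,
    *,
    paired: bool,
    platform_allowlist: str = "",
    global_allowlist: str = "",
) -> bool:
    """Mirror the visible order: pairing, then open access, then allowlists."""
    if paired:
        return True  # pairing approval wins regardless of allowlists
    if not platform_allowlist and not global_allowlist:
        return True  # nothing configured: open access
    # Assumed: membership in either allowlist grants access.
    allowed = split_ids(platform_allowlist) | split_ids(global_allowlist)
    return user_id in allowed
```

Written this way, it is visible that with no allowlist configured every user is accepted, whether or not any pairing approvals exist.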
@@ -233,7 +287,31 @@ class GatewayRunner:
# Check if user is authorized
if not self._is_user_authorized(source):
print(f"[gateway] Unauthorized user: {source.user_id} ({source.user_name}) on {source.platform.value}")
return None # Silently ignore unauthorized users
# In DMs: offer pairing code. In groups: silently ignore.
if source.chat_type == "dm":
platform_name = source.platform.value if source.platform else "unknown"
code = self.pairing_store.generate_code(
platform_name, source.user_id, source.user_name or ""
)
if code:
adapter = self.adapters.get(source.platform)
if adapter:
await adapter.send(
source.chat_id,
f"Hi~ I don't recognize you yet!\n\n"
f"Here's your pairing code: `{code}`\n\n"
f"Ask the bot owner to run:\n"
f"`hermes pairing approve {platform_name} {code}`"
)
else:
adapter = self.adapters.get(source.platform)
if adapter:
await adapter.send(
source.chat_id,
"Too many pairing requests right now~ "
"Please try again later!"
)
return None
# Check for commands
command = event.get_command()
@@ -246,6 +324,25 @@ class GatewayRunner:
if command == "stop":
return await self._handle_stop_command(event)
# Check for pending exec approval responses
session_key_preview = f"agent:main:{source.platform.value}:{source.chat_type}:{source.chat_id}" if source.chat_type != "dm" else f"agent:main:{source.platform.value}:dm"
if session_key_preview in self._pending_approvals:
user_text = event.text.strip().lower()
if user_text in ("yes", "y", "approve", "ok", "go", "do it"):
approval = self._pending_approvals.pop(session_key_preview)
cmd = approval["command"]
pattern_key = approval.get("pattern_key", "")
print(f"[gateway] ✅ User approved dangerous command: {cmd[:60]}...")
# Approve for session and re-run via terminal_tool with force=True
from tools.terminal_tool import terminal_tool, _session_approved_patterns
_session_approved_patterns.add(pattern_key)
result = terminal_tool(command=cmd, force=True)
return f"✅ Command approved and executed.\n\n```\n{result[:3500]}\n```"
elif user_text in ("no", "n", "deny", "cancel", "nope"):
self._pending_approvals.pop(session_key_preview)
return "❌ Command denied."
# If it's not clearly an approval/denial, fall through to normal processing
# Get or create session
session_entry = self.session_store.get_or_create_session(source)
session_key = session_entry.session_key
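Review note: the approval-reply matching above is a three-way classification (approve, deny, or neither). A minimal sketch of that step as a stand-alone function follows; `classify_approval_reply` and the two constant names are hypothetical, while the word lists are copied from the hunk.

```python
from typing import Optional

APPROVE_WORDS = frozenset({"yes", "y", "approve", "ok", "go", "do it"})
DENY_WORDS = frozenset({"no", "n", "deny", "cancel", "nope"})


def classify_approval_reply(text: str) -> Optional[bool]:
    """Return True for approval, False for denial, None for anything else.

    Matching is exact after trimming and lowercasing, so "yes please"
    is treated as an ordinary message rather than an approval.
    """
    normalized = text.strip().lower()
    if normalized in APPROVE_WORDS:
        return True
    if normalized in DENY_WORDS:
        return False
    return None
```

Returning `None` for unrecognized text matches the fall-through behavior: the pending approval stays queued and the message is processed normally.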
@@ -271,10 +368,66 @@ class GatewayRunner:
# Load conversation history from transcript
history = self.session_store.load_transcript(session_entry.session_id)
# -----------------------------------------------------------------
# Auto-analyze images sent by the user
#
# If the user attached image(s), we run the vision tool eagerly so
# the conversation model always receives a text description. The
# local file path is also included so the model can re-examine the
# image later with a more targeted question via vision_analyze.
#
# We filter to image paths only (by media_type) so that non-image
# attachments (documents, audio, etc.) are not sent to the vision
# tool even when they appear in the same message.
# -----------------------------------------------------------------
message_text = event.text or ""
if event.media_urls:
image_paths = []
for i, path in enumerate(event.media_urls):
# Check media_types if available; otherwise infer from message type
mtype = event.media_types[i] if i < len(event.media_types) else ""
is_image = (
mtype.startswith("image/")
or event.message_type == MessageType.PHOTO
)
if is_image:
image_paths.append(path)
if image_paths:
message_text = await self._enrich_message_with_vision(
message_text, image_paths
)
# -----------------------------------------------------------------
# Auto-transcribe voice/audio messages sent by the user
# -----------------------------------------------------------------
if event.media_urls:
audio_paths = []
for i, path in enumerate(event.media_urls):
mtype = event.media_types[i] if i < len(event.media_types) else ""
is_audio = (
mtype.startswith("audio/")
or event.message_type in (MessageType.VOICE, MessageType.AUDIO)
)
if is_audio:
audio_paths.append(path)
if audio_paths:
message_text = await self._enrich_message_with_transcription(
message_text, audio_paths
)
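The two loops above select attachments by MIME prefix and fall back to the message type when `media_types` is shorter than `media_urls`. A small sketch of that selection as one function, assuming plain lists and two boolean flags in place of the `MessageType` checks (`split_attachments` and its parameters are hypothetical):

```python
from typing import List, Sequence, Tuple


def split_attachments(
    paths: Sequence[str],
    media_types: Sequence[str],
    is_photo_message: bool = False,   # stands in for MessageType.PHOTO
    is_audio_message: bool = False,   # stands in for MessageType.VOICE / AUDIO
) -> Tuple[List[str], List[str]]:
    """Return (image_paths, audio_paths) using the same rules as the hunk."""
    images: List[str] = []
    audio: List[str] = []
    for i, path in enumerate(paths):
        # A missing media type is treated as the empty string.
        mtype = media_types[i] if i < len(media_types) else ""
        if mtype.startswith("image/") or is_photo_message:
            images.append(path)
        if mtype.startswith("audio/") or is_audio_message:
            audio.append(path)
    return images, audio
```

Attachments with neither an `image/` nor an `audio/` type, such as documents, end up in neither list unless the message type forces them in.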
try:
# Emit agent:start hook
hook_ctx = {
"platform": source.platform.value if source.platform else "",
"user_id": source.user_id,
"session_id": session_entry.session_id,
"message": message_text[:500],
}
await self.hooks.emit("agent:start", hook_ctx)
# Run the agent
response = await self._run_agent(
message=event.text,
agent_result = await self._run_agent(
message=message_text,
context_prompt=context_prompt,
history=history,
source=source,
@@ -282,15 +435,82 @@ class GatewayRunner:
session_key=session_key
)
# Append to transcript
self.session_store.append_to_transcript(
session_entry.session_id,
{"role": "user", "content": event.text, "timestamp": datetime.now().isoformat()}
)
self.session_store.append_to_transcript(
session_entry.session_id,
{"role": "assistant", "content": response, "timestamp": datetime.now().isoformat()}
)
response = agent_result.get("final_response", "")
agent_messages = agent_result.get("messages", [])
# Emit agent:end hook
await self.hooks.emit("agent:end", {
**hook_ctx,
"response": (response or "")[:500],
})
# Check for pending process watchers (check_interval on background processes)
try:
from tools.process_registry import process_registry
while process_registry.pending_watchers:
watcher = process_registry.pending_watchers.pop(0)
asyncio.create_task(self._run_process_watcher(watcher))
except Exception as e:
print(f"[gateway] Process watcher setup error: {e}", flush=True)
# Check if the agent encountered a dangerous command needing approval
# The terminal tool stores the last pending approval globally
try:
from tools.terminal_tool import _last_pending_approval
if _last_pending_approval:
self._pending_approvals[session_key] = _last_pending_approval.copy()
# Clear the global so it doesn't leak to other sessions
_last_pending_approval.clear()
except Exception:
pass
# Save the full conversation to the transcript, including tool calls.
# This preserves the complete agent loop (tool_calls, tool results,
# intermediate reasoning) so sessions can be resumed with full context
# and transcripts are useful for debugging and training data.
ts = datetime.now().isoformat()
# If this is a fresh session (no history), write the full tool
# definitions as the first entry so the transcript is self-describing
# -- the same list of dicts sent as tools=[...] in the API request.
if not history:
tool_defs = agent_result.get("tools", [])
self.session_store.append_to_transcript(
session_entry.session_id,
{
"role": "session_meta",
"tools": tool_defs or [],
"model": os.getenv("HERMES_MODEL", ""),
"platform": source.platform.value if source.platform else "",
"timestamp": ts,
}
)
# Find only the NEW messages from this turn (skip history we loaded)
history_len = len(history)
new_messages = agent_messages[history_len:] if len(agent_messages) > history_len else agent_messages
# If no new messages found (edge case), fall back to simple user/assistant
if not new_messages:
self.session_store.append_to_transcript(
session_entry.session_id,
{"role": "user", "content": message_text, "timestamp": ts}
)
if response:
self.session_store.append_to_transcript(
session_entry.session_id,
{"role": "assistant", "content": response, "timestamp": ts}
)
else:
for msg in new_messages:
# Skip system messages (they're rebuilt each run)
if msg.get("role") == "system":
continue
# Add timestamp to each message for debugging
entry = {**msg, "timestamp": ts}
self.session_store.append_to_transcript(
session_entry.session_id, entry
)
# Update session
self.session_store.update_session(session_entry.session_key)
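The transcript logic above keeps only the messages produced in the current turn by slicing off the loaded history, drops system messages, and stamps every entry with one shared timestamp. A sketch of that selection, assuming messages are plain dicts (`new_transcript_entries` is a hypothetical name):

```python
from typing import Any, Dict, List


def new_transcript_entries(
    agent_messages: List[Dict[str, Any]],
    history_len: int,
    ts: str,
) -> List[Dict[str, Any]]:
    """Select this turn's messages and add a timestamp to each."""
    # Same slice as the hunk: if the agent returned no more messages than
    # the history held, the whole list is treated as new.
    if len(agent_messages) > history_len:
        new_messages = agent_messages[history_len:]
    else:
        new_messages = agent_messages
    # System messages are rebuilt on every run, so they are not stored.
    return [
        {**msg, "timestamp": ts}
        for msg in new_messages
        if msg.get("role") != "system"
    ]
```

An empty result corresponds to the fallback branch in the hunk, where a plain user/assistant pair is written instead.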
@@ -315,6 +535,13 @@ class GatewayRunner:
# Reset the session
new_entry = self.session_store.reset_session(session_key)
# Emit session:reset hook
await self.hooks.emit("session:reset", {
"platform": source.platform.value if source.platform else "",
"user_id": source.user_id,
"session_key": session_key,
})
if new_entry:
return "✨ Session reset! I've started fresh with no memory of our previous conversation."
else:
@@ -373,6 +600,200 @@ class GatewayRunner:
if var in os.environ:
del os.environ[var]
async def _enrich_message_with_vision(
self,
user_text: str,
image_paths: List[str],
) -> str:
"""
Auto-analyze user-attached images with the vision tool and prepend
the descriptions to the message text.
Each image is analyzed with a general-purpose prompt. The resulting
description *and* the local cache path are injected so the model can:
1. Immediately understand what the user sent (no extra tool call).
2. Re-examine the image with vision_analyze if it needs more detail.
Args:
user_text: The user's original caption / message text.
image_paths: List of local file paths to cached images.
Returns:
The enriched message string with vision descriptions prepended.
"""
from tools.vision_tools import vision_analyze_tool
import json as _json
analysis_prompt = (
"Describe everything visible in this image in thorough detail. "
"Include any text, code, data, objects, people, layout, colors, "
"and any other notable visual information."
)
enriched_parts = []
for path in image_paths:
try:
print(f"[gateway] Auto-analyzing user image: {path}", flush=True)
result_json = await vision_analyze_tool(
image_url=path,
user_prompt=analysis_prompt,
)
result = _json.loads(result_json)
if result.get("success"):
description = result.get("analysis", "")
enriched_parts.append(
f"[The user sent an image~ Here's what I can see:\n{description}]\n"
f"[If you need a closer look, use vision_analyze with "
f"image_url: {path} ~]"
)
else:
enriched_parts.append(
"[The user sent an image but I couldn't quite see it "
"this time (>_<) You can try looking at it yourself "
f"with vision_analyze using image_url: {path}]"
)
except Exception as e:
print(f"[gateway] Vision auto-analysis error: {e}", flush=True)
enriched_parts.append(
f"[The user sent an image but something went wrong when I "
f"tried to look at it~ You can try examining it yourself "
f"with vision_analyze using image_url: {path}]"
)
# Combine: vision descriptions first, then the user's original text
if enriched_parts:
prefix = "\n\n".join(enriched_parts)
if user_text:
return f"{prefix}\n\n{user_text}"
return prefix
return user_text
async def _enrich_message_with_transcription(
self,
user_text: str,
audio_paths: List[str],
) -> str:
"""
Auto-transcribe user voice/audio messages using OpenAI Whisper API
and prepend the transcript to the message text.
Args:
user_text: The user's original caption / message text.
audio_paths: List of local file paths to cached audio files.
Returns:
The enriched message string with transcriptions prepended.
"""
from tools.transcription_tools import transcribe_audio
import asyncio
enriched_parts = []
for path in audio_paths:
try:
print(f"[gateway] Transcribing user voice: {path}", flush=True)
result = await asyncio.to_thread(transcribe_audio, path)
if result["success"]:
transcript = result["transcript"]
enriched_parts.append(
f'[The user sent a voice message~ '
f'Here\'s what they said: "{transcript}"]'
)
else:
error = result.get("error", "unknown error")
if "OPENAI_API_KEY" in error or "HERMES_OPENAI_API_KEY" in error:
enriched_parts.append(
"[The user sent a voice message but I can't listen "
"to it right now~ HERMES_OPENAI_API_KEY isn't set up yet "
"(';w;') Let them know!]"
)
else:
enriched_parts.append(
"[The user sent a voice message but I had trouble "
f"transcribing it~ ({error})]"
)
except Exception as e:
print(f"[gateway] Transcription error: {e}", flush=True)
enriched_parts.append(
"[The user sent a voice message but something went wrong "
"when I tried to listen to it~ Let them know!]"
)
if enriched_parts:
prefix = "\n\n".join(enriched_parts)
if user_text:
return f"{prefix}\n\n{user_text}"
return prefix
return user_text
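Both enrichment methods end with the same step: join the bracketed notes with blank lines and place them before the user's own text. A sketch of that shared tail as a standalone function (`prepend_parts` is a hypothetical name, not something the diff defines):

```python
from typing import List


def prepend_parts(parts: List[str], user_text: str) -> str:
    """Put enrichment notes before the user's text, separated by blank lines."""
    if not parts:
        return user_text
    prefix = "\n\n".join(parts)
    # With no caption, the notes alone become the message.
    return f"{prefix}\n\n{user_text}" if user_text else prefix
```

For example, `prepend_parts(["[note]"], "hello")` gives `"[note]\n\nhello"`, while an empty `parts` list returns the text unchanged.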
async def _run_process_watcher(self, watcher: dict) -> None:
"""
Periodically check a background process and push updates to the user.
Runs as an asyncio task. Stays silent when nothing changed.
Auto-removes when the process exits or is killed.
"""
from tools.process_registry import process_registry
session_id = watcher["session_id"]
interval = watcher["check_interval"]
session_key = watcher.get("session_key", "")
platform_name = watcher.get("platform", "")
chat_id = watcher.get("chat_id", "")
print(f"[gateway] Process watcher started: {session_id} (every {interval}s)", flush=True)
last_output_len = 0
while True:
await asyncio.sleep(interval)
session = process_registry.get(session_id)
if session is None:
break
current_output_len = len(session.output_buffer)
has_new_output = current_output_len > last_output_len
last_output_len = current_output_len
if session.exited:
# Process finished -- deliver final update
new_output = session.output_buffer[-1000:] if session.output_buffer else ""
message_text = (
f"[Background process {session_id} finished with exit code {session.exit_code}~ "
f"Here's the final output:\n{new_output}]"
)
# Try to deliver to the originating platform
adapter = None
for p, a in self.adapters.items():
if p.value == platform_name:
adapter = a
break
if adapter and chat_id:
try:
await adapter.send(chat_id, message_text)
except Exception as e:
print(f"[gateway] Watcher delivery error: {e}", flush=True)
break
elif has_new_output:
# New output available -- deliver status update
new_output = session.output_buffer[-500:] if session.output_buffer else ""
message_text = (
f"[Background process {session_id} is still running~ "
f"New output:\n{new_output}]"
)
adapter = None
for p, a in self.adapters.items():
if p.value == platform_name:
adapter = a
break
if adapter and chat_id:
try:
await adapter.send(chat_id, message_text)
except Exception as e:
print(f"[gateway] Watcher delivery error: {e}", flush=True)
print(f"[gateway] Process watcher ended: {session_id}", flush=True)
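On every tick the watcher makes one of three choices: send a final update and stop, send a progress update, or stay silent. A sketch of that decision in isolation, assuming only the buffer lengths and the exit flag matter (`watcher_action` and its string results are hypothetical):

```python
def watcher_action(last_output_len: int, current_output_len: int, exited: bool) -> str:
    """Return "final", "update", or "silent" for one watcher tick."""
    if exited:
        # The exit is always reported, even without new output.
        return "final"
    if current_output_len > last_output_len:
        return "update"
    return "silent"
```

Separating the decision from delivery would also let the adapter lookup, which appears twice in the method, live in one place.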
async def _run_agent(
self,
message: str,
@@ -381,10 +802,16 @@ class GatewayRunner:
source: SessionSource,
session_id: str,
session_key: str = None
) -> str:
) -> Dict[str, Any]:
"""
Run the agent with the given message and context.
Returns the full result dict from run_conversation, including:
- "final_response": str (the text to send back)
- "messages": list (full conversation including tool calls)
- "api_calls": int
- "completed": bool
This is run in a thread pool to not block the event loop.
Supports interruption via new messages.
"""
@@ -418,23 +845,35 @@ class GatewayRunner:
return
last_tool[0] = tool_name
# Build progress message
# Build progress message with primary argument preview
tool_emojis = {
"terminal": "💻",
"web_search": "🔍",
"web_extract": "📄",
"read_file": "📖",
"write_file": "✍️",
"patch": "🔧",
"search": "🔎",
"list_directory": "📂",
"image_generate": "🎨",
"text_to_speech": "🔊",
"browser_navigate": "🌐",
"browser_click": "👆",
"browser_type": "⌨️",
"browser_snapshot": "📸",
"moa_query": "🧠",
"mixture_of_agents": "🧠",
"vision_analyze": "👁️",
"skill_view": "📚",
"skills_list": "📋",
}
emoji = tool_emojis.get(tool_name, "⚙️")
if tool_name == "terminal" and preview:
msg = f"{emoji} `{preview}`..."
if preview:
# Truncate preview to keep messages clean
if len(preview) > 40:
preview = preview[:37] + "..."
msg = f"{emoji} {tool_name}... \"{preview}\""
else:
msg = f"{emoji} {tool_name}..."
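The new progress format shows a preview for every tool and caps it at 40 characters including the ellipsis. A sketch of that formatting, with the emoji passed in rather than looked up (`build_progress_message` and `max_len` are hypothetical):

```python
def build_progress_message(
    tool_name: str,
    preview: str = "",
    emoji: str = "⚙️",
    max_len: int = 40,
) -> str:
    """Format a one-line progress notice with an optional truncated preview."""
    if preview:
        if len(preview) > max_len:
            # Keep max_len - 3 characters so the "..." fits within the cap.
            preview = preview[: max_len - 3] + "..."
        return f'{emoji} {tool_name}... "{preview}"'
    return f"{emoji} {tool_name}..."
```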
@@ -475,11 +914,20 @@ class GatewayRunner:
# We need to share the agent instance for interrupt support
agent_holder = [None] # Mutable container for the agent instance
result_holder = [None] # Mutable container for the result
tools_holder = [None] # Mutable container for the tool definitions
def run_sync():
# Pass session_key to process registry via env var so background
# processes can be mapped back to this gateway session
os.environ["HERMES_SESSION_KEY"] = session_key or ""
# Read from env var or use default (same as CLI)
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "60"))
# Map platform enum to the platform hint key the agent understands.
# Platform.LOCAL ("local") maps to "cli"; others pass through as-is.
platform_key = "cli" if source.platform == Platform.LOCAL else source.platform.value
agent = AIAgent(
model=os.getenv("HERMES_MODEL", "anthropic/claude-opus-4.6"),
max_iterations=max_iterations,
@@ -488,32 +936,104 @@ class GatewayRunner:
ephemeral_system_prompt=context_prompt,
session_id=session_id,
tool_progress_callback=progress_callback if tool_progress_enabled else None,
platform=platform_key, # Tells the agent which interface to format for
)
# Store agent reference for interrupt support
agent_holder[0] = agent
# Capture the full tool definitions for transcript logging
tools_holder[0] = agent.tools if hasattr(agent, 'tools') else None
# Convert transcript history to agent format
# Transcript has timestamps; agent expects {"role": ..., "content": ...}
# Convert history to agent format.
# Two cases:
# 1. Normal path (from transcript): simple {role, content, timestamp} dicts
# - Strip timestamps, keep role+content
# 2. Interrupt path (from agent result["messages"]): full agent messages
# that may include tool_calls, tool_call_id, reasoning, etc.
# - These must be passed through intact so the API sees valid
# assistant→tool sequences (dropping tool_calls causes 500 errors)
agent_history = []
for msg in history:
role = msg.get("role")
content = msg.get("content")
if role and content:
agent_history.append({"role": role, "content": content})
if not role:
continue
# Skip metadata entries (tool definitions, session info)
# -- these are for transcript logging, not for the LLM
if role in ("session_meta",):
continue
# Skip system messages -- the agent rebuilds its own system prompt
if role == "system":
continue
# Rich agent messages (tool_calls, tool results) must be passed
# through intact so the API sees valid assistant→tool sequences
has_tool_calls = "tool_calls" in msg
has_tool_call_id = "tool_call_id" in msg
is_tool_message = role == "tool"
if has_tool_calls or has_tool_call_id or is_tool_message:
clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}
agent_history.append(clean_msg)
else:
# Simple text message - just need role and content
content = msg.get("content")
if content:
agent_history.append({"role": role, "content": content})
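The conversion loop above can be read as a filter with two output shapes: tool-related messages pass through minus their timestamp, and plain text messages are reduced to role and content. A sketch of the same rules as a pure function, assuming dict messages (`to_agent_history` is a hypothetical name):

```python
from typing import Any, Dict, List


def to_agent_history(history: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert transcript entries into messages the agent can replay."""
    agent_history: List[Dict[str, Any]] = []
    for msg in history:
        role = msg.get("role")
        # Metadata and system entries are never sent back to the model.
        if not role or role in ("session_meta", "system"):
            continue
        if "tool_calls" in msg or "tool_call_id" in msg or role == "tool":
            # Keep every field except the timestamp so the assistant -> tool
            # sequence stays intact.
            agent_history.append({k: v for k, v in msg.items() if k != "timestamp"})
        elif msg.get("content"):
            agent_history.append({"role": role, "content": msg["content"]})
    return agent_history
```

Plain messages with empty content are dropped, which matches the `if content:` guard in the hunk.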
result = agent.run_conversation(message, conversation_history=agent_history)
result_holder[0] = result
# Return final response, or a message if something went wrong
final_response = result.get("final_response")
if final_response:
return final_response
elif result.get("error"):
# Agent couldn't recover - show the error
return f"⚠️ {result['error']}"
else:
return "(No response generated)"
if not final_response:
error_msg = f"⚠️ {result['error']}" if result.get("error") else "(No response generated)"
return {
"final_response": error_msg,
"messages": result.get("messages", []),
"api_calls": result.get("api_calls", 0),
"tools": tools_holder[0] or [],
}
# Scan tool results for MEDIA:<path> tags that need to be delivered
# as native audio/file attachments. The TTS tool embeds MEDIA: tags
# in its JSON response, but the model's final text reply usually
# doesn't include them. We collect unique tags from tool results and
# append any that aren't already present in the final response, so the
# adapter's extract_media() can find and deliver the files exactly once.
if "MEDIA:" not in final_response:
media_tags = []
has_voice_directive = False
for msg in result.get("messages", []):
if msg.get("role") == "tool" or msg.get("role") == "function":
content = msg.get("content", "")
if "MEDIA:" in content:
for match in re.finditer(r'MEDIA:(\S+)', content):
path = match.group(1).strip().rstrip('",}')
if path:
media_tags.append(f"MEDIA:{path}")
if "[[audio_as_voice]]" in content:
has_voice_directive = True
if media_tags:
# Deduplicate while preserving order
seen = set()
unique_tags = []
for tag in media_tags:
if tag not in seen:
seen.add(tag)
unique_tags.append(tag)
if has_voice_directive:
unique_tags.insert(0, "[[audio_as_voice]]")
final_response = final_response + "\n" + "\n".join(unique_tags)
return {
"final_response": final_response,
"messages": result_holder[0].get("messages", []) if result_holder[0] else [],
"api_calls": result_holder[0].get("api_calls", 0) if result_holder[0] else 0,
"tools": tools_holder[0] or [],
}
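The MEDIA scan collects `MEDIA:<path>` tags from tool results, strips trailing JSON punctuation, and removes duplicates while keeping order. A sketch of that collection step, assuming string contents; it swaps the `seen` set for `dict.fromkeys`, which also preserves first-seen order, and it checks the voice directive on every tool message, which may differ from the original nesting (`collect_media_tags` is a hypothetical name):

```python
import re
from typing import Any, Dict, List


def collect_media_tags(messages: List[Dict[str, Any]]) -> List[str]:
    """Return unique MEDIA: tags from tool results, voice directive first."""
    tags: List[str] = []
    has_voice_directive = False
    for msg in messages:
        if msg.get("role") not in ("tool", "function"):
            continue
        content = msg.get("content") or ""
        for match in re.finditer(r"MEDIA:(\S+)", content):
            # Tags embedded in JSON often carry a trailing quote, comma, or brace.
            path = match.group(1).strip().rstrip('",}')
            if path:
                tags.append(f"MEDIA:{path}")
        if "[[audio_as_voice]]" in content:
            has_voice_directive = True
    unique_tags = list(dict.fromkeys(tags))  # order-preserving dedupe
    if unique_tags and has_voice_directive:
        unique_tags.insert(0, "[[audio_as_voice]]")
    return unique_tags
```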
# Start progress message sender if enabled
progress_task = None
@@ -572,13 +1092,16 @@ class GatewayRunner:
if pending:
print(f"[gateway] 📨 Processing interrupted message: '{pending[:40]}...'")
# Add an indicator to the response
if response:
response = response + "\n\n---\n_[Interrupted - processing your new message]_"
# Send the interrupted response first
if adapter and response:
await adapter.send(chat_id=source.chat_id, content=response)
# Clear the adapter's interrupt event so the next _run_agent call
# doesn't immediately re-trigger the interrupt before the new agent
# even makes its first API call (this was causing an infinite loop).
if adapter and hasattr(adapter, '_active_sessions') and source.chat_id in adapter._active_sessions:
adapter._active_sessions[source.chat_id].clear()
# Don't send the interrupted response to the user — it's just noise
# like "Operation interrupted." They already know they sent a new
# message, so go straight to processing it.
# Now process the pending message with updated history
updated_history = result.get("messages", history)
@@ -612,11 +1135,13 @@ class GatewayRunner:
return response
async def start_gateway(config: Optional[GatewayConfig] = None) -> None:
async def start_gateway(config: Optional[GatewayConfig] = None) -> bool:
"""
Start the gateway and run until interrupted.
This is the main entry point for running the gateway.
Returns True if the gateway ran successfully, False if it failed to start.
A False return causes a non-zero exit code so systemd can auto-restart.
"""
runner = GatewayRunner(config)
@@ -635,10 +1160,11 @@ async def start_gateway(config: Optional[GatewayConfig] = None) -> None:
# Start the gateway
success = await runner.start()
if not success:
return
return False
# Wait for shutdown
await runner.wait_for_shutdown()
return True
def main():
@@ -658,8 +1184,11 @@ def main():
data = json.load(f)
config = GatewayConfig.from_dict(data)
# Run the gateway
asyncio.run(start_gateway(config))
# Run the gateway - exit with code 1 if no platforms connected,
# so systemd Restart=on-failure will retry on transient errors (e.g. DNS)
success = asyncio.run(start_gateway(config))
if not success:
sys.exit(1)
if __name__ == "__main__":
+12 -1
@@ -270,11 +270,15 @@ class SessionStore:
- {session_id}.jsonl: Conversation transcripts
"""
def __init__(self, sessions_dir: Path, config: GatewayConfig):
def __init__(self, sessions_dir: Path, config: GatewayConfig,
has_active_processes_fn=None):
self.sessions_dir = sessions_dir
self.config = config
self._entries: Dict[str, SessionEntry] = {}
self._loaded = False
# Optional callback to check if a session has active background processes.
# When set, sessions with running processes are exempt from reset.
self._has_active_processes_fn = has_active_processes_fn
def _ensure_loaded(self) -> None:
"""Load sessions from disk if not already loaded."""
@@ -320,7 +324,14 @@ class SessionStore:
Check if a session should be reset based on policy.
Returns True if the session is stale and should start fresh.
Sessions with active background processes are never reset.
"""
# Don't reset sessions that have active background processes
if self._has_active_processes_fn:
session_key = self._generate_session_key(source)
if self._has_active_processes_fn(session_key):
return False
policy = self.config.get_reset_policy(
platform=source.platform,
session_type=source.chat_type
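The optional callback lets the store veto a reset without knowing anything about the process registry. A sketch of that guard, with the reset policy reduced to a precomputed boolean (`should_reset` as a free function and the `is_stale` parameter are assumptions made for illustration):

```python
from typing import Callable, Optional


def should_reset(
    session_key: str,
    is_stale: bool,
    has_active_processes_fn: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Apply the active-process exemption before the staleness policy."""
    # Sessions with running background processes are never reset.
    if has_active_processes_fn and has_active_processes_fn(session_key):
        return False
    return is_stale
```

Because the callback defaults to `None`, callers that never pass it keep the previous behaviour.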
+111
@@ -0,0 +1,111 @@
"""
Sticker description cache for Telegram.
When users send stickers, we describe them via the vision tool and cache
the descriptions keyed by file_unique_id so we don't re-analyze the same
sticker image on every send. Descriptions are concise (1-2 sentences).
Cache location: ~/.hermes/sticker_cache.json
"""
import json
import os
import time
from pathlib import Path
from typing import Optional
CACHE_PATH = Path(os.path.expanduser("~/.hermes/sticker_cache.json"))
# Vision prompt for describing stickers -- kept concise to save tokens
STICKER_VISION_PROMPT = (
"Describe this sticker in 1-2 sentences. Focus on what it depicts -- "
"character, action, emotion. Be concise and objective."
)
def _load_cache() -> dict:
"""Load the sticker cache from disk."""
if CACHE_PATH.exists():
try:
return json.loads(CACHE_PATH.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return {}
return {}
def _save_cache(cache: dict) -> None:
"""Save the sticker cache to disk."""
CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
CACHE_PATH.write_text(
json.dumps(cache, indent=2, ensure_ascii=False),
encoding="utf-8",
)
def get_cached_description(file_unique_id: str) -> Optional[dict]:
"""
Look up a cached sticker description.
Returns:
dict with keys {description, emoji, set_name, cached_at} or None.
"""
cache = _load_cache()
return cache.get(file_unique_id)
def cache_sticker_description(
file_unique_id: str,
description: str,
emoji: str = "",
set_name: str = "",
) -> None:
"""
Store a sticker description in the cache.
Args:
file_unique_id: Telegram's stable sticker identifier.
description: Vision-generated description text.
emoji: Associated emoji (e.g. "😀").
set_name: Sticker set name if available.
"""
cache = _load_cache()
cache[file_unique_id] = {
"description": description,
"emoji": emoji,
"set_name": set_name,
"cached_at": time.time(),
}
_save_cache(cache)
def build_sticker_injection(
description: str,
emoji: str = "",
set_name: str = "",
) -> str:
"""
Build the warm-style injection text for a sticker description.
Returns a string like:
[The user sent a sticker 😀 from "MyPack"~ It shows: "A cat waving" (=^.w.^=)]
"""
context = ""
if set_name and emoji:
context = f" {emoji} from \"{set_name}\""
elif emoji:
context = f" {emoji}"
return f"[The user sent a sticker{context}~ It shows: \"{description}\" (=^.w.^=)]"
def build_animated_sticker_injection(emoji: str = "") -> str:
"""
Build injection text for animated/video stickers we can't analyze.
"""
if emoji:
return (
f"[The user sent an animated sticker {emoji}~ "
f"I can't see animated ones yet, but the emoji suggests: {emoji}]"
)
return "[The user sent an animated sticker~ I can't see animated ones yet]"
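The module above is a read-through cache: look up the sticker by `file_unique_id`, and only call the vision tool on a miss. A sketch of that round trip against an arbitrary path, with the vision call replaced by a plain callable (`describe_once` and the `describe` argument are hypothetical; the load and save helpers mirror the ones in the file):

```python
import json
import time
from pathlib import Path
from typing import Callable


def load_cache(path: Path) -> dict:
    """Read the cache file, treating a missing or corrupt file as empty."""
    if path.exists():
        try:
            return json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            return {}
    return {}


def save_cache(path: Path, cache: dict) -> None:
    """Write the cache, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cache, indent=2, ensure_ascii=False), encoding="utf-8")


def describe_once(path: Path, file_unique_id: str, describe: Callable[[], str]) -> str:
    """Return the cached description, calling describe() only on a miss."""
    cache = load_cache(path)
    entry = cache.get(file_unique_id)
    if entry is None:
        entry = {"description": describe(), "cached_at": time.time()}
        cache[file_unique_id] = entry
        save_cache(path, cache)
    return entry["description"]
```

Every lookup re-reads the whole file, which is simple and likely fine for a small cache; a larger one might warrant an in-memory copy.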
+131 -19
@@ -99,11 +99,40 @@ DEFAULT_CONFIG = {
"personality": "kawaii",
},
# Text-to-speech configuration
"tts": {
"provider": "edge", # "edge" (free) | "elevenlabs" (premium) | "openai"
"edge": {
"voice": "en-US-AriaNeural",
# Popular: AriaNeural, JennyNeural, AndrewNeural, BrianNeural, SoniaNeural
},
"elevenlabs": {
"voice_id": "pNInz6obpgDQGcFmaJgB", # Adam
"model_id": "eleven_multilingual_v2",
},
"openai": {
"model": "gpt-4o-mini-tts",
"voice": "alloy",
# Voices: alloy, echo, fable, onyx, nova, shimmer
},
},
"stt": {
"enabled": True,
"model": "whisper-1",
},
"human_delay": {
"mode": "off",
"min_ms": 800,
"max_ms": 2500,
},
# Permanently allowed dangerous command patterns (added via "always" approval)
"command_allowlist": [],
# Config schema version - bump this when adding new required fields
"_config_version": 1,
"_config_version": 2,
}
# =============================================================================
@@ -166,15 +195,31 @@ OPTIONAL_ENV_VARS = {
"password": True,
},
"OPENAI_BASE_URL": {
"description": "Custom OpenAI-compatible API endpoint URL",
"prompt": "API base URL (e.g., https://api.example.com/v1)",
"description": "Custom OpenAI-compatible API endpoint (for VLLM/SGLang/etc.)",
"prompt": "OpenAI-compatible base URL (only if running your own endpoint)",
"url": None,
"password": False,
"advanced": True, # Hide from standard migrate flow
},
"OPENAI_API_KEY": {
"description": "API key for custom OpenAI-compatible endpoint",
"prompt": "API key for custom endpoint",
"url": None,
"HERMES_OPENAI_API_KEY": {
"description": "OpenAI API key for voice transcription (Whisper) and OpenAI TTS",
"prompt": "OpenAI API Key (for Whisper STT + TTS)",
"url": "https://platform.openai.com/api-keys",
"tools": ["voice_transcription", "openai_tts"],
"password": True,
},
"SLACK_BOT_TOKEN": {
"description": "Slack bot integration",
"prompt": "Slack Bot Token (xoxb-...)",
"url": "https://api.slack.com/apps",
"tools": ["slack"],
"password": True,
},
"SLACK_APP_TOKEN": {
"description": "Slack Socket Mode connection",
"prompt": "Slack App Token (xapp-...)",
"url": "https://api.slack.com/apps",
"tools": ["slack"],
"password": True,
},
# Messaging platform tokens
@@ -202,6 +247,13 @@ OPTIONAL_ENV_VARS = {
"url": None,
"password": False,
},
# Text-to-speech (premium providers)
"ELEVENLABS_API_KEY": {
"description": "ElevenLabs API key for premium text-to-speech voices",
"prompt": "ElevenLabs API key",
"url": "https://elevenlabs.io/",
"password": True,
},
# Terminal configuration
"MESSAGING_CWD": {
"description": "Working directory for terminal commands via messaging (Telegram/Discord/etc). CLI always uses current directory.",
@@ -350,6 +402,44 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
results["warnings"].append(f"Skipped {var['name']} - some features may not work")
print()
# Check for missing optional env vars and offer to configure interactively
# Skip "advanced" vars (like OPENAI_BASE_URL) -- those are for power users
missing_optional = get_missing_env_vars(required_only=False)
required_names = {v["name"] for v in missing_env} if missing_env else set()
missing_optional = [
v for v in missing_optional
if v["name"] not in required_names and not v.get("advanced")
]
if interactive and missing_optional:
print(" Would you like to configure any optional keys now?")
try:
answer = input(" Configure optional keys? [y/N]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
answer = "n"
if answer in ("y", "yes"):
print()
for var in missing_optional:
desc = var.get("description", "")
if var.get("url"):
print(f" {desc}")
print(f" Get your key at: {var['url']}")
else:
print(f" {desc}")
if var.get("password"):
import getpass
value = getpass.getpass(f" {var['prompt']} (Enter to skip): ")
else:
value = input(f" {var['prompt']} (Enter to skip): ").strip()
if value:
save_env_value(var["name"], value)
results["env_added"].append(var["name"])
print(f" ✓ Saved {var['name']}")
print()
# Check for missing config fields
missing_config = get_missing_config_fields()
@@ -388,16 +478,18 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
def load_config() -> Dict[str, Any]:
"""Load configuration from ~/.hermes/config.yaml."""
import copy
config_path = get_config_path()
config = DEFAULT_CONFIG.copy()
# Deep copy to avoid mutating DEFAULT_CONFIG
config = copy.deepcopy(DEFAULT_CONFIG)
if config_path.exists():
try:
with open(config_path) as f:
user_config = yaml.safe_load(f) or {}
# Deep merge
# Deep merge user values over defaults
for key, value in user_config.items():
if isinstance(value, dict) and key in config and isinstance(config[key], dict):
config[key].update(value)
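The switch from `DEFAULT_CONFIG.copy()` to `copy.deepcopy` matters because a shallow copy shares nested dicts, so `config[key].update(value)` would write user values into the defaults themselves. A sketch that shows both behaviours side by side (`merge_user_config`, the `deep` flag, and the `else` branch for non-dict values are assumptions, since the hunk ends at the `update` call):

```python
import copy
from typing import Any, Dict


def merge_user_config(
    defaults: Dict[str, Any],
    user_config: Dict[str, Any],
    deep: bool = True,
) -> Dict[str, Any]:
    """Merge user values over defaults, one level deep for nested dicts."""
    config = copy.deepcopy(defaults) if deep else defaults.copy()
    for key, value in user_config.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)
        else:
            # Assumed behaviour for scalars and new keys.
            config[key] = value
    return config


defaults = {"tts": {"provider": "edge"}}
merged = merge_user_config(defaults, {"tts": {"provider": "openai"}})
# With deep=True the defaults keep "edge"; with deep=False the nested
# dict is shared and the defaults would be overwritten as well.
```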
@@ -454,6 +546,9 @@ def save_env_value(key: str, value: str):
break
if not found:
# Ensure there's a newline at the end of the file before appending
if lines and not lines[-1].endswith("\n"):
lines[-1] += "\n"
lines.append(f"{key}={value}\n")
with open(env_path, 'w') as f:
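The added guard covers `.env` files whose last line lacks a trailing newline; without it, the appended key would be glued onto the previous entry. A sketch on an in-memory list of lines (`upsert_env_line` is hypothetical, and the matching rule for existing keys is an assumption because that part of the function is outside the hunk):

```python
from typing import List


def upsert_env_line(lines: List[str], key: str, value: str) -> List[str]:
    """Replace an existing KEY= line or append a new one, newline-safe."""
    for i, line in enumerate(lines):
        if line.startswith(f"{key}="):  # assumed matching rule
            lines[i] = f"{key}={value}\n"
            return lines
    # Terminate the last line before appending, as in the hunk above.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    lines.append(f"{key}={value}\n")
    return lines
```

With the guard, `["A=1"]` plus `B=2` becomes two separate lines instead of `A=1B=2`.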
@@ -508,7 +603,7 @@ def show_config():
keys = [
("OPENROUTER_API_KEY", "OpenRouter"),
("ANTHROPIC_API_KEY", "Anthropic"),
("OPENAI_API_KEY", "OpenAI"),
("HERMES_OPENAI_API_KEY", "OpenAI (STT/TTS)"),
("FIRECRAWL_API_KEY", "Firecrawl"),
("BROWSERBASE_API_KEY", "Browserbase"),
("FAL_KEY", "FAL"),
@@ -608,11 +703,11 @@ def set_config_value(key: str, value: str):
"""Set a configuration value."""
# Check if it's an API key (goes to .env)
api_keys = [
'OPENROUTER_API_KEY', 'ANTHROPIC_API_KEY', 'OPENAI_API_KEY',
'OPENROUTER_API_KEY', 'ANTHROPIC_API_KEY', 'HERMES_OPENAI_API_KEY',
'FIRECRAWL_API_KEY', 'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID',
'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
'SUDO_PASSWORD'
'SUDO_PASSWORD', 'SLACK_BOT_TOKEN', 'SLACK_APP_TOKEN',
]
if key.upper() in api_keys or key.upper().startswith('TERMINAL_SSH'):
@@ -621,14 +716,23 @@ def set_config_value(key: str, value: str):
return
# Otherwise it goes to config.yaml
config = load_config()
# Read the raw user config (not merged with defaults) to avoid
# dumping all default values back to the file
config_path = get_config_path()
user_config = {}
if config_path.exists():
try:
with open(config_path) as f:
user_config = yaml.safe_load(f) or {}
except Exception:
user_config = {}
# Handle nested keys (e.g., "terminal.backend")
# Handle nested keys (e.g., "tts.provider")
parts = key.split('.')
current = config
current = user_config
for part in parts[:-1]:
if part not in current:
if part not in current or not isinstance(current.get(part), dict):
current[part] = {}
current = current[part]
@@ -643,8 +747,13 @@ def set_config_value(key: str, value: str):
value = float(value)
current[parts[-1]] = value
save_config(config)
print(f"✓ Set {key} = {value} in {get_config_path()}")
# Write only user config back (not the full merged defaults)
ensure_hermes_home()
with open(config_path, 'w') as f:
yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
print(f"✓ Set {key} = {value} in {config_path}")
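The nested-key handling walks a dotted key such as `tts.provider` and now replaces any non-dict value found on the way, so a stray scalar cannot break the descent. A sketch of that walk on a plain dict (`set_nested` is a hypothetical name):

```python
from typing import Any, Dict


def set_nested(config: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Set config["a"]["b"]... = value for a key written as "a.b"."""
    parts = dotted_key.split(".")
    current = config
    for part in parts[:-1]:
        # Create the level if missing, or overwrite a non-dict value there.
        if part not in current or not isinstance(current.get(part), dict):
            current[part] = {}
        current = current[part]
    current[parts[-1]] = value
    return config
```

Applied to the raw user config rather than the merged one, this writes only the keys the user set, which is the point of the change above.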
# =============================================================================
@@ -703,7 +812,10 @@ def config_command(args):
print(f"\n {len(missing_config)} new config option(s) will be added with defaults")
required_missing = [v for v in missing_env if v.get("is_required")]
optional_missing = [v for v in missing_env if not v.get("is_required")]
optional_missing = [
v for v in missing_env
if not v.get("is_required") and not v.get("advanced")
]
if required_missing:
print(f"\n ⚠️ {len(required_missing)} required API key(s) missing:")
+61 -11
@@ -10,7 +10,18 @@ import subprocess
import shutil
from pathlib import Path
PROJECT_ROOT = Path(__file__).parent.parent.resolve()
from hermes_cli.config import get_project_root, get_hermes_home, get_env_path
PROJECT_ROOT = get_project_root()
HERMES_HOME = get_hermes_home()
# Load environment variables from ~/.hermes/.env so API key checks work
from dotenv import load_dotenv
_env_path = get_env_path()
if _env_path.exists():
load_dotenv(_env_path)
# Also try project .env as fallback
load_dotenv(PROJECT_ROOT / ".env", override=False)
# ANSI colors
class Colors:
@@ -92,7 +103,6 @@ def run_doctor(args):
optional_packages = [
("croniter", "Croniter (cron expressions)"),
("browserbase", "Browserbase SDK"),
("telegram", "python-telegram-bot"),
("discord", "discord.py"),
]
@@ -118,27 +128,38 @@ def run_doctor(args):
print()
print(color("◆ Configuration Files", Colors.CYAN, Colors.BOLD))
env_path = PROJECT_ROOT / '.env'
# Check ~/.hermes/.env (primary location for user config)
env_path = HERMES_HOME / '.env'
if env_path.exists():
check_ok(".env file exists")
check_ok("~/.hermes/.env file exists")
# Check for common issues
content = env_path.read_text()
if "OPENROUTER_API_KEY" in content or "ANTHROPIC_API_KEY" in content:
check_ok("API key configured")
else:
check_warn("No API key found in .env")
check_warn("No API key found in ~/.hermes/.env")
issues.append("Run 'hermes setup' to configure API keys")
else:
check_fail(".env file missing")
check_info("Run 'hermes setup' to create one")
issues.append("Run 'hermes setup' to create .env")
# Also check project root as fallback
fallback_env = PROJECT_ROOT / '.env'
if fallback_env.exists():
check_ok(".env file exists (in project directory)")
else:
check_fail("~/.hermes/.env file missing")
check_info("Run 'hermes setup' to create one")
issues.append("Run 'hermes setup' to create .env")
config_path = PROJECT_ROOT / 'cli-config.yaml'
# Check ~/.hermes/config.yaml (primary) or project cli-config.yaml (fallback)
config_path = HERMES_HOME / 'config.yaml'
if config_path.exists():
check_ok("cli-config.yaml exists")
check_ok("~/.hermes/config.yaml exists")
else:
check_warn("cli-config.yaml not found", "(using defaults)")
fallback_config = PROJECT_ROOT / 'cli-config.yaml'
if fallback_config.exists():
check_ok("cli-config.yaml exists (in project directory)")
else:
check_warn("config.yaml not found", "(using defaults)")
# =========================================================================
# Check: Directory structure
@@ -152,6 +173,23 @@ def run_doctor(args):
else:
check_warn("~/.hermes not found", "(will be created on first use)")
# Check for SOUL.md persona file
soul_path = hermes_home / "SOUL.md"
if soul_path.exists():
content = soul_path.read_text(encoding="utf-8").strip()
# Check if it's just the template comments (no real content)
lines = [l for l in content.splitlines() if l.strip() and not l.strip().startswith(("<!--", "-->", "#"))]
if lines:
check_ok("~/.hermes/SOUL.md exists (persona configured)")
else:
check_info("~/.hermes/SOUL.md exists but is empty — edit it to customize personality")
else:
check_warn("~/.hermes/SOUL.md not found", "(create it to give Hermes a custom personality)")
if should_fix:
soul_path.parent.mkdir(parents=True, exist_ok=True)
soul_path.write_text("# Hermes Agent Persona\n\n<!-- Edit this file to customize how Hermes communicates. -->\n", encoding="utf-8")
check_ok("Created ~/.hermes/SOUL.md")
logs_dir = PROJECT_ROOT / "logs"
if logs_dir.exists():
check_ok("logs/ directory exists")
@@ -216,6 +254,18 @@ def run_doctor(args):
check_fail("TERMINAL_SSH_HOST not set", "(required for TERMINAL_ENV=ssh)")
issues.append("Set TERMINAL_SSH_HOST in .env")
# Node.js + agent-browser (for browser automation tools)
if shutil.which("node"):
check_ok("Node.js")
# Check if agent-browser is installed
agent_browser_path = PROJECT_ROOT / "node_modules" / "agent-browser"
if agent_browser_path.exists():
check_ok("agent-browser (Node.js)", "(browser automation)")
else:
check_warn("agent-browser not installed", "(run: npm install)")
else:
check_warn("Node.js not found", "(optional, needed for browser tools)")
# =========================================================================
# Check: API connectivity
# =========================================================================
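Reviewer sketch for the SOUL.md check in this file: a persona counts as configured when at least one non-blank line is neither a heading nor an HTML comment marker. The same test as a standalone function, where `has_persona_content` is a hypothetical name and the template string is copied from the diff:

```python
def has_persona_content(text: str) -> bool:
    """Return True if the file holds more than headings and comment markers."""
    for line in text.strip().splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("<!--", "-->", "#")):
            return True
    return False


# Template written by the doctor's fix path, per the diff.
TEMPLATE = (
    "# Hermes Agent Persona\n\n"
    "<!-- Edit this file to customize how Hermes communicates. -->\n"
)
```

One limitation of this line-prefix test: the body of a multi-line HTML comment is treated as real content, because only lines that begin with a comment marker are skipped.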
+5 -1
@@ -360,7 +360,11 @@ def run_gateway(verbose: bool = False):
print("└─────────────────────────────────────────────────────────┘")
print()
asyncio.run(start_gateway())
# Exit with code 1 if gateway fails to connect any platform,
# so systemd Restart=on-failure will retry on transient errors
success = asyncio.run(start_gateway())
if not success:
sys.exit(1)
# =============================================================================
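Reviewer sketch for the gateway change: `asyncio.run` returns the coroutine's result, so a boolean from `start_gateway` can be turned into a process exit code. The coroutine below is a stand-in with an invented parameter; the real `start_gateway` takes no such argument in the diff:

```python
import asyncio


async def start_gateway(connected_platforms: int) -> bool:
    """Stand-in coroutine: report whether any platform connected."""
    await asyncio.sleep(0)
    return connected_platforms > 0


def exit_code_for(connected_platforms: int) -> int:
    """Map the gateway result to the exit code the CLI would use."""
    success = asyncio.run(start_gateway(connected_platforms))
    return 0 if success else 1
```

A non-zero exit is what lets a systemd unit with `Restart=on-failure` retry; a clean exit would leave the service stopped.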
+28
@@ -446,6 +446,34 @@ For more help on a command:
config_parser.set_defaults(func=cmd_config)
# =========================================================================
# pairing command
# =========================================================================
pairing_parser = subparsers.add_parser(
"pairing",
help="Manage DM pairing codes for user authorization",
description="Approve or revoke user access via pairing codes"
)
pairing_sub = pairing_parser.add_subparsers(dest="pairing_action")
pairing_list_parser = pairing_sub.add_parser("list", help="Show pending + approved users")
pairing_approve_parser = pairing_sub.add_parser("approve", help="Approve a pairing code")
pairing_approve_parser.add_argument("platform", help="Platform name (telegram, discord, slack, whatsapp)")
pairing_approve_parser.add_argument("code", help="Pairing code to approve")
pairing_revoke_parser = pairing_sub.add_parser("revoke", help="Revoke user access")
pairing_revoke_parser.add_argument("platform", help="Platform name")
pairing_revoke_parser.add_argument("user_id", help="User ID to revoke")
pairing_clear_parser = pairing_sub.add_parser("clear-pending", help="Clear all pending codes")
def cmd_pairing(args):
from hermes_cli.pairing import pairing_command
pairing_command(args)
pairing_parser.set_defaults(func=cmd_pairing)
# =========================================================================
# version command
# =========================================================================
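Reviewer sketch for the pairing subcommand: a self-contained parser with the same nested-subparser shape, handy for checking how arguments land on the namespace. `build_parser` is a hypothetical wrapper and the pairing code in the usage line is made up:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced 'hermes' parser containing only the pairing command."""
    parser = argparse.ArgumentParser(prog="hermes")
    subparsers = parser.add_subparsers(dest="command")

    pairing = subparsers.add_parser("pairing", help="Manage DM pairing codes")
    pairing_sub = pairing.add_subparsers(dest="pairing_action")

    pairing_sub.add_parser("list", help="Show pending + approved users")

    approve = pairing_sub.add_parser("approve", help="Approve a pairing code")
    approve.add_argument("platform")
    approve.add_argument("code")

    revoke = pairing_sub.add_parser("revoke", help="Revoke user access")
    revoke.add_argument("platform")
    revoke.add_argument("user_id")

    pairing_sub.add_parser("clear-pending", help="Clear all pending codes")
    return parser


args = build_parser().parse_args(["pairing", "approve", "telegram", "ab12cd34"])
bare = build_parser().parse_args(["pairing"])
```

Because sub-subparsers are optional by default, `hermes pairing` with no action parses successfully and leaves `pairing_action` as `None`, which is the case the handler's usage fallback covers.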
+100
@@ -0,0 +1,100 @@
"""
CLI commands for the DM pairing system.
Usage:
hermes pairing list # Show all pending + approved users
hermes pairing approve <platform> <code> # Approve a pairing code
hermes pairing revoke <platform> <user_id> # Revoke user access
hermes pairing clear-pending # Clear all expired/pending codes
"""
import sys
def pairing_command(args):
"""Handle hermes pairing subcommands."""
from gateway.pairing import PairingStore
store = PairingStore()
action = getattr(args, "pairing_action", None)
if action == "list":
_cmd_list(store)
elif action == "approve":
_cmd_approve(store, args.platform, args.code)
elif action == "revoke":
_cmd_revoke(store, args.platform, args.user_id)
elif action == "clear-pending":
_cmd_clear_pending(store)
else:
print("Usage: hermes pairing {list|approve|revoke|clear-pending}")
print("Run 'hermes pairing --help' for details.")
def _cmd_list(store):
"""List all pending and approved users."""
pending = store.list_pending()
approved = store.list_approved()
if not pending and not approved:
print("No pairing data found. No one has tried to pair yet~")
return
if pending:
print(f"\n Pending Pairing Requests ({len(pending)}):")
print(f" {'Platform':<12} {'Code':<10} {'User ID':<20} {'Name':<20} {'Age'}")
print(f" {'--------':<12} {'----':<10} {'-------':<20} {'----':<20} {'---'}")
for p in pending:
print(
f" {p['platform']:<12} {p['code']:<10} {p['user_id']:<20} "
f"{p.get('user_name', ''):<20} {p['age_minutes']}m ago"
)
else:
print("\n No pending pairing requests.")
if approved:
print(f"\n Approved Users ({len(approved)}):")
print(f" {'Platform':<12} {'User ID':<20} {'Name':<20}")
print(f" {'--------':<12} {'-------':<20} {'----':<20}")
for a in approved:
print(f" {a['platform']:<12} {a['user_id']:<20} {a.get('user_name', ''):<20}")
else:
print("\n No approved users.")
print()
def _cmd_approve(store, platform: str, code: str):
"""Approve a pairing code."""
platform = platform.lower().strip()
code = code.upper().strip()
result = store.approve_code(platform, code)
if result:
uid = result["user_id"]
name = result.get("user_name", "")
display = f"{name} ({uid})" if name else uid
print(f"\n Approved! User {display} on {platform} can now use the bot~")
print(f" They'll be recognized automatically on their next message.\n")
else:
print(f"\n Code '{code}' not found or expired for platform '{platform}'.")
print(f" Run 'hermes pairing list' to see pending codes.\n")
def _cmd_revoke(store, platform: str, user_id: str):
"""Revoke a user's access."""
platform = platform.lower().strip()
if store.revoke(platform, user_id):
print(f"\n Revoked access for user {user_id} on {platform}.\n")
else:
print(f"\n User {user_id} not found in approved list for {platform}.\n")
def _cmd_clear_pending(store):
"""Clear all pending pairing codes."""
count = store.clear_pending()
if count:
print(f"\n Cleared {count} pending pairing request(s).\n")
else:
print("\n No pending requests to clear.\n")
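Reviewer sketch for the pairing CLI: the commands only rely on a small store contract (`approve_code` returns the user record or a falsy value, `revoke` returns a boolean). An in-memory stand-in, which is an assumption about `PairingStore` and not its real implementation, is enough to exercise the normalization the CLI applies:

```python
from typing import Dict, Optional, Tuple


class InMemoryPairingStore:
    """Hypothetical stand-in for gateway.pairing.PairingStore."""

    def __init__(self, pending: Dict[Tuple[str, str], dict]) -> None:
        self._pending = dict(pending)
        self._approved: Dict[Tuple[str, str], dict] = {}

    def approve_code(self, platform: str, code: str) -> Optional[dict]:
        # Move the entry from pending to approved; None if the code is unknown.
        entry = self._pending.pop((platform, code), None)
        if entry is not None:
            self._approved[(platform, entry["user_id"])] = entry
        return entry

    def revoke(self, platform: str, user_id: str) -> bool:
        return self._approved.pop((platform, user_id), None) is not None


# Sample data is invented for illustration.
store = InMemoryPairingStore(
    {("telegram", "AB12CD34"): {"user_id": "42", "user_name": "sam"}}
)
# Mirror the CLI: platform is lowercased, code is uppercased, both stripped.
platform = " Telegram ".lower().strip()
code = "ab12cd34".upper().strip()
result = store.approve_code(platform, code)
```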
+27
@@ -186,6 +186,11 @@ def _print_setup_summary(config: dict, hermes_home):
else:
tool_status.append(("Image Generation", False, "FAL_KEY"))
# TTS (always available via Edge TTS; ElevenLabs/OpenAI are optional)
tool_status.append(("Text-to-Speech (Edge TTS)", True, None))
if get_env_value('ELEVENLABS_API_KEY'):
tool_status.append(("Text-to-Speech (ElevenLabs)", True, None))
# Tinker + WandB (RL training)
if get_env_value('TINKER_API_KEY') and get_env_value('WANDB_API_KEY'):
tool_status.append(("RL Training (Tinker)", True, None))
@@ -991,6 +996,28 @@ def run_setup_wizard(args):
print_success(" Configured ✓")
print()
# ElevenLabs - Premium TTS
print_info("─" * 50)
print(color(" Text-to-Speech - ElevenLabs (Premium)", Colors.CYAN))
print_info(" Enables: Premium TTS voices (Edge TTS is free and works without a key)")
print_info(" Use case: High-quality, customizable voice synthesis")
if get_env_value('ELEVENLABS_API_KEY'):
print_success(" Status: Configured ✓")
if prompt_yes_no(" Update ElevenLabs API key?", False):
api_key = prompt(" API key", password=True)
if api_key:
save_env_value("ELEVENLABS_API_KEY", api_key)
print_success(" Updated")
else:
print_warning(" Status: Not configured (free Edge TTS will be used by default)")
if prompt_yes_no(" Set up ElevenLabs?", False):
print_info(" Get your API key at: https://elevenlabs.io/")
api_key = prompt(" API key", password=True)
if api_key:
save_env_value("ELEVENLABS_API_KEY", api_key)
print_success(" Configured ✓")
print()
# Tinker + WandB - RL Training
print_info("─" * 50)
print(color(" RL Training (Tinker + WandB)", Colors.CYAN))
+11 -1
@@ -76,6 +76,7 @@ def show_status(args):
"FAL": "FAL_KEY",
"Tinker": "TINKER_API_KEY",
"WandB": "WANDB_API_KEY",
"ElevenLabs": "ELEVENLABS_API_KEY",
}
for name, env_var in keys.items():
@@ -90,7 +91,16 @@ def show_status(args):
print()
print(color("◆ Terminal Backend", Colors.CYAN, Colors.BOLD))
terminal_env = os.getenv("TERMINAL_ENV", "local")
terminal_env = os.getenv("TERMINAL_ENV", "")
if not terminal_env:
# Fall back to config file value when env var isn't set
# (hermes status doesn't go through cli.py's config loading)
try:
from hermes_cli.config import load_config
_cfg = load_config()
terminal_env = _cfg.get("terminal", {}).get("backend", "local")
except Exception:
terminal_env = "local"
print(f" Backend: {terminal_env}")
if terminal_env == "ssh":
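Reviewer sketch for the status change: the backend is now resolved as environment variable first, then the `terminal.backend` config value, then `"local"`. The precedence as a pure function, where `resolve_terminal_backend` is a hypothetical name and the config is passed in rather than loaded:

```python
from typing import Mapping, Optional


def resolve_terminal_backend(environ: Mapping[str, str], config: Optional[dict]) -> str:
    """Pick the terminal backend: env var, then config file, then 'local'."""
    env_value = environ.get("TERMINAL_ENV", "")
    if env_value:
        return env_value
    # A missing or unreadable config is treated like an empty one.
    return (config or {}).get("terminal", {}).get("backend", "local")
```

Note that an empty `TERMINAL_ENV` is treated the same as an unset one, matching the `if not terminal_env` check in the diff.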
+298 -41
@@ -41,7 +41,7 @@ from tools.terminal_hecate import terminal_hecate_tool, check_hecate_requirement
from tools.vision_tools import vision_analyze_tool, check_vision_requirements
from tools.mixture_of_agents_tool import mixture_of_agents_tool, check_moa_requirements
from tools.image_generation_tool import image_generate_tool, check_image_generation_requirements
from tools.skills_tool import skills_categories, skills_list, skill_view, check_skills_requirements, SKILLS_TOOL_DESCRIPTION
from tools.skills_tool import skills_list, skill_view, check_skills_requirements, SKILLS_TOOL_DESCRIPTION
# RL Training tools (Tinker-Atropos)
from tools.rl_training_tool import (
rl_list_environments,
@@ -83,6 +83,8 @@ from tools.browser_tool import (
check_browser_requirements,
BROWSER_TOOL_SCHEMAS
)
# Text-to-speech tool (Edge TTS / ElevenLabs / OpenAI)
from tools.tts_tool import text_to_speech_tool, check_tts_requirements
from toolsets import (
get_toolset, resolve_toolset, resolve_multiple_toolsets,
get_all_toolsets, get_toolset_names, validate_toolset,
@@ -143,7 +145,7 @@ TOOLSET_REQUIREMENTS = {
"env_vars": [], # Just needs skills directory
"check_fn": check_skills_requirements,
"setup_url": None,
"tools": ["skills_categories", "skills_list", "skill_view"],
"tools": ["skills_list", "skill_view"],
},
"rl": {
"name": "RL Training (Tinker-Atropos)",
@@ -165,6 +167,13 @@ TOOLSET_REQUIREMENTS = {
"setup_url": None,
"tools": ["read_file", "write_file", "patch", "search"],
},
"tts": {
"name": "Text-to-Speech",
"env_vars": [], # Edge TTS needs no key; premium providers checked at runtime
"check_fn": check_tts_requirements,
"setup_url": None,
"tools": ["text_to_speech"],
},
}
@@ -311,6 +320,20 @@ def get_terminal_tool_definitions() -> List[Dict[str, Any]]:
"type": "integer",
"description": "Command timeout in seconds (optional)",
"minimum": 1
},
"workdir": {
"type": "string",
"description": "Working directory for this command (absolute path). Defaults to the session working directory."
},
"check_interval": {
"type": "integer",
"description": "Seconds between automatic status checks for background processes (gateway/messaging only, minimum 30). When set, I'll proactively report progress.",
"minimum": 30
},
"pty": {
"type": "boolean",
"description": "Run in pseudo-terminal (PTY) mode for interactive CLI tools like Codex, Claude Code, or Python REPL. Only works with local and SSH backends. Default: false.",
"default": False
}
},
"required": ["command"]
@@ -332,13 +355,13 @@ def get_vision_tool_definitions() -> List[Dict[str, Any]]:
"type": "function",
"function": {
"name": "vision_analyze",
"description": "Analyze images from URLs using AI vision. Provides comprehensive image description and answers specific questions about the image content. Perfect for understanding visual content, reading text in images, identifying objects, analyzing scenes, and extracting visual information.",
"description": "Analyze images using AI vision. Accepts HTTP/HTTPS URLs or local file paths (e.g. from the image cache). Provides comprehensive image description and answers specific questions about the image content. Perfect for understanding visual content, reading text in images, identifying objects, analyzing scenes, and extracting visual information.",
"parameters": {
"type": "object",
"properties": {
"image_url": {
"type": "string",
"description": "The URL of the image to analyze (must be publicly accessible HTTP/HTTPS URL)"
"description": "The URL or local file path of the image to analyze. Accepts publicly accessible HTTP/HTTPS URLs or local file paths (e.g. /home/user/.hermes/image_cache/abc123.jpg)."
},
"question": {
"type": "string",
@@ -392,7 +415,7 @@ def get_image_tool_definitions() -> List[Dict[str, Any]]:
"type": "function",
"function": {
"name": "image_generate",
"description": "Generate high-quality images from text prompts using FLUX 2 Pro model with automatic 2x upscaling. Creates detailed, artistic images that are automatically upscaled for hi-rez results. Returns a single upscaled image URL that can be displayed using <img src=\"{URL}\"></img> tags.",
"description": "Generate high-quality images from text prompts using FLUX 2 Pro model with automatic 2x upscaling. Creates detailed, artistic images that are automatically upscaled for hi-rez results. Returns a single upscaled image URL. Display it using markdown: ![description](URL)",
"parameters": {
"type": "object",
"properties": {
@@ -432,24 +455,7 @@ def get_skills_tool_definitions() -> List[Dict[str, Any]]:
"properties": {
"category": {
"type": "string",
"description": "Optional category filter (from skills_categories)"
}
},
"required": []
}
}
},
{
"type": "function",
"function": {
"name": "skills_categories",
"description": "List available skill categories. Call this first to discover what skill categories exist, then use skills_list(category) to see skills in a category.",
"parameters": {
"type": "object",
"properties": {
"verbose": {
"type": "boolean",
"description": "If true, include skill counts per category. Default: false."
"description": "Optional category filter to narrow results"
}
},
"required": []
@@ -879,6 +885,123 @@ def get_file_tool_definitions() -> List[Dict[str, Any]]:
]
def get_tts_tool_definitions() -> List[Dict[str, Any]]:
"""
Get tool definitions for text-to-speech tools in OpenAI's expected format.
Returns:
List[Dict]: List of TTS tool definitions compatible with OpenAI API
"""
return [
{
"type": "function",
"function": {
"name": "text_to_speech",
"description": "Convert text to speech audio. Returns a MEDIA: path that the platform delivers as a voice message. On Telegram it plays as a voice bubble, on Discord/WhatsApp as an audio attachment. In CLI mode, saves to ~/voice-memos/. Voice and provider are user-configured, not model-selected.",
"parameters": {
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "The text to convert to speech. Keep under 4000 characters."
},
"output_path": {
"type": "string",
"description": "Optional custom file path to save the audio. Defaults to ~/voice-memos/<timestamp>.mp3"
}
},
"required": ["text"]
}
}
}
]
def get_send_message_tool_definitions():
"""Tool definitions for cross-channel messaging."""
return [
{
"type": "function",
"function": {
"name": "send_message",
"description": "Send a message to a user or channel on any connected messaging platform. Use this when the user asks you to send something to a different platform, or when delivering notifications/alerts to a specific destination.",
"parameters": {
"type": "object",
"properties": {
"target": {
"type": "string",
"description": "Delivery target. Format: 'platform' (uses home channel) or 'platform:chat_id' (specific chat). Examples: 'telegram', 'discord:123456789', 'slack:C01234ABCDE'"
},
"message": {
"type": "string",
"description": "The message text to send"
}
},
"required": ["target", "message"]
}
}
}
]
def get_process_tool_definitions() -> List[Dict[str, Any]]:
"""
Get tool definitions for the process management tool.
The process tool manages background processes started with terminal(background=true).
Actions: list, poll, log, wait, kill. Phase 2 adds: write, submit.
"""
return [
{
"type": "function",
"function": {
"name": "process",
"description": (
"Manage background processes started with terminal(background=true). "
"Actions: 'list' (show all), 'poll' (check status + new output), "
"'log' (full output with pagination), 'wait' (block until done or timeout), "
"'kill' (terminate), 'write' (send raw data to stdin), 'submit' (send data + Enter). "
"Use 'wait' when you have nothing else to do and want "
"to block until a background process finishes."
),
"parameters": {
"type": "object",
"properties": {
"action": {
"type": "string",
"enum": ["list", "poll", "log", "wait", "kill", "write", "submit"],
"description": "Action to perform on background processes"
},
"session_id": {
"type": "string",
"description": "Process session ID (from terminal background output). Required for poll/log/wait/kill/write/submit."
},
"data": {
"type": "string",
"description": "Text to send to process stdin (for 'write' and 'submit' actions)"
},
"timeout": {
"type": "integer",
"description": "Max seconds to block for 'wait' action. Returns partial output on timeout.",
"minimum": 1
},
"offset": {
"type": "integer",
"description": "Line offset for 'log' action (default: last 200 lines)"
},
"limit": {
"type": "integer",
"description": "Max lines to return for 'log' action",
"minimum": 1
}
},
"required": ["action"]
}
}
}
]
def get_all_tool_names() -> List[str]:
"""
Get the names of all available tools across all toolsets.
@@ -894,7 +1017,7 @@ def get_all_tool_names() -> List[str]:
# Terminal tools (mini-swe-agent backend)
if check_terminal_requirements():
tool_names.extend(["terminal"])
tool_names.extend(["terminal", "process"])
# Vision tools
if check_vision_requirements():
@@ -910,7 +1033,7 @@ def get_all_tool_names() -> List[str]:
# Skills tools
if check_skills_requirements():
tool_names.extend(["skills_categories", "skills_list", "skill_view"])
tool_names.extend(["skills_list", "skill_view"])
# Browser automation tools
if check_browser_requirements():
@@ -943,6 +1066,13 @@ def get_all_tool_names() -> List[str]:
"read_file", "write_file", "patch", "search"
])
# Text-to-speech tools
if check_tts_requirements():
tool_names.extend(["text_to_speech"])
# Cross-channel messaging (always available on messaging platforms)
tool_names.extend(["send_message"])
return tool_names
@@ -953,11 +1083,11 @@ TOOL_TO_TOOLSET_MAP = {
"web_search": "web_tools",
"web_extract": "web_tools",
"terminal": "terminal_tools",
"process": "terminal_tools",
"vision_analyze": "vision_tools",
"mixture_of_agents": "moa_tools",
"image_generate": "image_tools",
# Skills tools
"skills_categories": "skills_tools",
"skills_list": "skills_tools",
"skill_view": "skills_tools",
# Browser automation tools
@@ -985,11 +1115,16 @@ TOOL_TO_TOOLSET_MAP = {
"rl_stop_training": "rl_tools",
"rl_get_results": "rl_tools",
"rl_list_runs": "rl_tools",
"rl_test_inference": "rl_tools",
# Text-to-speech tools
"text_to_speech": "tts_tools",
# File manipulation tools
"read_file": "file_tools",
"write_file": "file_tools",
"patch": "file_tools",
"search": "file_tools",
# Cross-channel messaging
"send_message": "messaging_tools",
}
@@ -1052,6 +1187,9 @@ def get_tool_definitions(
if check_terminal_requirements():
for tool in get_terminal_tool_definitions():
all_available_tools_map[tool["function"]["name"]] = tool
# Process management tool (paired with terminal)
for tool in get_process_tool_definitions():
all_available_tools_map[tool["function"]["name"]] = tool
if check_vision_requirements():
for tool in get_vision_tool_definitions():
@@ -1088,6 +1226,15 @@ def get_tool_definitions(
for tool in get_file_tool_definitions():
all_available_tools_map[tool["function"]["name"]] = tool
# Text-to-speech tools
if check_tts_requirements():
for tool in get_tts_tool_definitions():
all_available_tools_map[tool["function"]["name"]] = tool
# Cross-channel messaging (always available on messaging platforms)
for tool in get_send_message_tool_definitions():
all_available_tools_map[tool["function"]["name"]] = tool
# Determine which tools to include based on toolsets
tools_to_include = set()
@@ -1109,7 +1256,7 @@ def get_tool_definitions(
"vision_tools": ["vision_analyze"],
"moa_tools": ["mixture_of_agents"],
"image_tools": ["image_generate"],
"skills_tools": ["skills_categories", "skills_list", "skill_view"],
"skills_tools": ["skills_list", "skill_view"],
"browser_tools": [
"browser_navigate", "browser_snapshot", "browser_click",
"browser_type", "browser_scroll", "browser_back",
@@ -1124,7 +1271,8 @@ def get_tool_definitions(
"rl_stop_training", "rl_get_results",
"rl_list_runs", "rl_test_inference"
],
"file_tools": ["read_file", "write_file", "patch", "search"]
"file_tools": ["read_file", "write_file", "patch", "search"],
"tts_tools": ["text_to_speech"]
}
legacy_tools = legacy_map.get(toolset_name, [])
tools_to_include.update(legacy_tools)
@@ -1162,7 +1310,7 @@ def get_tool_definitions(
"vision_tools": ["vision_analyze"],
"moa_tools": ["mixture_of_agents"],
"image_tools": ["image_generate"],
"skills_tools": ["skills_categories", "skills_list", "skill_view"],
"skills_tools": ["skills_list", "skill_view"],
"browser_tools": [
"browser_navigate", "browser_snapshot", "browser_click",
"browser_type", "browser_scroll", "browser_back",
@@ -1177,7 +1325,8 @@ def get_tool_definitions(
"rl_stop_training", "rl_get_results",
"rl_list_runs", "rl_test_inference"
],
"file_tools": ["read_file", "write_file", "patch", "search"]
"file_tools": ["read_file", "write_file", "patch", "search"],
"tts_tools": ["text_to_speech"]
}
legacy_tools = legacy_map.get(toolset_name, [])
tools_to_include.difference_update(legacy_tools)
@@ -1267,15 +1416,70 @@ def handle_terminal_function_call(function_name: str, function_args: Dict[str, A
command = function_args.get("command")
background = function_args.get("background", False)
timeout = function_args.get("timeout")
# Note: force parameter exists internally but is NOT exposed to the model
# Dangerous command approval is handled via user prompts only
workdir = function_args.get("workdir")
check_interval = function_args.get("check_interval")
pty = function_args.get("pty", False)
return terminal_tool(command=command, background=background, timeout=timeout, task_id=task_id)
return terminal_tool(command=command, background=background, timeout=timeout, task_id=task_id, workdir=workdir, check_interval=check_interval, pty=pty)
else:
return json.dumps({"error": f"Unknown terminal function: {function_name}"}, ensure_ascii=False)
def handle_process_function_call(function_name: str, function_args: Dict[str, Any], task_id: Optional[str] = None) -> str:
"""
Handle function calls for the process management tool.
Routes actions (list, poll, log, wait, kill, write, submit) to the ProcessRegistry.
"""
from tools.process_registry import process_registry
action = function_args.get("action", "")
session_id = function_args.get("session_id", "")
if action == "list":
sessions = process_registry.list_sessions(task_id=task_id)
return json.dumps({"processes": sessions}, ensure_ascii=False)
elif action == "poll":
if not session_id:
return json.dumps({"error": "session_id is required for poll"}, ensure_ascii=False)
return json.dumps(process_registry.poll(session_id), ensure_ascii=False)
elif action == "log":
if not session_id:
return json.dumps({"error": "session_id is required for log"}, ensure_ascii=False)
offset = function_args.get("offset", 0)
limit = function_args.get("limit", 200)
return json.dumps(process_registry.read_log(session_id, offset=offset, limit=limit), ensure_ascii=False)
elif action == "wait":
if not session_id:
return json.dumps({"error": "session_id is required for wait"}, ensure_ascii=False)
timeout = function_args.get("timeout")
return json.dumps(process_registry.wait(session_id, timeout=timeout), ensure_ascii=False)
elif action == "kill":
if not session_id:
return json.dumps({"error": "session_id is required for kill"}, ensure_ascii=False)
return json.dumps(process_registry.kill_process(session_id), ensure_ascii=False)
elif action == "write":
if not session_id:
return json.dumps({"error": "session_id is required for write"}, ensure_ascii=False)
data = function_args.get("data", "")
return json.dumps(process_registry.write_stdin(session_id, data), ensure_ascii=False)
elif action == "submit":
if not session_id:
return json.dumps({"error": "session_id is required for submit"}, ensure_ascii=False)
data = function_args.get("data", "")
return json.dumps(process_registry.submit_stdin(session_id, data), ensure_ascii=False)
else:
return json.dumps({"error": f"Unknown process action: {action}. Use: list, poll, log, wait, kill, write, submit"}, ensure_ascii=False)
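Reviewer sketch for the process handler above: every action except `list` needs a `session_id`. The repeated checks could be expressed as one table-driven validator; this is an illustrative alternative with hypothetical names, not what the diff does:

```python
from typing import Any, Dict, Optional

# Actions taken from the handler in the diff.
ACTIONS_NEEDING_SESSION = {"poll", "log", "wait", "kill", "write", "submit"}
ALL_ACTIONS = {"list"} | ACTIONS_NEEDING_SESSION


def validate_process_args(function_args: Dict[str, Any]) -> Optional[str]:
    """Return an error message for invalid arguments, or None if they are usable."""
    action = function_args.get("action", "")
    if action not in ALL_ACTIONS:
        return f"Unknown process action: {action}"
    if action in ACTIONS_NEEDING_SESSION and not function_args.get("session_id"):
        return f"session_id is required for {action}"
    return None
```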
def handle_vision_function_call(function_name: str, function_args: Dict[str, Any]) -> str:
"""
Handle function calls for vision tools.
@@ -1391,11 +1595,7 @@ def handle_skills_function_call(function_name: str, function_args: Dict[str, Any
Returns:
str: Function result as JSON string
"""
if function_name == "skills_categories":
verbose = function_args.get("verbose", False)
return skills_categories(verbose=verbose)
elif function_name == "skills_list":
if function_name == "skills_list":
category = function_args.get("category")
return skills_list(category=category)
@@ -1639,6 +1839,44 @@ def handle_file_function_call(
return json.dumps({"error": f"Unknown file function: {function_name}"}, ensure_ascii=False)
def handle_tts_function_call(
function_name: str,
function_args: Dict[str, Any]
) -> str:
"""
Handle function calls for text-to-speech tools.
Args:
function_name (str): Name of the TTS function to call
function_args (Dict): Arguments for the function
Returns:
str: Function result as JSON string
"""
if function_name == "text_to_speech":
text = function_args.get("text", "")
output_path = function_args.get("output_path")
return text_to_speech_tool(text=text, output_path=output_path)
return json.dumps({"error": f"Unknown TTS function: {function_name}"}, ensure_ascii=False)
def handle_send_message_function_call(function_name, function_args):
"""Handle cross-channel send_message tool calls."""
import json
target = function_args.get("target", "")
message = function_args.get("message", "")
if not target or not message:
return json.dumps({"error": "Both 'target' and 'message' are required"})
# Store the pending message for the gateway to deliver
# The gateway runner checks this after the agent loop completes
import os
os.environ["_HERMES_PENDING_SEND_TARGET"] = target
os.environ["_HERMES_PENDING_SEND_MESSAGE"] = message
return json.dumps({"success": True, "delivered_to": target, "note": "Message queued for delivery"})
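Reviewer sketch for `send_message`: the schema documents `target` as either `platform` or `platform:chat_id`. How the gateway splits it is not shown in this diff; one plausible parse, under that assumption and with a hypothetical function name, is:

```python
from typing import Optional, Tuple


def parse_target(target: str) -> Tuple[str, Optional[str]]:
    """Split 'platform' or 'platform:chat_id' into its two parts."""
    # partition splits on the first colon only, so chat IDs may contain colons.
    platform, _, chat_id = target.strip().partition(":")
    return platform, (chat_id or None)
```

A `None` chat ID would correspond to the "uses home channel" case in the tool description.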
def handle_function_call(
function_name: str,
function_args: Dict[str, Any],
@@ -1673,6 +1911,10 @@ def handle_function_call(
elif function_name in ["terminal"]:
return handle_terminal_function_call(function_name, function_args, task_id)
# Route process management tools
elif function_name in ["process"]:
return handle_process_function_call(function_name, function_args, task_id)
# Route vision tools
elif function_name in ["vision_analyze"]:
return handle_vision_function_call(function_name, function_args)
@@ -1686,7 +1928,7 @@ def handle_function_call(
return handle_image_function_call(function_name, function_args)
# Route skills tools
elif function_name in ["skills_categories", "skills_list", "skill_view"]:
elif function_name in ["skills_list", "skill_view"]:
return handle_skills_function_call(function_name, function_args)
# Route browser automation tools
@@ -1716,6 +1958,14 @@ def handle_function_call(
elif function_name in ["read_file", "write_file", "patch", "search"]:
return handle_file_function_call(function_name, function_args, task_id)
# Route text-to-speech tools
elif function_name in ["text_to_speech"]:
return handle_tts_function_call(function_name, function_args)
# Route cross-channel messaging
elif function_name == "send_message":
return handle_send_message_function_call(function_name, function_args)
else:
error_msg = f"Unknown function: {function_name}"
print(f"{error_msg}")
@@ -1767,7 +2017,7 @@ def get_available_toolsets() -> Dict[str, Dict[str, Any]]:
},
"skills_tools": {
"available": check_skills_requirements(),
"tools": ["skills_categories", "skills_list", "skill_view"],
"tools": ["skills_list", "skill_view"],
"description": "Access skill documents that provide specialized instructions, guidelines, or knowledge the agent can load on demand",
"requirements": ["skills/ directory in repo root"]
},
@@ -1793,6 +2043,12 @@ def get_available_toolsets() -> Dict[str, Dict[str, Any]]:
"tools": ["read_file", "write_file", "patch", "search"],
"description": "File manipulation tools: read/write files, search content/files, patch with fuzzy matching",
"requirements": ["Terminal backend available (local/docker/ssh/singularity/modal)"]
},
"tts_tools": {
"available": check_tts_requirements(),
"tools": ["text_to_speech"],
"description": "Text-to-speech: convert text to audio (Edge TTS free, ElevenLabs, OpenAI)",
"requirements": ["edge-tts package (free) or ELEVENLABS_API_KEY or OPENAI_API_KEY"]
}
}
@@ -1814,7 +2070,8 @@ def check_toolset_requirements() -> Dict[str, bool]:
"skills_tools": check_skills_requirements(),
"browser_tools": check_browser_requirements(),
"cronjob_tools": check_cronjob_requirements(),
"file_tools": check_file_requirements()
"file_tools": check_file_requirements(),
"tts_tools": check_tts_requirements()
}
if __name__ == "__main__":
+9 -1
@@ -27,6 +27,8 @@ dependencies = [
# Tools
"firecrawl-py",
"fal-client",
# Text-to-speech (Edge TTS is free, no API key needed)
"edge-tts",
# mini-swe-agent deps (terminal tool)
"litellm>=1.75.5",
"typer",
@@ -36,15 +38,21 @@ dependencies = [
[project.optional-dependencies]
modal = ["swe-rex[modal]>=1.4.0"]
dev = ["pytest", "pytest-asyncio"]
messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0", "aiohttp>=3.9.0"]
messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0", "aiohttp>=3.9.0", "slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
cron = ["croniter"]
slack = ["slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
cli = ["simple-term-menu"]
tts-premium = ["elevenlabs"]
pty = ["ptyprocess>=0.7.0"]
all = [
"hermes-agent[modal]",
"hermes-agent[messaging]",
"hermes-agent[cron]",
"hermes-agent[cli]",
"hermes-agent[dev]",
"hermes-agent[tts-premium]",
"hermes-agent[slack]",
"hermes-agent[pty]",
]
[project.scripts]
+6
@@ -29,6 +29,12 @@ platformdirs
# Optional: For Modal backend (cloud execution)
# swe-rex[modal]>=1.4.0 # Includes modal + boto3 + swe-rex runtime
# Text-to-speech (Edge TTS is free, no API key needed)
edge-tts
# Optional: Premium TTS providers
# elevenlabs # Uncomment if using ElevenLabs TTS (needs ELEVENLABS_API_KEY)
# Optional: For cron expression parsing (cronjob scheduling)
croniter
+585 -37
@@ -20,6 +20,7 @@ Usage:
response = agent.run_conversation("Tell me about the latest Python updates")
"""
import copy
import json
import logging
import os
@@ -48,11 +49,46 @@ elif not os.getenv("HERMES_QUIET"):
# Import our tool system
from model_tools import get_tool_definitions, handle_function_call, check_toolset_requirements
from tools.terminal_tool import cleanup_vm
from tools.terminal_tool import cleanup_vm, set_interrupt_event as _set_terminal_interrupt
from tools.browser_tool import cleanup_browser
import requests
# =============================================================================
# Default Agent Identity & Platform Hints
# =============================================================================
# The default identity prompt is prepended to every conversation so the agent
# knows who it is and behaves consistently across platforms.
DEFAULT_AGENT_IDENTITY = (
"You are Hermes Agent, an intelligent AI assistant created by Nous Research. "
"You are helpful, knowledgeable, and direct. You assist users with a wide "
"range of tasks including answering questions, writing and editing code, "
"analyzing information, creative work, and executing actions via your tools. "
"You communicate clearly, admit uncertainty when appropriate, and prioritize "
"being genuinely useful over being verbose unless otherwise directed below."
)
# Platform-specific formatting hints appended to the system prompt.
# These tell the agent how to format its output for the current interface.
PLATFORM_HINTS = {
"whatsapp": (
"You are on a text messaging communication platform, WhatsApp. "
"Please do not use markdown as it does not render."
),
"telegram": (
"You are on a text messaging communication platform, Telegram. "
"Please do not use markdown as it does not render."
),
"discord": (
"You are in a Discord server or group chat communicating with your user."
),
"cli": (
"You are a CLI AI Agent. Try not to use markdown but simple text "
"renderable inside a terminal."
),
}
# =============================================================================
# Model Context Management
# =============================================================================
@@ -457,18 +493,404 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
return compressed
# =============================================================================
# Anthropic Prompt Caching (system_and_3 strategy)
# =============================================================================
# Reduces input token costs by ~75% on multi-turn conversations by caching
# the conversation prefix. Uses 4 cache_control breakpoints (Anthropic max):
# 1. System prompt (stable across all turns)
# 2-4. Last 3 non-system messages (rolling window)
#
# Cached tokens are read at 0.1x input price. Cache writes cost 1.25x (5m TTL)
# or 2x (1h TTL). Only applied to Claude models via OpenRouter.
def _apply_cache_marker(msg: dict, cache_marker: dict) -> None:
"""
Add cache_control to a single message, handling all format variations.
- tool messages: cache_control at message level (Anthropic API quirk)
- string content: converted to multipart content array
- list content: marker added to last item
- None content (assistant with tool_calls): message level
"""
role = msg.get("role", "")
content = msg.get("content")
if role == "tool":
msg["cache_control"] = cache_marker
return
if content is None:
msg["cache_control"] = cache_marker
return
if isinstance(content, str):
msg["content"] = [{"type": "text", "text": content, "cache_control": cache_marker}]
return
if isinstance(content, list) and content:
last = content[-1]
if isinstance(last, dict):
last["cache_control"] = cache_marker
def apply_anthropic_cache_control(
api_messages: List[Dict[str, Any]],
cache_ttl: str = "5m",
) -> List[Dict[str, Any]]:
"""
Apply system_and_3 caching strategy to messages for Anthropic models.
Places up to 4 cache_control breakpoints:
1. System prompt (index 0, stable across all turns)
2-4. Last 3 non-system messages (rolling cache frontier)
Each breakpoint tells Anthropic "cache everything from the start up to here."
Multiple breakpoints create a ladder of cached prefixes at different depths,
which provides robust cache hits even when the most recent cache entry hasn't
propagated yet.
Args:
api_messages: Fully assembled message list (system prompt first).
cache_ttl: "5m" (default, 1.25x write cost) or "1h" (2x write cost).
Returns:
Deep copy of messages with cache_control breakpoints injected.
"""
messages = copy.deepcopy(api_messages)
if not messages:
return messages
marker = {"type": "ephemeral"}
if cache_ttl == "1h":
marker["ttl"] = "1h"
breakpoints_used = 0
# Breakpoint 1: System prompt (always stable, gives a guaranteed minimum hit)
if messages[0].get("role") == "system":
_apply_cache_marker(messages[0], marker)
breakpoints_used += 1
# Breakpoints 2-4: Last 3 non-system messages (rolling window)
remaining = 4 - breakpoints_used
non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
for idx in non_sys[-remaining:]:
_apply_cache_marker(messages[idx], marker)
return messages
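The breakpoint placement described above can be sketched in isolation. This is a minimal illustration, not the code from this change: `pick_cache_breakpoints` is a hypothetical helper that only returns the indices that would be marked, and the sample conversation is made up.

```python
from typing import Any, Dict, List


def pick_cache_breakpoints(
    messages: List[Dict[str, Any]], max_breakpoints: int = 4
) -> List[int]:
    """Return the indices that would receive a cache_control marker (hypothetical helper)."""
    if not messages:
        return []
    chosen: List[int] = []
    # Breakpoint 1: the system prompt, if it leads the list.
    if messages[0].get("role") == "system":
        chosen.append(0)
    # Remaining breakpoints: the most recent non-system messages.
    remaining = max_breakpoints - len(chosen)
    non_sys = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    if remaining > 0:
        chosen.extend(non_sys[-remaining:])
    return chosen


conversation = [
    {"role": "system", "content": "You are Hermes Agent."},
    {"role": "user", "content": "List the files."},
    {"role": "assistant", "content": None},
    {"role": "tool", "content": "a.py b.py"},
    {"role": "assistant", "content": "There are two files."},
    {"role": "user", "content": "Open a.py."},
]
print(pick_cache_breakpoints(conversation))  # [0, 3, 4, 5]
```

As the conversation grows, index 0 stays fixed while the other three indices slide forward, which is what makes the window "rolling".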
# =============================================================================
# Default System Prompt Components
# =============================================================================
# Skills guidance - instructs the model to check skills before technical tasks
SKILLS_SYSTEM_PROMPT = """## Skills
Before answering technical questions about tools, frameworks, or workflows:
1. Check skills_categories to see if a relevant category exists
2. If a category matches your task, use skills_list with that category
3. If a skill matches, load it with skill_view and follow its instructions
# Skills guidance - embeds a compact skill index in the system prompt so
# the model can match skills at a glance without extra tool calls.
def build_skills_system_prompt() -> str:
"""
Build a dynamic skills system prompt by scanning the skills/ directory.
Returns a prompt section that lists all skill categories (with descriptions
from DESCRIPTION.md) and their skill names inline, so the model can
immediately see if a relevant skill exists and load it with a single
skill_view(name) call -- no discovery tool calls needed.
Returns:
str: The skills system prompt section, or empty string if no skills found.
"""
import re
from pathlib import Path
skills_dir = Path(__file__).parent / "skills"
if not skills_dir.exists():
return ""
# Scan for SKILL.md files grouped by category
skills_by_category = {}
for skill_file in skills_dir.rglob("SKILL.md"):
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
category = parts[0]
skill_name = parts[-2] # Folder containing SKILL.md
else:
category = "general"
skill_name = skill_file.parent.name
skills_by_category.setdefault(category, []).append(skill_name)
if not skills_by_category:
return ""
# Load category descriptions from DESCRIPTION.md files (YAML frontmatter)
category_descriptions = {}
for category in skills_by_category:
desc_file = skills_dir / category / "DESCRIPTION.md"
if desc_file.exists():
try:
content = desc_file.read_text(encoding="utf-8")
# Parse description from YAML frontmatter: ---\ndescription: ...\n---
match = re.search(r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---", content, re.MULTILINE | re.DOTALL)
if match:
category_descriptions[category] = match.group(1).strip()
except Exception:
pass
# Build compact index: category with description + skill names
index_lines = []
for category in sorted(skills_by_category.keys()):
desc = category_descriptions.get(category, "")
names = ", ".join(sorted(skills_by_category[category]))
if desc:
index_lines.append(f" {category}: {desc}")
else:
index_lines.append(f" {category}:")
index_lines.append(f" skills: {names}")
return (
"## Skills (mandatory)\n"
"Before replying, scan the skills below. If one clearly matches your task, "
"load it with skill_view(name) and follow its instructions.\n"
"\n"
"<available_skills>\n"
+ "\n".join(index_lines) + "\n"
"</available_skills>\n"
"\n"
"If none match, proceed normally without loading a skill."
)
Skills contain vetted, up-to-date instructions for specific tools and workflows."""
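To see what the DESCRIPTION.md frontmatter pattern in build_skills_system_prompt accepts, here is a small standalone check. The pattern is copied from the function; `extract_description` and the sample text are hypothetical and exist only for illustration.

```python
import re

# Same pattern as in build_skills_system_prompt.
FRONTMATTER_DESCRIPTION = re.compile(
    r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---",
    re.MULTILINE | re.DOTALL,
)


def extract_description(text: str) -> str:
    """Return the frontmatter description, or an empty string if none matches."""
    match = FRONTMATTER_DESCRIPTION.search(text)
    return match.group(1).strip() if match else ""


sample = (
    "---\n"
    "name: mlops\n"
    "description: Training and deployment workflows\n"
    "---\n"
    "\n"
    "# MLOps\n"
)
print(extract_description(sample))
print(repr(extract_description("# No frontmatter here\n")))
```

Because re.DOTALL lets `.*?` cross line boundaries, the pattern is not strictly limited to the frontmatter block: a `description:` line in the body can also match if another `---` line follows it. A YAML parser would be stricter, at the cost of a dependency.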
# =============================================================================
# Context File Injection (SOUL.md, AGENTS.md, .cursorrules)
# =============================================================================
# Maximum characters per context file before truncation
CONTEXT_FILE_MAX_CHARS = 20_000
# Truncation strategy: keep 70% from the head, 20% from the tail
CONTEXT_TRUNCATE_HEAD_RATIO = 0.7
CONTEXT_TRUNCATE_TAIL_RATIO = 0.2
def _truncate_content(content: str, filename: str, max_chars: int = CONTEXT_FILE_MAX_CHARS) -> str:
"""
Truncate content if it exceeds max_chars using a head/tail strategy.
Keeps 70% from the start and 20% from the end, with a truncation
marker in the middle so the model knows content was cut.
"""
if len(content) <= max_chars:
return content
head_chars = int(max_chars * CONTEXT_TRUNCATE_HEAD_RATIO)
tail_chars = int(max_chars * CONTEXT_TRUNCATE_TAIL_RATIO)
head = content[:head_chars]
tail = content[-tail_chars:]
marker = f"\n\n[...truncated {filename}: kept {head_chars}+{tail_chars} of {len(content)} chars. Use file tools to read the full file.]\n\n"
return head + marker + tail
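The head/tail strategy is easier to follow on a small input. The sketch below mirrors the logic of _truncate_content with a shortened marker; `truncate_head_tail` and the 1,000-character limit are assumptions chosen for the example, not values from this change.

```python
def truncate_head_tail(
    content: str,
    max_chars: int,
    head_ratio: float = 0.7,
    tail_ratio: float = 0.2,
) -> str:
    """Keep the start and end of content, dropping the middle (illustrative sketch)."""
    if len(content) <= max_chars:
        return content
    head_chars = int(max_chars * head_ratio)
    tail_chars = int(max_chars * tail_ratio)
    marker = (
        f"\n\n[...truncated: kept {head_chars}+{tail_chars} "
        f"of {len(content)} chars]\n\n"
    )
    return content[:head_chars] + marker + content[-tail_chars:]


# 1,400 characters: the "b" run sits in the middle and is dropped entirely.
text = "a" * 700 + "b" * 500 + "c" * 200
out = truncate_head_tail(text, max_chars=1000)
print(len(text), len(out))
print("b" in out)  # False
```

The two ratios sum to 0.9, so roughly 10% of the budget is left over for the marker. With a very small max_chars the marker can outweigh that slack and the result may exceed the limit.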
def build_context_files_prompt(cwd: str = None) -> str:
"""
Discover and load context files (SOUL.md, AGENTS.md, .cursorrules)
for injection into the system prompt.
Discovery rules:
- AGENTS.md: Recursively search from cwd (only if top-level exists).
Each file becomes a ## section with its relative path.
- .cursorrules: Check cwd for .cursorrules file and .cursor/rules/*.mdc
- SOUL.md: Check cwd first, then ~/.hermes/SOUL.md as global fallback
Args:
cwd: Working directory to search from. Defaults to os.getcwd().
Returns:
str: The context files prompt section, or empty string if none found.
"""
from pathlib import Path
if cwd is None:
cwd = os.getcwd()
cwd_path = Path(cwd).resolve()
sections = []
# ----- AGENTS.md (hierarchical, recursive) -----
top_level_agents = None
for name in ["AGENTS.md", "agents.md"]:
candidate = cwd_path / name
if candidate.exists():
top_level_agents = candidate
break
if top_level_agents:
# Recursively find all AGENTS.md files (case-insensitive)
agents_files = []
for root, dirs, files in os.walk(cwd_path):
# Skip hidden directories and common non-project dirs
dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ('node_modules', '__pycache__', 'venv', '.venv')]
for f in files:
if f.lower() == "agents.md":
agents_files.append(Path(root) / f)
# Sort by path depth (top-level first, then deeper)
agents_files.sort(key=lambda p: len(p.parts))
total_agents_content = ""
for agents_path in agents_files:
try:
content = agents_path.read_text(encoding="utf-8").strip()
if content:
rel_path = agents_path.relative_to(cwd_path)
total_agents_content += f"## {rel_path}\n\n{content}\n\n"
except Exception:
pass
if total_agents_content:
total_agents_content = _truncate_content(total_agents_content, "AGENTS.md")
sections.append(total_agents_content)
# ----- .cursorrules -----
cursorrules_content = ""
# Check for .cursorrules file
cursorrules_file = cwd_path / ".cursorrules"
if cursorrules_file.exists():
try:
content = cursorrules_file.read_text(encoding="utf-8").strip()
if content:
cursorrules_content += f"## .cursorrules\n\n{content}\n\n"
except Exception:
pass
# Check for .cursor/rules/*.mdc files
cursor_rules_dir = cwd_path / ".cursor" / "rules"
if cursor_rules_dir.exists() and cursor_rules_dir.is_dir():
mdc_files = sorted(cursor_rules_dir.glob("*.mdc"))
for mdc_file in mdc_files:
try:
content = mdc_file.read_text(encoding="utf-8").strip()
if content:
cursorrules_content += f"## .cursor/rules/{mdc_file.name}\n\n{content}\n\n"
except Exception:
pass
if cursorrules_content:
cursorrules_content = _truncate_content(cursorrules_content, ".cursorrules")
sections.append(cursorrules_content)
# ----- SOUL.md (cwd first, then ~/.hermes/ fallback) -----
soul_content = ""
soul_path = None
for name in ["SOUL.md", "soul.md"]:
candidate = cwd_path / name
if candidate.exists():
soul_path = candidate
break
if not soul_path:
# Global fallback
global_soul = Path.home() / ".hermes" / "SOUL.md"
if global_soul.exists():
soul_path = global_soul
if soul_path:
try:
content = soul_path.read_text(encoding="utf-8").strip()
if content:
content = _truncate_content(content, "SOUL.md")
soul_content = f"## SOUL.md\n\nIf SOUL.md is present, embody its persona and tone. Avoid stiff, generic replies; follow its guidance unless higher-priority instructions override it.\n\n{content}"
sections.append(soul_content)
except Exception:
pass
# ----- Assemble -----
if not sections:
return ""
return "# Project Context\n\nThe following project context files have been loaded and should be followed:\n\n" + "\n".join(sections)
def _build_tool_preview(tool_name: str, args: dict, max_len: int = 40) -> str:
"""
Build a short preview of a tool call's primary argument for display.
Returns a truncated string showing the most informative argument,
or None if no meaningful preview is available.
Args:
tool_name: Name of the tool being called
args: The tool call arguments dict
max_len: Maximum preview length before truncation
Returns:
str or None: Short preview string, or None
"""
# Map tool names to their primary argument key(s)
primary_args = {
"terminal": "command",
"web_search": "query",
"web_extract": "urls",
"read_file": "path",
"write_file": "path",
"patch": "path",
"search": "pattern",
"browser_navigate": "url",
"browser_click": "ref",
"browser_type": "text",
"image_generate": "prompt",
"text_to_speech": "text",
"vision_analyze": "question",
"mixture_of_agents": "user_prompt",
"skill_view": "name",
"skills_list": "category",
"schedule_cronjob": "name",
}
# Special handling for the process tool -- show action + session_id
if tool_name == "process":
action = args.get("action", "")
session_id = args.get("session_id", "")
data = args.get("data", "")
timeout = args.get("timeout")
parts = [action] if action else []
if session_id:
parts.append(session_id[:16])
if data:
parts.append(f'"{data[:20]}"')
if timeout and action == "wait":
parts.append(f"{timeout}s")
return " ".join(parts) if parts else None
key = primary_args.get(tool_name)
if not key:
# Try common arg names as fallback
for fallback_key in ("query", "text", "command", "path", "name", "prompt"):
if fallback_key in args:
key = fallback_key
break
if not key or key not in args:
return None
value = args[key]
# Handle list values (e.g., urls)
if isinstance(value, list):
value = value[0] if value else ""
preview = str(value).strip()
if not preview:
return None
# Truncate
if len(preview) > max_len:
preview = preview[:max_len - 3] + "..."
return preview
class KawaiiSpinner:
@@ -605,6 +1027,8 @@ class AIAgent:
max_tokens: int = None,
reasoning_config: Dict[str, Any] = None,
prefill_messages: List[Dict[str, Any]] = None,
platform: str = None,
skip_context_files: bool = False,
):
"""
Initialize the AI Agent.
@@ -635,6 +1059,11 @@ class AIAgent:
prefill_messages (List[Dict]): Messages to prepend to conversation history as prefilled context.
Useful for injecting a few-shot example or priming the model's response style.
Example: [{"role": "user", "content": "Hi!"}, {"role": "assistant", "content": "Hello!"}]
platform (str): The interface platform the user is on (e.g. "cli", "telegram", "discord", "whatsapp").
Used to inject platform-specific formatting hints into the system prompt.
skip_context_files (bool): If True, skip auto-injection of SOUL.md, AGENTS.md, and .cursorrules
into the system prompt. Use this for batch processing and data generation to avoid
polluting trajectories with user-specific persona or project instructions.
"""
self.model = model
self.max_iterations = max_iterations
@@ -643,9 +1072,13 @@ class AIAgent:
self.verbose_logging = verbose_logging
self.quiet_mode = quiet_mode
self.ephemeral_system_prompt = ephemeral_system_prompt
self.platform = platform # "cli", "telegram", "discord", "whatsapp", etc.
self.skip_context_files = skip_context_files
self.log_prefix_chars = log_prefix_chars
self.log_prefix = f"{log_prefix} " if log_prefix else ""
self.base_url = base_url or "" # Store for OpenRouter detection
# Store effective base URL for feature detection (prompt caching, reasoning, etc.)
# When no base_url is provided, the client defaults to OpenRouter, so reflect that here.
self.base_url = base_url or "https://openrouter.ai/api/v1"
self.tool_progress_callback = tool_progress_callback
self._last_reported_tool = None # Track for "new tool" mode
@@ -668,6 +1101,14 @@ class AIAgent:
self.reasoning_config = reasoning_config # None = use default (xhigh for OpenRouter)
self.prefill_messages = prefill_messages or [] # Prefilled conversation turns
# Anthropic prompt caching: auto-enabled for Claude models via OpenRouter.
# Reduces input costs by ~75% on multi-turn conversations by caching the
# conversation prefix. Uses system_and_3 strategy (4 breakpoints).
is_openrouter = "openrouter" in self.base_url.lower()
is_claude = "claude" in self.model.lower()
self._use_prompt_caching = is_openrouter and is_claude
self._cache_ttl = "5m" # Default 5-minute TTL (1.25x write cost)
# Configure logging
if self.verbose_logging:
logging.basicConfig(
@@ -773,6 +1214,10 @@ class AIAgent:
prompt_preview = self.ephemeral_system_prompt[:60] + "..." if len(self.ephemeral_system_prompt) > 60 else self.ephemeral_system_prompt
print(f"🔒 Ephemeral system prompt: '{prompt_preview}' (not saved to trajectories)")
# Show prompt caching status
if self._use_prompt_caching and not self.quiet_mode:
print(f"💾 Prompt caching: ENABLED (Claude via OpenRouter, {self._cache_ttl} TTL)")
# Session logging setup - auto-save conversation trajectories for debugging
self.session_start = datetime.now()
if session_id:
@@ -951,10 +1396,6 @@ class AIAgent:
return f"{face} 🎨 creating '{prompt}'... {time_str}"
# Skills - use large pool for variety
elif tool_name == "skills_categories":
face = random.choice(self.KAWAII_SKILL)
return f"{face} 📚 listing categories... {time_str}"
elif tool_name == "skills_list":
category = args.get("category", "skills")
face = random.choice(self.KAWAII_SKILL)
@@ -965,19 +1406,65 @@ class AIAgent:
face = random.choice(self.KAWAII_SKILL)
return f"{face} 📖 loading {name}... {time_str}"
# File tools
elif tool_name == "read_file":
path = args.get("path", "file")
if len(path) > 30:
path = "..." + path[-27:]
face = random.choice(self.KAWAII_READ)
return f"{face} 📖 reading \"{path}\" {time_str}"
elif tool_name == "write_file":
path = args.get("path", "file")
if len(path) > 30:
path = "..." + path[-27:]
face = random.choice(self.KAWAII_CREATE)
return f"{face} ✍️ writing \"{path}\" {time_str}"
elif tool_name == "patch":
path = args.get("path", "file")
if path and len(path) > 30:
path = "..." + path[-27:]
face = random.choice(self.KAWAII_CREATE)
return f"{face} 🔧 patching \"{path}\" {time_str}"
elif tool_name == "search":
pattern = args.get("pattern", "")
if len(pattern) > 25:
pattern = pattern[:22] + "..."
face = random.choice(self.KAWAII_SEARCH)
return f"{face} 🔎 searching \"{pattern}\" {time_str}"
# TTS
elif tool_name == "text_to_speech":
text = args.get("text", "")
if len(text) > 25:
text = text[:22] + "..."
face = random.choice(self.KAWAII_CREATE)
return f"{face} 🔊 speaking \"{text}\" {time_str}"
# Vision tools
elif tool_name == "vision_analyze":
question = args.get("question", "")
if len(question) > 25:
question = question[:22] + "..."
face = random.choice(self.KAWAII_BROWSER)
return f"{face} 👁️✨ analyzing image... {time_str}"
return f"{face} 👁️✨ analyzing \"{question}\" {time_str}"
# Mixture of agents
elif tool_name == "mixture_of_agents":
prompt = args.get("user_prompt", "")
if len(prompt) > 25:
prompt = prompt[:22] + "..."
face = random.choice(self.KAWAII_THINK)
return f"{face} 🧠💭 thinking REALLY hard... {time_str}"
return f"{face} 🧠💭 deep thinking \"{prompt}\" {time_str}"
# Default fallback - random generic kawaii
# Default fallback - random generic kawaii with primary arg preview
else:
face = random.choice(self.KAWAII_GENERIC)
preview = _build_tool_preview(tool_name, args)
if preview:
return f"{face}{tool_name}... \"{preview}\" {time_str}"
return f"{face}{tool_name}... {time_str}"
def _has_content_after_think_block(self, content: str) -> bool:
@@ -1446,6 +1933,9 @@ class AIAgent:
Call this from another thread (e.g., input handler, message receiver)
to gracefully stop the agent and process a new message.
Also signals long-running tool executions (e.g. terminal commands)
to terminate early, so the agent can respond immediately.
Args:
message: Optional new message that triggered the interrupt.
If provided, the agent will include this in its response context.
@@ -1462,6 +1952,8 @@ class AIAgent:
"""
self._interrupt_requested = True
self._interrupt_message = message
# Signal the terminal tool to kill any running subprocess immediately
_set_terminal_interrupt(True)
if not self.quiet_mode:
print(f"\n⚡ Interrupt requested" + (f": '{message[:40]}...'" if message and len(message) > 40 else f": '{message}'" if message else ""))
@@ -1469,6 +1961,7 @@ class AIAgent:
"""Clear any pending interrupt request."""
self._interrupt_requested = False
self._interrupt_message = None
_set_terminal_interrupt(False)
@property
def is_interrupted(self) -> bool:
@@ -1521,20 +2014,48 @@ class AIAgent:
if not self.quiet_mode:
print(f"💬 Starting conversation: '{user_message[:60]}{'...' if len(user_message) > 60 else ''}'")
# Determine which system prompt to use for API calls (ephemeral)
# Priority: explicit system_message > ephemeral_system_prompt > None
base_system_prompt = system_message if system_message is not None else self.ephemeral_system_prompt
# Auto-include skills guidance if skills tools are available
has_skills_tools = any(name in self.valid_tool_names for name in ['skills_list', 'skills_categories', 'skill_view'])
if has_skills_tools:
if base_system_prompt:
active_system_prompt = f"{base_system_prompt}\n\n{SKILLS_SYSTEM_PROMPT}"
else:
active_system_prompt = SKILLS_SYSTEM_PROMPT
else:
active_system_prompt = base_system_prompt
# ── Build the full system prompt ──
# Layers (in order):
# 1. Default agent identity (always present)
# 2. User / gateway system prompt (if provided)
# 3. Skills guidance (if skills tools are loaded)
# 4. Context files (SOUL.md, AGENTS.md, .cursorrules)
# 5. Current date & time
# 6. Platform-specific formatting hint
prompt_parts = [DEFAULT_AGENT_IDENTITY]
# Layer in the caller-supplied system prompt (explicit > ephemeral).
caller_prompt = system_message if system_message is not None else self.ephemeral_system_prompt
if caller_prompt:
prompt_parts.append(caller_prompt)
# Auto-include skills guidance if skills tools are available.
has_skills_tools = any(name in self.valid_tool_names for name in ['skills_list', 'skill_view'])
skills_prompt = build_skills_system_prompt() if has_skills_tools else ""
if skills_prompt:
prompt_parts.append(skills_prompt)
# Auto-include context files (SOUL.md, AGENTS.md, .cursorrules).
# Skipped for batch processing / data generation to avoid polluting trajectories.
if not self.skip_context_files:
context_files_prompt = build_context_files_prompt()
if context_files_prompt:
prompt_parts.append(context_files_prompt)
# Current local date and time so the model is never confused about
# what day/time it is (LLM training cutoffs can otherwise mislead it).
now = datetime.now()
prompt_parts.append(
f"Current local date and time: {now.strftime('%A, %B %d, %Y %I:%M %p')}"
)
# Platform-specific formatting hint (no markdown on WhatsApp, etc.).
platform_key = (self.platform or "").lower().strip()
if platform_key in PLATFORM_HINTS:
prompt_parts.append(PLATFORM_HINTS[platform_key])
active_system_prompt = "\n\n".join(prompt_parts)
# Main conversation loop
api_call_count = 0
final_response = None
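The layering order listed above can be sketched as a pure function. This is an illustration under assumptions: `assemble_system_prompt` and `HINTS` are hypothetical stand-ins for the inline logic and for PLATFORM_HINTS, and the skills and context sections are passed in as plain strings.

```python
from datetime import datetime
from typing import Optional

# Hypothetical stand-in for PLATFORM_HINTS.
HINTS = {"cli": "Prefer plain text that renders in a terminal."}


def assemble_system_prompt(
    identity: str,
    caller_prompt: Optional[str] = None,
    skills_prompt: str = "",
    context_prompt: str = "",
    platform: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Join the prompt layers in a fixed order, skipping empty ones."""
    parts = [identity]
    for optional_part in (caller_prompt, skills_prompt, context_prompt):
        if optional_part:
            parts.append(optional_part)
    now = now or datetime.now()
    stamp = now.strftime("%A, %B %d, %Y %I:%M %p")
    parts.append(f"Current local date and time: {stamp}")
    key = (platform or "").lower().strip()
    if key in HINTS:
        parts.append(HINTS[key])
    return "\n\n".join(parts)


prompt = assemble_system_prompt(
    "You are Hermes Agent.",
    caller_prompt="Be brief.",
    platform=" CLI ",
    now=datetime(2026, 2, 17, 3, 24),
)
print(prompt)
```

Passing `now` explicitly keeps the function deterministic in tests. Note that the timestamp changes between calls, so this layer of the system prompt is not identical from one conversation to the next.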
@@ -1582,6 +2103,13 @@ class AIAgent:
# Insert system message at the beginning
api_messages = [{"role": "system", "content": active_system_prompt}] + api_messages
# Apply Anthropic prompt caching for Claude models via OpenRouter.
# Auto-detected: if model name contains "claude" and base_url is OpenRouter,
# inject cache_control breakpoints (system + last 3 messages) to reduce
# input token costs by ~75% on multi-turn conversations.
if self._use_prompt_caching:
api_messages = apply_anthropic_cache_control(api_messages, cache_ttl=self._cache_ttl)
# Calculate approximate request size for logging
total_chars = sum(len(str(msg)) for msg in api_messages)
approx_tokens = total_chars // 4 # Rough estimate: 4 chars per token
@@ -1811,6 +2339,16 @@ class AIAgent:
if self.verbose_logging:
logging.debug(f"Token usage: prompt={usage_dict['prompt_tokens']:,}, completion={usage_dict['completion_tokens']:,}, total={usage_dict['total_tokens']:,}")
# Log cache hit stats when prompt caching is active
if self._use_prompt_caching:
details = getattr(response.usage, 'prompt_tokens_details', None)
cached = (getattr(details, 'cached_tokens', 0) or 0) if details else 0
written = (getattr(details, 'cache_write_tokens', 0) or 0) if details else 0
prompt = usage_dict["prompt_tokens"]
hit_pct = (cached / prompt * 100) if prompt > 0 else 0
if not self.quiet_mode:
print(f"{self.log_prefix} 💾 Cache: {cached:,}/{prompt:,} tokens ({hit_pct:.0f}% hit, {written:,} written)")
break # Success, exit retry loop
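The cache hit percentage logged above is cached prompt tokens divided by total prompt tokens. A minimal sketch, assuming a usage object shaped like the one read in this hunk; `cache_stats` is a hypothetical helper and the token counts are invented.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def cache_stats(usage: Any) -> Tuple[int, int, float]:
    """Return (cached_tokens, cache_write_tokens, hit percentage) from a usage object."""
    details = getattr(usage, "prompt_tokens_details", None)
    # Missing attributes and explicit None values both fall back to 0.
    cached = (getattr(details, "cached_tokens", 0) or 0) if details else 0
    written = (getattr(details, "cache_write_tokens", 0) or 0) if details else 0
    prompt = getattr(usage, "prompt_tokens", 0) or 0
    hit_pct = (cached / prompt * 100) if prompt > 0 else 0.0
    return cached, written, hit_pct


usage = SimpleNamespace(
    prompt_tokens=12000,
    prompt_tokens_details=SimpleNamespace(cached_tokens=9000, cache_write_tokens=500),
)
print(cache_stats(usage))  # (9000, 500, 75.0)
```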
@@ -2124,12 +2662,8 @@ class AIAgent:
# Fire progress callback if registered (for messaging platforms)
if self.tool_progress_callback:
try:
# Build preview for terminal commands
if function_name == "terminal":
cmd = function_args.get("command", "")
preview = cmd[:50] + "..." if len(cmd) > 50 else cmd
else:
preview = None
# Build a short preview of the primary argument
preview = _build_tool_preview(function_name, function_args)
self.tool_progress_callback(function_name, preview)
except Exception as cb_err:
logging.debug(f"Tool progress callback error: {cb_err}")
@@ -2151,7 +2685,6 @@ class AIAgent:
'image_generate': ('sparkle', ['🎨', '', '🖼️', '🌟']),
'skill_view': ('star', ['📚', '📖', '🎓', '']),
'skills_list': ('pulse', ['📋', '📝', '📑', '📜']),
'skills_categories': ('pulse', ['📂', '🗂️', '📁', '🏷️']),
'moa_query': ('brain', ['🧠', '💭', '🤔', '💡']),
'analyze_image': ('sparkle', ['👁️', '🔍', '📷', '']),
}
@@ -2189,6 +2722,21 @@ class AIAgent:
response_preview = function_result[:self.log_prefix_chars] + "..." if len(function_result) > self.log_prefix_chars else function_result
print(f" ✅ Tool {i} completed in {tool_duration:.2f}s - {response_preview}")
# Check for interrupt between tool calls - skip remaining
# tools so the agent can respond to the user immediately
if self._interrupt_requested and i < len(assistant_message.tool_calls):
remaining = len(assistant_message.tool_calls) - i
print(f"{self.log_prefix}⚡ Interrupt: skipping {remaining} remaining tool call(s)")
# Add placeholder results for skipped tool calls so the
# message sequence stays valid (assistant tool_calls need matching tool results)
for skipped_tc in assistant_message.tool_calls[i:]:
messages.append({
"role": "tool",
"content": "[Tool execution skipped - user sent a new message]",
"tool_call_id": skipped_tc.id
})
break
# Delay between tool calls
if self.tool_delay > 0 and i < len(assistant_message.tool_calls):
time.sleep(self.tool_delay)
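The placeholder step in the interrupt check exists because every assistant tool call needs a matching tool result. A small sketch of that bookkeeping, assuming tool calls are identified by string ids; `placeholder_results` is a hypothetical helper, not part of this change.

```python
from typing import Dict, List

SKIPPED_TEXT = "[Tool execution skipped - user sent a new message]"


def placeholder_results(tool_call_ids: List[str], completed: int) -> List[Dict[str, str]]:
    """Build tool messages for every call after the first `completed` ones."""
    return [
        {"role": "tool", "content": SKIPPED_TEXT, "tool_call_id": call_id}
        for call_id in tool_call_ids[completed:]
    ]


# Three tool calls were requested; the interrupt arrived after the first finished.
results = placeholder_results(["call_a", "call_b", "call_c"], completed=1)
for message in results:
    print(message["tool_call_id"], message["content"])
```

In the loop above, `i` appears to count tool calls from 1, so slicing at `[i:]` skips exactly the calls that have already produced a result.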
+47
@@ -262,6 +262,25 @@ function Test-Ripgrep {
return $true # Don't fail - ripgrep is optional
}
function Test-Ffmpeg {
Write-Info "Checking ffmpeg (optional, for TTS voice messages)..."
if (Get-Command ffmpeg -ErrorAction SilentlyContinue) {
$version = ffmpeg -version 2>&1 | Select-Object -First 1
Write-Success "ffmpeg found"
$script:HasFfmpeg = $true
return $true
}
Write-Warn "ffmpeg not found (TTS voice bubbles on Telegram will send as audio files instead)"
Write-Info " Install with: winget install ffmpeg"
Write-Info " Or: choco install ffmpeg"
Write-Info " Or download from: https://ffmpeg.org/download.html"
$script:HasFfmpeg = $false
return $true # Don't fail - ffmpeg is optional
}
# ============================================================================
# Installation
# ============================================================================
@@ -424,6 +443,10 @@ function Copy-ConfigTemplates {
New-Item -ItemType Directory -Force -Path "$HermesHome\cron" | Out-Null
New-Item -ItemType Directory -Force -Path "$HermesHome\sessions" | Out-Null
New-Item -ItemType Directory -Force -Path "$HermesHome\logs" | Out-Null
New-Item -ItemType Directory -Force -Path "$HermesHome\pairing" | Out-Null
New-Item -ItemType Directory -Force -Path "$HermesHome\hooks" | Out-Null
New-Item -ItemType Directory -Force -Path "$HermesHome\image_cache" | Out-Null
New-Item -ItemType Directory -Force -Path "$HermesHome\audio_cache" | Out-Null
# Create .env
$envPath = "$HermesHome\.env"
@@ -452,6 +475,29 @@ function Copy-ConfigTemplates {
Write-Info "~/.hermes/config.yaml already exists, keeping it"
}
# Create SOUL.md if it doesn't exist (global persona file)
$soulPath = "$HermesHome\SOUL.md"
if (-not (Test-Path $soulPath)) {
@"
# Hermes Agent Persona
<!--
This file defines the agent's personality and tone.
The agent will embody whatever you write here.
Edit this to customize how Hermes communicates with you.
Examples:
- "You are a warm, playful assistant who uses kaomoji occasionally."
- "You are a concise technical expert. No fluff, just facts."
- "You speak like a friendly coworker who happens to know everything."
This file is loaded fresh each message -- no restart needed.
Delete the contents (or this file) to use the default personality.
-->
"@ | Set-Content -Path $soulPath -Encoding UTF8
Write-Success "Created ~/.hermes/SOUL.md (edit to customize personality)"
}
Write-Success "Configuration directory ready: ~/.hermes/"
}
@@ -567,6 +613,7 @@ function Main {
if (-not (Test-Git)) { exit 1 }
Test-Node # Optional, doesn't fail
Test-Ripgrep # Optional, doesn't fail
Test-Ffmpeg # Optional, doesn't fail
Install-Repository
Install-Venv
+67 -3
@@ -413,6 +413,49 @@ check_ripgrep() {
# Don't exit - ripgrep is optional (grep fallback exists)
}
check_ffmpeg() {
log_info "Checking ffmpeg (optional, for TTS voice messages)..."
if command -v ffmpeg &> /dev/null; then
local ffmpeg_version=$(ffmpeg -version 2>/dev/null | head -1 | awk '{print $3}')
log_success "ffmpeg found: $ffmpeg_version"
HAS_FFMPEG=true
return
fi
log_warn "ffmpeg not found"
log_info "ffmpeg is needed for Telegram voice bubbles when using the default Edge TTS provider."
log_info "Without it, Edge TTS audio is sent as a file instead of a voice bubble."
log_info "(OpenAI and ElevenLabs TTS produce Opus natively and don't need ffmpeg.)"
log_info ""
log_info "To install ffmpeg:"
case "$OS" in
linux)
case "$DISTRO" in
ubuntu|debian)
log_info " sudo apt install ffmpeg"
;;
fedora)
log_info " sudo dnf install ffmpeg"
;;
arch)
log_info " sudo pacman -S ffmpeg"
;;
*)
log_info " https://ffmpeg.org/download.html"
;;
esac
;;
macos)
log_info " brew install ffmpeg"
;;
esac
HAS_FFMPEG=false
# Don't exit - ffmpeg is optional
}
# ============================================================================
# Installation
# ============================================================================
@@ -571,9 +614,7 @@ copy_config_templates() {
log_info "Setting up configuration files..."
# Create ~/.hermes directory structure (config at top level, code in subdir)
mkdir -p "$HERMES_HOME/cron"
mkdir -p "$HERMES_HOME/sessions"
mkdir -p "$HERMES_HOME/logs"
mkdir -p "$HERMES_HOME"/{cron,sessions,logs,pairing,hooks,image_cache,audio_cache}
# Create .env at ~/.hermes/.env (top level, easy to find)
if [ ! -f "$HERMES_HOME/.env" ]; then
@@ -598,6 +639,28 @@ copy_config_templates() {
log_info "~/.hermes/config.yaml already exists, keeping it"
fi
# Create SOUL.md if it doesn't exist (global persona file)
if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
cat > "$HERMES_HOME/SOUL.md" << 'SOUL_EOF'
# Hermes Agent Persona
<!--
This file defines the agent's personality and tone.
The agent will embody whatever you write here.
Edit this to customize how Hermes communicates with you.
Examples:
- "You are a warm, playful assistant who uses kaomoji occasionally."
- "You are a concise technical expert. No fluff, just facts."
- "You speak like a friendly coworker who happens to know everything."
This file is loaded fresh each message -- no restart needed.
Delete the contents (or this file) to use the default personality.
-->
SOUL_EOF
log_success "Created ~/.hermes/SOUL.md (edit to customize personality)"
fi
log_success "Configuration directory ready: ~/.hermes/"
}
@@ -707,6 +770,7 @@ main() {
check_git
check_node
check_ripgrep
check_ffmpeg
clone_repo
setup_venv
+34
@@ -0,0 +1,34 @@
#!/bin/bash
# Stop running Modal apps: swe-rex sandboxes by default, or every app with --all
#
# Usage:
# bash scripts/kill_modal.sh # Stop swe-rex (the sandbox app)
# bash scripts/kill_modal.sh --all # Stop ALL Modal apps
set -uo pipefail
echo "Fetching Modal app list..."
APP_LIST=$(modal app list 2>/dev/null)
if [[ "${1:-}" == "--all" ]]; then
echo "Stopping ALL Modal apps..."
echo "$APP_LIST" | grep -oE 'ap-[A-Za-z0-9]+' | sort -u | while read -r app_id; do
echo " Stopping $app_id"
modal app stop "$app_id" 2>/dev/null || true
done
else
echo "Stopping swe-rex sandboxes..."
APPS=$(echo "$APP_LIST" | grep 'swe-rex' | grep -oE 'ap-[A-Za-z0-9]+' || true)
if [[ -z "$APPS" ]]; then
echo " No swe-rex apps found."
else
echo "$APPS" | while read -r app_id; do
echo " Stopping $app_id"
modal app stop "$app_id" 2>/dev/null || true
done
fi
fi
echo ""
echo "Current swe-rex status:"
modal app list 2>/dev/null | grep -E 'State|swe-rex' || echo " (none)"
+3
@@ -0,0 +1,3 @@
---
description: Diagram creation skills for generating visual diagrams, flowcharts, architecture diagrams, and illustrations using tools like Excalidraw.
---
+191
@@ -0,0 +1,191 @@
---
name: excalidraw
description: Create hand-drawn style diagrams using Excalidraw JSON format. Generate .excalidraw files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links.
version: 1.0.0
author: Hermes Agent
license: MIT
tags: [Excalidraw, Diagrams, Flowcharts, Architecture, Visualization, JSON]
dependencies: []
related_skills: []
---
# Excalidraw Diagram Skill
Create diagrams by writing standard Excalidraw element JSON and saving it as a `.excalidraw` file. These files can be dragged and dropped onto [excalidraw.com](https://excalidraw.com) for viewing and editing. No accounts, no API keys, no rendering libraries -- just JSON.
## Workflow
1. **Load this skill** (you already did)
2. **Write the elements JSON** -- an array of Excalidraw element objects
3. **Save the file** using `write_file` to create a `.excalidraw` file
4. **Optionally upload** for a shareable link using `scripts/upload.py` via `terminal`
### Saving a Diagram
Wrap your elements array in the standard `.excalidraw` envelope and save with `write_file`:
```json
{
"type": "excalidraw",
"version": 2,
"source": "hermes-agent",
"elements": [ ...your elements array here... ],
"appState": {
"viewBackgroundColor": "#ffffff"
}
}
```
Save to any path, e.g. `~/diagrams/my_diagram.excalidraw`.
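When the envelope is produced from a script rather than through `write_file`, the wrapping step can be sketched as below. `save_excalidraw` is a hypothetical helper name, not part of this skill; the envelope fields are the ones shown above.

```python
import json
from pathlib import Path


def save_excalidraw(elements, path, background="#ffffff"):
    """Wrap an elements array in the .excalidraw envelope and write it to disk.

    Hypothetical helper for illustration; the skill itself saves via write_file.
    """
    doc = {
        "type": "excalidraw",
        "version": 2,
        "source": "hermes-agent",
        "elements": elements,
        "appState": {"viewBackgroundColor": background},
    }
    target = Path(path).expanduser()
    # Create the parent directory (e.g. ~/diagrams) if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return target
```

Calling `save_excalidraw(elements, "~/diagrams/my_diagram.excalidraw")` would write a file that can be dropped onto excalidraw.com.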
### Uploading for a Shareable Link
Run the upload script (located in this skill's `scripts/` directory) via terminal:
```bash
python skills/diagramming/excalidraw/scripts/upload.py ~/diagrams/my_diagram.excalidraw
```
This uploads to excalidraw.com (no account needed) and prints a shareable URL. Requires the `cryptography` pip package (`pip install cryptography`).
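For reference, `scripts/upload.py` frames its payload as a 4-byte version word followed by length-prefixed buffers. The sketch below copies `concat_buffers` from that script and adds `split_buffers`, a hypothetical inverse that is not part of the skill, only to make the layout concrete:

```python
import struct


def concat_buffers(*buffers: bytes) -> bytes:
    """Same layout as scripts/upload.py: version=1, then (length, data) pairs."""
    parts = [struct.pack(">I", 1)]  # 4-byte big-endian version
    for buf in buffers:
        parts.append(struct.pack(">I", len(buf)))  # 4-byte big-endian length
        parts.append(buf)
    return b"".join(parts)


def split_buffers(payload: bytes) -> list:
    """Hypothetical inverse of concat_buffers, for illustration only."""
    (version,) = struct.unpack_from(">I", payload, 0)
    if version != 1:
        raise ValueError(f"unexpected concat-buffers version: {version}")
    offset = 4
    buffers = []
    while offset < len(payload):
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        chunk = payload[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("payload is truncated")
        buffers.append(chunk)
        offset += length
    return buffers
```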
---
## Element Format Reference
### Required Fields (all elements)
`type`, `id` (unique string), `x`, `y`, `width`, `height`
### Defaults (skip these -- they're applied automatically)
- `strokeColor`: `"#1e1e1e"`
- `backgroundColor`: `"transparent"`
- `fillStyle`: `"solid"`
- `strokeWidth`: `2`
- `roughness`: `1` (hand-drawn look)
- `opacity`: `100`
Canvas background is white.
### Element Types
**Rectangle**:
```json
{ "type": "rectangle", "id": "r1", "x": 100, "y": 100, "width": 200, "height": 100 }
```
- `roundness: { "type": 3 }` for rounded corners
- `backgroundColor: "#a5d8ff"`, `fillStyle: "solid"` for filled
**Ellipse**:
```json
{ "type": "ellipse", "id": "e1", "x": 100, "y": 100, "width": 150, "height": 150 }
```
**Diamond**:
```json
{ "type": "diamond", "id": "d1", "x": 100, "y": 100, "width": 150, "height": 150 }
```
**Labeled shape (container binding)** -- create a text element bound to the shape:
> **WARNING:** Do NOT use `"label": { "text": "..." }` on shapes. This is NOT a valid
> Excalidraw property and will be silently ignored, producing blank shapes. You MUST
> use the container binding approach below.
The shape needs `boundElements` listing the text, and the text needs `containerId` pointing back:
```json
{ "type": "rectangle", "id": "r1", "x": 100, "y": 100, "width": 200, "height": 80,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"boundElements": [{ "id": "t_r1", "type": "text" }] },
{ "type": "text", "id": "t_r1", "x": 105, "y": 110, "width": 190, "height": 25,
"text": "Hello", "fontSize": 20, "fontFamily": 1, "strokeColor": "#1e1e1e",
"textAlign": "center", "verticalAlign": "middle",
"containerId": "r1", "originalText": "Hello", "autoResize": true }
```
- Works on rectangle, ellipse, diamond
- Text is auto-centered by Excalidraw when `containerId` is set
- The text `x`/`y`/`width`/`height` are approximate -- Excalidraw recalculates them on load
- `originalText` should match `text`
- Always include `fontFamily: 1` (Virgil/hand-drawn font)
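Because the two halves of a binding are easy to get out of sync, it can help to generate them together. A minimal sketch, assuming a hypothetical helper named `labeled_shape` (not part of this skill) and the approximate text geometry used in the example above:

```python
def labeled_shape(shape_type, shape_id, x, y, width, height, text,
                  font_size=20, **shape_props):
    """Return [shape, text] with boundElements and containerId linked both ways.

    Hypothetical helper; the text x/y/width/height are rough placeholders
    because Excalidraw recalculates them on load.
    """
    text_id = f"t_{shape_id}"
    shape = {
        "type": shape_type, "id": shape_id,
        "x": x, "y": y, "width": width, "height": height,
        **shape_props,
        "boundElements": [{"id": text_id, "type": "text"}],
    }
    label = {
        "type": "text", "id": text_id,
        "x": x + 5, "y": y + 10, "width": width - 10, "height": 25,
        "text": text, "fontSize": font_size, "fontFamily": 1,
        "strokeColor": "#1e1e1e",
        "textAlign": "center", "verticalAlign": "middle",
        "containerId": shape_id, "originalText": text, "autoResize": True,
    }
    # Shape first, bound text immediately after it.
    return [shape, label]
```

For example, `labeled_shape("rectangle", "r1", 100, 100, 200, 80, "Hello", backgroundColor="#a5d8ff", fillStyle="solid")` yields a pair equivalent to the JSON above, minus `roundness`.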
**Labeled arrow** -- same container binding approach:
```json
{ "type": "arrow", "id": "a1", "x": 300, "y": 150, "width": 200, "height": 0,
"points": [[0,0],[200,0]], "endArrowhead": "arrow",
"boundElements": [{ "id": "t_a1", "type": "text" }] },
{ "type": "text", "id": "t_a1", "x": 370, "y": 130, "width": 60, "height": 20,
"text": "connects", "fontSize": 16, "fontFamily": 1, "strokeColor": "#1e1e1e",
"textAlign": "center", "verticalAlign": "middle",
"containerId": "a1", "originalText": "connects", "autoResize": true }
```
**Standalone text** (titles and annotations only -- no container):
```json
{ "type": "text", "id": "t1", "x": 150, "y": 138, "text": "Hello", "fontSize": 20,
"fontFamily": 1, "strokeColor": "#1e1e1e", "originalText": "Hello", "autoResize": true }
```
- `x` is the LEFT edge. To center at position `cx`: `x = cx - (text.length * fontSize * 0.5) / 2` (this estimates each character as half the font size wide)
- Do NOT rely on `textAlign` or `width` for positioning
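The centering rule above is simple arithmetic. A small sketch, where `centered_text_x` is a hypothetical name and the 0.5 width factor is the same rough estimate used in the formula:

```python
def centered_text_x(cx, text, font_size):
    """Left-edge x that roughly centers standalone text at horizontal position cx."""
    estimated_width = len(text) * font_size * 0.5  # rough per-character width
    return cx - estimated_width / 2


# "Simple Flow" is 11 characters; at fontSize 28 the estimated width is 154 px.
title_x = centered_text_x(400, "Simple Flow", 28)
```

The estimate ignores real glyph widths, so treat the result as a starting point.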
**Arrow**:
```json
{ "type": "arrow", "id": "a1", "x": 300, "y": 150, "width": 200, "height": 0,
"points": [[0,0],[200,0]], "endArrowhead": "arrow" }
```
- `points`: `[dx, dy]` offsets from element `x`, `y`
- `endArrowhead`: `null` | `"arrow"` | `"bar"` | `"dot"` | `"triangle"`
- `strokeStyle`: `"solid"` (default) | `"dashed"` | `"dotted"`
### Arrow Bindings (connect arrows to shapes)
```json
{
"type": "arrow", "id": "a1", "x": 300, "y": 150, "width": 150, "height": 0,
"points": [[0,0],[150,0]], "endArrowhead": "arrow",
"startBinding": { "elementId": "r1", "fixedPoint": [1, 0.5] },
"endBinding": { "elementId": "r2", "fixedPoint": [0, 0.5] }
}
```
`fixedPoint` coordinates: `top=[0.5,0]`, `bottom=[0.5,1]`, `left=[0,0.5]`, `right=[1,0.5]`
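The arrow's `x`, `y`, and `points` can be derived from the same `fixedPoint` fractions so that the geometry and the bindings agree. A minimal sketch, assuming hypothetical helpers `anchor_xy` and `bound_arrow` that are not part of this skill:

```python
FIXED_POINTS = {
    "top": [0.5, 0], "bottom": [0.5, 1],
    "left": [0, 0.5], "right": [1, 0.5],
}


def anchor_xy(shape, side):
    """Absolute canvas position of a side's midpoint on a shape."""
    fx, fy = FIXED_POINTS[side]
    return shape["x"] + fx * shape["width"], shape["y"] + fy * shape["height"]


def bound_arrow(arrow_id, start_shape, start_side, end_shape, end_side):
    """Straight arrow from one shape side to another, with both bindings set."""
    sx, sy = anchor_xy(start_shape, start_side)
    ex, ey = anchor_xy(end_shape, end_side)
    dx, dy = ex - sx, ey - sy
    return {
        "type": "arrow", "id": arrow_id,
        "x": sx, "y": sy, "width": dx, "height": dy,
        "points": [[0, 0], [dx, dy]],  # offsets from the arrow's own x, y
        "endArrowhead": "arrow",
        "startBinding": {"elementId": start_shape["id"],
                         "fixedPoint": list(FIXED_POINTS[start_side])},
        "endBinding": {"elementId": end_shape["id"],
                       "fixedPoint": list(FIXED_POINTS[end_side])},
    }
```

Example 1 in `references/examples.md` also lists the arrow in each shape's `boundElements`; the sketch leaves that step to the caller.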
### Drawing Order (z-order)
- Array order = z-order (first = back, last = front)
- Emit progressively: background zones → shape → its bound text → its arrows → next shape
- BAD: all rectangles, then all texts, then all arrows
- GOOD: bg_zone → shape1 → text_for_shape1 → arrow1 → arrow_label_text → shape2 → text_for_shape2 → ...
- Always place the bound text element immediately after its container shape
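The last rule is easy to check mechanically before saving. A small sketch, where `misplaced_bound_text` is a hypothetical helper that is not part of this skill:

```python
def misplaced_bound_text(elements):
    """Return ids of bound text elements that do not directly follow their container."""
    misplaced = []
    for index, element in enumerate(elements):
        if element.get("type") != "text" or not element.get("containerId"):
            continue  # standalone text and non-text elements are not checked
        previous = elements[index - 1] if index > 0 else None
        if previous is None or previous.get("id") != element["containerId"]:
            misplaced.append(element["id"])
    return misplaced
```

An empty result means every bound text sits immediately after its container in the array.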
### Sizing Guidelines
**Font sizes:**
- Minimum `fontSize`: **16** for body text, labels, descriptions
- Minimum `fontSize`: **20** for titles and headings
- Minimum `fontSize`: **14** for secondary annotations only (sparingly)
- NEVER use `fontSize` below 14
**Element sizes:**
- Minimum shape size: 120x60 for labeled rectangles/ellipses
- Leave 20-30px gaps between elements minimum
- Prefer fewer, larger elements over many tiny ones
### Color Palette
See `references/colors.md` for full color tables. Quick reference:
| Use | Fill Color | Hex |
|-----|-----------|-----|
| Primary / Input | Light Blue | `#a5d8ff` |
| Success / Output | Light Green | `#b2f2bb` |
| Warning / External | Light Orange | `#ffd8a8` |
| Processing / Special | Light Purple | `#d0bfff` |
| Error / Critical | Light Red | `#ffc9c9` |
| Notes / Decisions | Light Yellow | `#fff3bf` |
| Storage / Data | Light Teal | `#c3fae8` |
### Tips
- Use the color palette consistently across the diagram
- **Text contrast is CRITICAL** -- never use light gray on white backgrounds. Minimum text color on white: `#757575`
- Do NOT use emoji in text -- they don't render in Excalidraw's font
- For dark mode diagrams, see `references/dark-mode.md`
- For larger examples, see `references/examples.md`
@@ -0,0 +1,44 @@
# Excalidraw Color Palette
Use these colors consistently across diagrams.
## Primary Colors (for strokes, arrows, and accents)
| Name | Hex | Use |
|------|-----|-----|
| Blue | `#4a9eed` | Primary actions, links, data series 1 |
| Amber | `#f59e0b` | Warnings, highlights, data series 2 |
| Green | `#22c55e` | Success, positive, data series 3 |
| Red | `#ef4444` | Errors, negative, data series 4 |
| Purple | `#8b5cf6` | Accents, special items, data series 5 |
| Pink | `#ec4899` | Decorative, data series 6 |
| Cyan | `#06b6d4` | Info, secondary, data series 7 |
| Lime | `#84cc16` | Extra, data series 8 |
## Pastel Fills (for shape backgrounds)
| Color | Hex | Good For |
|-------|-----|----------|
| Light Blue | `#a5d8ff` | Input, sources, primary nodes |
| Light Green | `#b2f2bb` | Success, output, completed |
| Light Orange | `#ffd8a8` | Warning, pending, external |
| Light Purple | `#d0bfff` | Processing, middleware, special |
| Light Red | `#ffc9c9` | Error, critical, alerts |
| Light Yellow | `#fff3bf` | Notes, decisions, planning |
| Light Teal | `#c3fae8` | Storage, data, memory |
| Light Pink | `#eebefa` | Analytics, metrics |
## Background Zones (use with opacity: 30-35 for layered diagrams)
| Color | Hex | Good For |
|-------|-----|----------|
| Blue zone | `#dbe4ff` | UI / frontend layer |
| Purple zone | `#e5dbff` | Logic / agent layer |
| Green zone | `#d3f9d8` | Data / tool layer |
## Text Contrast Rules
- **On white backgrounds**: minimum text color is `#757575`. Default `#1e1e1e` is best.
- **Colored text on light fills**: use dark variants (`#15803d` not `#22c55e`, `#2563eb` not `#4a9eed`)
- **White text**: only on dark backgrounds (`#9a5030` not `#c4795b`)
- **Never**: light gray (`#b0b0b0`, `#999`) on white -- unreadable
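This file does not say where the `#757575` threshold comes from. One way to sanity-check a text color is the WCAG contrast ratio; using it here is an assumption, not something this skill prescribes. A minimal sketch with hypothetical helper names:

```python
def _linear_channel(value_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = value_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """WCAG relative luminance for '#rrggbb' or shorthand '#rgb'."""
    h = hex_color.lstrip("#")
    if len(h) == 3:
        h = "".join(ch * 2 for ch in h)  # expand '#999' to '#999999'
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _linear_channel(r)
            + 0.7152 * _linear_channel(g)
            + 0.0722 * _linear_channel(b))


def contrast_ratio(foreground, background="#ffffff"):
    """WCAG contrast ratio between two colors, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Under this measure `#757575` on white clears the common 4.5:1 bar for body text, while `#b0b0b0` and `#999` fall well short of it.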
@@ -0,0 +1,68 @@
# Excalidraw Dark Mode Diagrams
To create a dark-themed diagram, use a massive dark background rectangle as the **first element** in the array. Make it large enough to cover any viewport:
```json
{
"type": "rectangle", "id": "darkbg",
"x": -4000, "y": -3000, "width": 10000, "height": 7500,
"backgroundColor": "#1e1e2e", "fillStyle": "solid",
"strokeColor": "transparent", "strokeWidth": 0
}
```
Then use the following color palettes for elements on the dark background.
## Text Colors (on dark)
| Color | Hex | Use |
|-------|-----|-----|
| White | `#e5e5e5` | Primary text, titles |
| Muted | `#a0a0a0` | Secondary text, annotations |
| NEVER | `#555` or darker | Invisible on dark bg! |
## Shape Fills (on dark)
| Color | Hex | Good For |
|-------|-----|----------|
| Dark Blue | `#1e3a5f` | Primary nodes |
| Dark Green | `#1a4d2e` | Success, output |
| Dark Purple | `#2d1b69` | Processing, special |
| Dark Orange | `#5c3d1a` | Warning, pending |
| Dark Red | `#5c1a1a` | Error, critical |
| Dark Teal | `#1a4d4d` | Storage, data |
## Stroke and Arrow Colors (on dark)
Use the standard Primary Colors from the main color palette -- they're bright enough on dark backgrounds:
- Blue `#4a9eed`, Amber `#f59e0b`, Green `#22c55e`, Red `#ef4444`, Purple `#8b5cf6`
For subtle shape borders, use `#555555`.
## Example: Dark mode labeled rectangle
Use container binding (NOT the `"label"` property, which doesn't work). On dark backgrounds, set text `strokeColor` to `"#e5e5e5"` so it's visible:
```json
[
{
"type": "rectangle", "id": "r1",
"x": 100, "y": 100, "width": 200, "height": 80,
"backgroundColor": "#1e3a5f", "fillStyle": "solid",
"strokeColor": "#4a9eed", "strokeWidth": 2,
"roundness": { "type": 3 },
"boundElements": [{ "id": "t_r1", "type": "text" }]
},
{
"type": "text", "id": "t_r1",
"x": 105, "y": 120, "width": 190, "height": 25,
"text": "Dark Node", "fontSize": 20, "fontFamily": 1,
"strokeColor": "#e5e5e5",
"textAlign": "center", "verticalAlign": "middle",
"containerId": "r1", "originalText": "Dark Node", "autoResize": true
}
]
```
Note: For standalone text elements on dark backgrounds, always set `"strokeColor": "#e5e5e5"` explicitly. The default `#1e1e1e` is invisible on dark.
@@ -0,0 +1,141 @@
# Excalidraw Diagram Examples
Complete, copy-pasteable examples. Wrap each in the `.excalidraw` envelope before saving:
```json
{
"type": "excalidraw",
"version": 2,
"source": "hermes-agent",
"elements": [ ...elements from examples below... ],
"appState": { "viewBackgroundColor": "#ffffff" }
}
```
> **IMPORTANT:** All text labels on shapes and arrows use container binding (`containerId` + `boundElements`).
> Do NOT use the non-existent `"label"` property -- it will be silently ignored, producing blank shapes.
---
## Example 1: Two Connected Labeled Boxes
A minimal flowchart with two boxes and an arrow between them.
```json
[
{ "type": "text", "id": "title", "x": 280, "y": 30, "text": "Simple Flow", "fontSize": 28, "fontFamily": 1, "strokeColor": "#1e1e1e", "originalText": "Simple Flow", "autoResize": true },
{ "type": "rectangle", "id": "b1", "x": 100, "y": 100, "width": 200, "height": 100, "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid", "boundElements": [{ "id": "t_b1", "type": "text" }, { "id": "a1", "type": "arrow" }] },
{ "type": "text", "id": "t_b1", "x": 105, "y": 130, "width": 190, "height": 25, "text": "Start", "fontSize": 20, "fontFamily": 1, "strokeColor": "#1e1e1e", "textAlign": "center", "verticalAlign": "middle", "containerId": "b1", "originalText": "Start", "autoResize": true },
{ "type": "rectangle", "id": "b2", "x": 450, "y": 100, "width": 200, "height": 100, "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid", "boundElements": [{ "id": "t_b2", "type": "text" }, { "id": "a1", "type": "arrow" }] },
{ "type": "text", "id": "t_b2", "x": 455, "y": 130, "width": 190, "height": 25, "text": "End", "fontSize": 20, "fontFamily": 1, "strokeColor": "#1e1e1e", "textAlign": "center", "verticalAlign": "middle", "containerId": "b2", "originalText": "End", "autoResize": true },
{ "type": "arrow", "id": "a1", "x": 300, "y": 150, "width": 150, "height": 0, "points": [[0,0],[150,0]], "endArrowhead": "arrow", "startBinding": { "elementId": "b1", "fixedPoint": [1, 0.5] }, "endBinding": { "elementId": "b2", "fixedPoint": [0, 0.5] } }
]
```
---
## Example 2: Photosynthesis Process Diagram
A larger diagram with background zones, multiple nodes, and directional arrows showing inputs/outputs.
```json
[
{"type":"text","id":"ti","x":280,"y":10,"text":"Photosynthesis","fontSize":28,"fontFamily":1,"strokeColor":"#1e1e1e","originalText":"Photosynthesis","autoResize":true},
{"type":"text","id":"fo","x":245,"y":48,"text":"6CO2 + 6H2O --> C6H12O6 + 6O2","fontSize":16,"fontFamily":1,"strokeColor":"#757575","originalText":"6CO2 + 6H2O --> C6H12O6 + 6O2","autoResize":true},
{"type":"rectangle","id":"lf","x":150,"y":90,"width":520,"height":380,"backgroundColor":"#d3f9d8","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#22c55e","strokeWidth":1,"opacity":35},
{"type":"text","id":"lfl","x":170,"y":96,"text":"Inside the Leaf","fontSize":16,"fontFamily":1,"strokeColor":"#15803d","originalText":"Inside the Leaf","autoResize":true},
{"type":"rectangle","id":"lr","x":190,"y":190,"width":160,"height":70,"backgroundColor":"#fff3bf","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#f59e0b","boundElements":[{"id":"t_lr","type":"text"},{"id":"a1","type":"arrow"},{"id":"a2","type":"arrow"},{"id":"a3","type":"arrow"},{"id":"a5","type":"arrow"}]},
{"type":"text","id":"t_lr","x":195,"y":205,"width":150,"height":20,"text":"Light Reactions","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"lr","originalText":"Light Reactions","autoResize":true},
{"type":"arrow","id":"a1","x":350,"y":225,"width":120,"height":0,"points":[[0,0],[120,0]],"strokeColor":"#1e1e1e","strokeWidth":2,"endArrowhead":"arrow","boundElements":[{"id":"t_a1","type":"text"}]},
{"type":"text","id":"t_a1","x":390,"y":205,"width":40,"height":20,"text":"ATP","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"a1","originalText":"ATP","autoResize":true},
{"type":"rectangle","id":"cc","x":470,"y":190,"width":160,"height":70,"backgroundColor":"#d0bfff","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#8b5cf6","boundElements":[{"id":"t_cc","type":"text"},{"id":"a1","type":"arrow"},{"id":"a4","type":"arrow"},{"id":"a6","type":"arrow"}]},
{"type":"text","id":"t_cc","x":475,"y":205,"width":150,"height":20,"text":"Calvin Cycle","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"cc","originalText":"Calvin Cycle","autoResize":true},
{"type":"rectangle","id":"sl","x":10,"y":200,"width":120,"height":50,"backgroundColor":"#fff3bf","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#f59e0b","boundElements":[{"id":"t_sl","type":"text"},{"id":"a2","type":"arrow"}]},
{"type":"text","id":"t_sl","x":15,"y":210,"width":110,"height":20,"text":"Sunlight","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"sl","originalText":"Sunlight","autoResize":true},
{"type":"arrow","id":"a2","x":130,"y":225,"width":60,"height":0,"points":[[0,0],[60,0]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":"arrow"},
{"type":"rectangle","id":"wa","x":200,"y":360,"width":140,"height":50,"backgroundColor":"#a5d8ff","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#4a9eed","boundElements":[{"id":"t_wa","type":"text"},{"id":"a3","type":"arrow"}]},
{"type":"text","id":"t_wa","x":205,"y":370,"width":130,"height":20,"text":"Water (H2O)","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"wa","originalText":"Water (H2O)","autoResize":true},
{"type":"arrow","id":"a3","x":270,"y":360,"width":0,"height":-100,"points":[[0,0],[0,-100]],"strokeColor":"#4a9eed","strokeWidth":2,"endArrowhead":"arrow"},
{"type":"rectangle","id":"co","x":480,"y":360,"width":130,"height":50,"backgroundColor":"#ffd8a8","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#f59e0b","boundElements":[{"id":"t_co","type":"text"},{"id":"a4","type":"arrow"}]},
{"type":"text","id":"t_co","x":485,"y":370,"width":120,"height":20,"text":"CO2","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"co","originalText":"CO2","autoResize":true},
{"type":"arrow","id":"a4","x":545,"y":360,"width":0,"height":-100,"points":[[0,0],[0,-100]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":"arrow"},
{"type":"rectangle","id":"ox","x":540,"y":100,"width":100,"height":40,"backgroundColor":"#ffc9c9","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#ef4444","boundElements":[{"id":"t_ox","type":"text"},{"id":"a5","type":"arrow"}]},
{"type":"text","id":"t_ox","x":545,"y":105,"width":90,"height":20,"text":"O2","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"ox","originalText":"O2","autoResize":true},
{"type":"arrow","id":"a5","x":310,"y":190,"width":230,"height":-50,"points":[[0,0],[230,-50]],"strokeColor":"#ef4444","strokeWidth":2,"endArrowhead":"arrow"},
{"type":"rectangle","id":"gl","x":690,"y":195,"width":120,"height":60,"backgroundColor":"#c3fae8","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#22c55e","boundElements":[{"id":"t_gl","type":"text"},{"id":"a6","type":"arrow"}]},
{"type":"text","id":"t_gl","x":695,"y":210,"width":110,"height":25,"text":"Glucose","fontSize":18,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"gl","originalText":"Glucose","autoResize":true},
{"type":"arrow","id":"a6","x":630,"y":225,"width":60,"height":0,"points":[[0,0],[60,0]],"strokeColor":"#22c55e","strokeWidth":2,"endArrowhead":"arrow"},
{"type":"ellipse","id":"sun","x":30,"y":110,"width":50,"height":50,"backgroundColor":"#fff3bf","fillStyle":"solid","strokeColor":"#f59e0b","strokeWidth":2},
{"type":"arrow","id":"r1","x":55,"y":108,"width":0,"height":-14,"points":[[0,0],[0,-14]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":null,"startArrowhead":null},
{"type":"arrow","id":"r2","x":55,"y":162,"width":0,"height":14,"points":[[0,0],[0,14]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":null,"startArrowhead":null},
{"type":"arrow","id":"r3","x":28,"y":135,"width":-14,"height":0,"points":[[0,0],[-14,0]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":null,"startArrowhead":null},
{"type":"arrow","id":"r4","x":82,"y":135,"width":14,"height":0,"points":[[0,0],[14,0]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":null,"startArrowhead":null}
]
```
---
## Example 3: Sequence Diagram (UML-style)
Demonstrates a sequence diagram with actors, dashed lifelines, and message arrows.
```json
[
{"type":"text","id":"title","x":200,"y":15,"text":"MCP Apps -- Sequence Flow","fontSize":24,"fontFamily":1,"strokeColor":"#1e1e1e","originalText":"MCP Apps -- Sequence Flow","autoResize":true},
{"type":"rectangle","id":"uHead","x":60,"y":60,"width":100,"height":40,"backgroundColor":"#a5d8ff","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#4a9eed","strokeWidth":2,"boundElements":[{"id":"t_uHead","type":"text"}]},
{"type":"text","id":"t_uHead","x":65,"y":65,"width":90,"height":20,"text":"User","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"uHead","originalText":"User","autoResize":true},
{"type":"arrow","id":"uLine","x":110,"y":100,"width":0,"height":400,"points":[[0,0],[0,400]],"strokeColor":"#b0b0b0","strokeWidth":1,"strokeStyle":"dashed","endArrowhead":null},
{"type":"rectangle","id":"aHead","x":230,"y":60,"width":100,"height":40,"backgroundColor":"#d0bfff","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#8b5cf6","strokeWidth":2,"boundElements":[{"id":"t_aHead","type":"text"}]},
{"type":"text","id":"t_aHead","x":235,"y":65,"width":90,"height":20,"text":"Agent","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"aHead","originalText":"Agent","autoResize":true},
{"type":"arrow","id":"aLine","x":280,"y":100,"width":0,"height":400,"points":[[0,0],[0,400]],"strokeColor":"#b0b0b0","strokeWidth":1,"strokeStyle":"dashed","endArrowhead":null},
{"type":"rectangle","id":"sHead","x":420,"y":60,"width":130,"height":40,"backgroundColor":"#ffd8a8","fillStyle":"solid","roundness":{"type":3},"strokeColor":"#f59e0b","strokeWidth":2,"boundElements":[{"id":"t_sHead","type":"text"}]},
{"type":"text","id":"t_sHead","x":425,"y":65,"width":120,"height":20,"text":"Server","fontSize":16,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"sHead","originalText":"Server","autoResize":true},
{"type":"arrow","id":"sLine","x":485,"y":100,"width":0,"height":400,"points":[[0,0],[0,400]],"strokeColor":"#b0b0b0","strokeWidth":1,"strokeStyle":"dashed","endArrowhead":null},
{"type":"arrow","id":"m1","x":110,"y":150,"width":170,"height":0,"points":[[0,0],[170,0]],"strokeColor":"#1e1e1e","strokeWidth":2,"endArrowhead":"arrow","boundElements":[{"id":"t_m1","type":"text"}]},
{"type":"text","id":"t_m1","x":165,"y":130,"width":60,"height":20,"text":"request","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"m1","originalText":"request","autoResize":true},
{"type":"arrow","id":"m2","x":280,"y":200,"width":205,"height":0,"points":[[0,0],[205,0]],"strokeColor":"#8b5cf6","strokeWidth":2,"endArrowhead":"arrow","boundElements":[{"id":"t_m2","type":"text"}]},
{"type":"text","id":"t_m2","x":352,"y":180,"width":60,"height":20,"text":"tools/call","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"m2","originalText":"tools/call","autoResize":true},
{"type":"arrow","id":"m3","x":485,"y":260,"width":-205,"height":0,"points":[[0,0],[-205,0]],"strokeColor":"#f59e0b","strokeWidth":2,"endArrowhead":"arrow","strokeStyle":"dashed","boundElements":[{"id":"t_m3","type":"text"}]},
{"type":"text","id":"t_m3","x":352,"y":240,"width":60,"height":20,"text":"result","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"m3","originalText":"result","autoResize":true},
{"type":"arrow","id":"m4","x":280,"y":320,"width":-170,"height":0,"points":[[0,0],[-170,0]],"strokeColor":"#8b5cf6","strokeWidth":2,"endArrowhead":"arrow","strokeStyle":"dashed","boundElements":[{"id":"t_m4","type":"text"}]},
{"type":"text","id":"t_m4","x":165,"y":300,"width":60,"height":20,"text":"response","fontSize":14,"fontFamily":1,"strokeColor":"#1e1e1e","textAlign":"center","verticalAlign":"middle","containerId":"m4","originalText":"response","autoResize":true}
]
```
---
## Common Mistakes to Avoid
- **Do NOT use `"label"` property** -- this is the #1 mistake. It is NOT part of the Excalidraw file format and will be silently ignored, producing blank shapes with no visible text. Always use container binding (`containerId` + `boundElements`) as shown in the examples above.
- **Every bound text needs both sides linked** -- the shape needs `boundElements: [{"id": "t_xxx", "type": "text"}]` AND the text needs `containerId: "shape_id"`. If either is missing, the binding won't work.
- **Include `originalText` and `autoResize: true`** on all text elements -- Excalidraw uses these for proper text reflow.
- **Include `fontFamily: 1`** on all text elements -- without it, text may not render with the expected hand-drawn font.
- **Elements overlap when y-coordinates are close** -- always check that text, boxes, and labels don't stack on top of each other
- **Arrow labels need space** -- long labels like "ATP + NADPH" overflow short arrows. Keep labels short or make arrows wider
- **Center titles relative to the diagram** -- estimate total width and center the title text over it
- **Draw decorations LAST** -- cute illustrations (sun, stars, icons) should appear at the end of the array so they're drawn on top
@@ -0,0 +1,133 @@
#!/usr/bin/env python3
"""
Upload an .excalidraw file to excalidraw.com and print a shareable URL.
No account required. The diagram is encrypted client-side (AES-GCM) before
upload -- the encryption key is embedded in the URL fragment, so the server
never sees plaintext.
Requirements:
pip install cryptography
Usage:
python upload.py <path-to-file.excalidraw>
Example:
python upload.py ~/diagrams/architecture.excalidraw
# prints: https://excalidraw.com/#json=abc123,encryptionKeyHere
"""
import json
import os
import struct
import sys
import zlib
import base64
import urllib.request
try:
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
except ImportError:
print("Error: 'cryptography' package is required for upload.")
print("Install it with: pip install cryptography")
sys.exit(1)
# Excalidraw public upload endpoint (no auth needed)
UPLOAD_URL = "https://json.excalidraw.com/api/v2/post/"
def concat_buffers(*buffers: bytes) -> bytes:
"""
Build the Excalidraw v2 concat-buffers binary format.
Layout: [version=1 (4B big-endian)] then for each buffer:
[length (4B big-endian)] [data bytes]
"""
parts = [struct.pack(">I", 1)] # version = 1
for buf in buffers:
parts.append(struct.pack(">I", len(buf)))
parts.append(buf)
return b"".join(parts)
def upload(excalidraw_json: str) -> str:
"""
Encrypt and upload Excalidraw JSON to excalidraw.com.
Args:
excalidraw_json: The full .excalidraw file content as a string.
Returns:
Shareable URL string.
"""
# 1. Inner payload: concat_buffers(file_metadata, data)
file_metadata = json.dumps({}).encode("utf-8")
data_bytes = excalidraw_json.encode("utf-8")
inner_payload = concat_buffers(file_metadata, data_bytes)
# 2. Compress with zlib
compressed = zlib.compress(inner_payload)
# 3. AES-GCM 128-bit encrypt
raw_key = os.urandom(16) # 128-bit key
iv = os.urandom(12) # 12-byte nonce
aesgcm = AESGCM(raw_key)
encrypted = aesgcm.encrypt(iv, compressed, None)
# 4. Encoding metadata
encoding_meta = json.dumps({
"version": 2,
"compression": "pako@1",
"encryption": "AES-GCM",
}).encode("utf-8")
# 5. Outer payload: concat_buffers(encoding_meta, iv, encrypted)
payload = concat_buffers(encoding_meta, iv, encrypted)
# 6. Upload
req = urllib.request.Request(UPLOAD_URL, data=payload, method="POST")
with urllib.request.urlopen(req, timeout=30) as resp:
if resp.status != 200:
raise RuntimeError(f"Upload failed with HTTP {resp.status}")
result = json.loads(resp.read().decode("utf-8"))
file_id = result.get("id")
if not file_id:
raise RuntimeError(f"Upload returned no file ID. Response: {result}")
# 7. Key as base64url (JWK 'k' format, no padding)
key_b64 = base64.urlsafe_b64encode(raw_key).rstrip(b"=").decode("ascii")
return f"https://excalidraw.com/#json={file_id},{key_b64}"
def main():
if len(sys.argv) < 2:
print("Usage: python upload.py <path-to-file.excalidraw>")
sys.exit(1)
file_path = sys.argv[1]
if not os.path.isfile(file_path):
print(f"Error: File not found: {file_path}")
sys.exit(1)
with open(file_path, "r", encoding="utf-8") as f:
content = f.read()
# Basic validation: should be valid JSON with an "elements" key
try:
doc = json.loads(content)
except json.JSONDecodeError as e:
print(f"Error: File is not valid JSON: {e}")
sys.exit(1)
if not isinstance(doc, dict) or "elements" not in doc:
print("Warning: File does not contain an 'elements' key. Uploading anyway.")
url = upload(content)
print(url)
if __name__ == "__main__":
main()
+13 -2
@@ -31,6 +31,8 @@ from .terminal_tool import (
cleanup_vm,
cleanup_all_environments,
get_active_environments_info,
register_task_env_overrides,
clear_task_env_overrides,
TERMINAL_TOOL_DESCRIPTION
)
@@ -57,7 +59,6 @@ from .image_generation_tool import (
)
from .skills_tool import (
skills_categories,
skills_list,
skill_view,
check_skills_requirements,
@@ -121,6 +122,12 @@ from .file_tools import (
clear_file_ops_cache,
)
# Text-to-speech tools (Edge TTS / ElevenLabs / OpenAI)
from .tts_tool import (
text_to_speech_tool,
check_tts_requirements,
)
# File tools have no external requirements - they use the terminal backend
def check_file_requirements():
"""File tools only require terminal backend to be available."""
@@ -139,6 +146,8 @@ __all__ = [
'cleanup_vm',
'cleanup_all_environments',
'get_active_environments_info',
'register_task_env_overrides',
'clear_task_env_overrides',
'TERMINAL_TOOL_DESCRIPTION',
# Terminal tools (Hecate/MorphCloud backend)
'terminal_hecate_tool',
@@ -154,7 +163,6 @@ __all__ = [
'image_generate_tool',
'check_image_generation_requirements',
# Skills tools
'skills_categories',
'skills_list',
'skill_view',
'check_skills_requirements',
@@ -205,5 +213,8 @@ __all__ = [
'get_file_tools',
'clear_file_ops_cache',
'check_file_requirements',
# Text-to-speech tools
'text_to_speech_tool',
'check_tts_requirements',
]
+33 -2
@@ -51,6 +51,7 @@ import subprocess
import shutil
import sys
import asyncio
import tempfile
import threading
import time
import requests
@@ -644,17 +645,25 @@ def _find_agent_browser() -> str:
"""
Find the agent-browser CLI executable.
Checks in order: PATH, local node_modules/.bin/, npx fallback.
Returns:
Path to agent-browser executable
Raises:
FileNotFoundError: If agent-browser is not installed
"""
# Check if it's in PATH
# Check if it's in PATH (global install)
which_result = shutil.which("agent-browser")
if which_result:
return which_result
# Check local node_modules/.bin/ (npm install in repo root)
repo_root = Path(__file__).parent.parent
local_bin = repo_root / "node_modules" / ".bin" / "agent-browser"
if local_bin.exists():
return str(local_bin)
# Check common npx locations
npx_path = shutil.which("npx")
if npx_path:
@@ -662,6 +671,7 @@ def _find_agent_browser() -> str:
raise FileNotFoundError(
"agent-browser CLI not found. Install it with: npm install -g agent-browser\n"
"Or run 'npm install' in the repo root to install locally.\n"
"Or ensure npx is available in your PATH."
)
@@ -708,12 +718,26 @@ def _run_browser_command(
] + args
try:
# Give each task its own socket directory to prevent concurrency conflicts.
# Without this, parallel workers fight over the same default socket path,
# causing "Failed to create socket directory: Permission denied" errors.
task_socket_dir = os.path.join(
tempfile.gettempdir(),
f"agent-browser-{session_info['session_name']}"
)
os.makedirs(task_socket_dir, exist_ok=True)
browser_env = {
**os.environ,
"AGENT_BROWSER_SOCKET_DIR": task_socket_dir,
}
result = subprocess.run(
cmd_parts,
capture_output=True,
text=True,
timeout=timeout,
env={**os.environ}
env=browser_env,
)
# Parse JSON output
@@ -1487,6 +1511,13 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
except Exception as e:
print(f"[browser_tool] Exception during BrowserBase session close: {e}", file=sys.stderr)
# Clean up per-task socket directory
session_name = session_info.get("session_name", "")
if session_name:
socket_dir = os.path.join(tempfile.gettempdir(), f"agent-browser-{session_name}")
if os.path.exists(socket_dir):
shutil.rmtree(socket_dir, ignore_errors=True)
del _active_sessions[task_id]
if not os.getenv("HERMES_QUIET"):
print(f"[browser_tool] Removed task {task_id} from active sessions", file=sys.stderr)
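The per-task socket directory pattern in this hunk can be sketched in isolation. The environment variable name `AGENT_BROWSER_SOCKET_DIR` and the `agent-browser-<session>` directory naming are taken from the diff; the helper names `make_socket_env` and `remove_socket_dir` are hypothetical.

```python
import os
import shutil
import tempfile


def _socket_dir(session_name: str) -> str:
    # One directory per session under the system temp dir.
    return os.path.join(tempfile.gettempdir(), f"agent-browser-{session_name}")


def make_socket_env(session_name: str) -> dict:
    """Create the session's socket directory and return a child-process env."""
    path = _socket_dir(session_name)
    os.makedirs(path, exist_ok=True)
    return {**os.environ, "AGENT_BROWSER_SOCKET_DIR": path}


def remove_socket_dir(session_name: str) -> None:
    """Delete the session's socket directory; missing directories are ignored."""
    shutil.rmtree(_socket_dir(session_name), ignore_errors=True)
```

Passing the returned dict as `env=` to `subprocess.run` keeps parallel workers from sharing a socket path, and calling `remove_socket_dir` during cleanup mirrors the `shutil.rmtree` call in `cleanup_browser`.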
+5 -2
@@ -257,9 +257,12 @@ class ShellFileOperations(FileOperations):
cwd: Working directory (defaults to env's cwd or current directory)
"""
self.env = terminal_env
# Determine cwd from various possible sources
# Determine cwd from various possible sources.
# IMPORTANT: do NOT fall back to os.getcwd() -- that's the HOST's local
# path which doesn't exist inside container/cloud backends (modal, docker).
# If nothing provides a cwd, use "/" as a safe universal default.
self.cwd = cwd or getattr(terminal_env, 'cwd', None) or \
getattr(getattr(terminal_env, 'config', None), 'cwd', None) or os.getcwd()
getattr(getattr(terminal_env, 'config', None), 'cwd', None) or "/"
# Cache for command availability checks
self._command_cache: Dict[str, bool] = {}
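The fallback chain for the working directory can be exercised on its own. The sketch below copies the expression from the hunk into a free function; `resolve_cwd` is a hypothetical name and the `SimpleNamespace` objects are stand-ins for real terminal environments.

```python
from types import SimpleNamespace


def resolve_cwd(terminal_env, cwd=None) -> str:
    """Explicit cwd, then env.cwd, then env.config.cwd, then "/" as a last resort."""
    return (
        cwd
        or getattr(terminal_env, "cwd", None)
        or getattr(getattr(terminal_env, "config", None), "cwd", None)
        or "/"
    )


# Stand-in environments for illustration only.
bare_env = SimpleNamespace()
env_with_cwd = SimpleNamespace(cwd="/workspace")
env_with_config = SimpleNamespace(config=SimpleNamespace(cwd="/app"))
```

With no source of truth available the result is `"/"`, which exists in every backend, rather than a host path that may be missing inside the sandbox.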
+66 -58
@@ -13,80 +13,88 @@ _file_ops_cache: dict = {}
def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
"""Get or create ShellFileOperations for a terminal environment.
Respects the TERMINAL_ENV setting -- if the task_id doesn't have an
environment yet, creates one using the configured backend (local, docker,
modal, etc.) rather than always defaulting to local.
Thread-safe: uses the same per-task creation locks as terminal_tool to
prevent duplicate sandbox creation from concurrent tool calls.
"""
from tools.terminal_tool import (
_active_environments, _env_lock, _create_environment,
_get_env_config, _last_activity, _start_cleanup_thread,
_check_disk_usage_warning,
_creation_locks, _creation_locks_lock,
)
import time
# Fast path: check cache without heavy locks
# Fast path: check cache -- but also verify the underlying environment
# is still alive (it may have been killed by the cleanup thread).
with _file_ops_lock:
if task_id in _file_ops_cache:
return _file_ops_cache[task_id]
# Check if we need to create a new environment
needs_creation = False
with _env_lock:
if task_id not in _active_environments:
needs_creation = True
# Create environment OUTSIDE locks so we don't block other rollouts
# during slow Modal/Docker startup (~10s)
if needs_creation:
config = _get_env_config()
env_type = config["env_type"]
if env_type == "docker":
image = config["docker_image"]
elif env_type == "singularity":
image = config["singularity_image"]
elif env_type == "modal":
image = config["modal_image"]
else:
image = ""
cwd = config["cwd"]
_check_disk_usage_warning()
if not os.getenv("HERMES_QUIET"):
print(f"[FileTools] Creating new {env_type} environment for task {task_id[:8]}...", flush=True)
new_env = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
timeout=config["timeout"],
)
# Store under lock (brief) -- do NOT call _start_cleanup_thread inside
# the lock because it also acquires _env_lock (non-reentrant = deadlock)
created = False
cached = _file_ops_cache.get(task_id)
if cached is not None:
with _env_lock:
if task_id not in _active_environments:
_active_environments[task_id] = new_env
created = True
if task_id in _active_environments:
_last_activity[task_id] = time.time()
return cached
else:
try:
if hasattr(new_env, 'stop'):
new_env.stop()
except Exception:
pass
if created:
# Environment was cleaned up -- invalidate stale cache entry
with _file_ops_lock:
_file_ops_cache.pop(task_id, None)
# Need to ensure the environment exists before building file_ops.
# Acquire per-task lock so only one thread creates the sandbox.
with _creation_locks_lock:
if task_id not in _creation_locks:
_creation_locks[task_id] = threading.Lock()
task_lock = _creation_locks[task_id]
with task_lock:
# Double-check: another thread may have created it while we waited
with _env_lock:
if task_id in _active_environments:
_last_activity[task_id] = time.time()
terminal_env = _active_environments[task_id]
else:
terminal_env = None
if terminal_env is None:
from tools.terminal_tool import _task_env_overrides
config = _get_env_config()
env_type = config["env_type"]
overrides = _task_env_overrides.get(task_id, {})
if env_type == "docker":
image = overrides.get("docker_image") or config["docker_image"]
elif env_type == "singularity":
image = overrides.get("singularity_image") or config["singularity_image"]
elif env_type == "modal":
image = overrides.get("modal_image") or config["modal_image"]
else:
image = ""
cwd = overrides.get("cwd") or config["cwd"]
if not os.getenv("HERMES_QUIET"):
print(f"[FileTools] Creating new {env_type} environment for task {task_id[:8]}...", flush=True)
terminal_env = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
timeout=config["timeout"],
)
with _env_lock:
_active_environments[task_id] = terminal_env
_last_activity[task_id] = time.time()
_start_cleanup_thread()
if not os.getenv("HERMES_QUIET"):
print(f"[FileTools] {env_type} environment ready for task {task_id[:8]}", flush=True)
# Now get the environment and build file_ops
with _env_lock:
_last_activity[task_id] = time.time()
terminal_env = _active_environments[task_id]
# Build file_ops from the (guaranteed live) environment and cache it
file_ops = ShellFileOperations(terminal_env)
with _file_ops_lock:
_file_ops_cache[task_id] = file_ops
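The per-task creation lock with a double check can be sketched independently of the terminal backend. This is an illustration under stated assumptions: `get_or_create` and the module-level dictionaries below are hypothetical, and `factory` stands in for the slow sandbox startup.

```python
import threading

_cache: dict = {}
_cache_lock = threading.Lock()
_creation_locks: dict = {}
_creation_locks_lock = threading.Lock()


def get_or_create(key, factory):
    """Return the cached value for key, building it at most once across threads."""
    # Fast path: already built.
    with _cache_lock:
        if key in _cache:
            return _cache[key]

    # Fetch (or lazily create) the lock dedicated to this key.
    with _creation_locks_lock:
        key_lock = _creation_locks.setdefault(key, threading.Lock())

    with key_lock:
        # Double-check: another thread may have finished while we waited.
        with _cache_lock:
            if key in _cache:
                return _cache[key]
        # Slow construction runs without the global cache lock held,
        # so other keys are not blocked.
        value = factory()
        with _cache_lock:
            _cache[key] = value
        return value
```

Only threads asking for the same key wait on each other; callers working on different keys proceed in parallel.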
+727
@@ -0,0 +1,727 @@
"""
Process Registry -- In-memory registry for managed background processes.
Tracks processes spawned via terminal(background=true), providing:
- Output buffering (rolling 200KB window)
- Status polling and log retrieval
- Blocking wait with interrupt support
- Process killing
- Crash recovery via JSON checkpoint file
- Session-scoped tracking for gateway reset protection
Background processes execute THROUGH the environment interface -- nothing
runs on the host machine unless TERMINAL_ENV=local. For Docker, Singularity,
Modal, and SSH backends, the command runs inside the sandbox.
Usage:
from tools.process_registry import process_registry
# Spawn a background process (called from terminal_tool)
session = process_registry.spawn(env, "pytest -v", task_id="task_123")
# Poll for status
result = process_registry.poll(session.id)
# Block until done
result = process_registry.wait(session.id, timeout=300)
# Kill it
process_registry.kill(session.id)
"""
import json
import os
import signal
import subprocess
import threading
import time
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional
# Checkpoint file for crash recovery (gateway only)
CHECKPOINT_PATH = Path(os.path.expanduser("~/.hermes/processes.json"))
# Limits
MAX_OUTPUT_CHARS = 200_000 # 200KB rolling output buffer
FINISHED_TTL_SECONDS = 1800 # Keep finished processes for 30 minutes
MAX_PROCESSES = 64 # Soft cap on tracked processes (oldest finished session is pruned first)
@dataclass
class ProcessSession:
"""A tracked background process with output buffering."""
id: str # Unique session ID ("proc_xxxxxxxxxxxx")
command: str # Original command string
task_id: str = "" # Task/sandbox isolation key
session_key: str = "" # Gateway session key (for reset protection)
pid: Optional[int] = None # OS process ID
process: Optional[subprocess.Popen] = None # Popen handle (local only)
env_ref: Any = None # Reference to the environment object
cwd: Optional[str] = None # Working directory
started_at: float = 0.0 # time.time() of spawn
exited: bool = False # Whether the process has finished
exit_code: Optional[int] = None # Exit code (None if still running)
output_buffer: str = "" # Rolling output (last MAX_OUTPUT_CHARS)
max_output_chars: int = MAX_OUTPUT_CHARS
detached: bool = False # True if recovered from crash (no pipe)
_lock: threading.Lock = field(default_factory=threading.Lock)
_reader_thread: Optional[threading.Thread] = field(default=None, repr=False)
_pty: Any = field(default=None, repr=False) # ptyprocess handle (when use_pty=True)
class ProcessRegistry:
"""
In-memory registry of running and finished background processes.
Thread-safe. Accessed from:
- Executor threads (terminal_tool, process tool handlers)
- Gateway asyncio loop (watcher tasks, session reset checks)
- Cleanup thread (sandbox reaping coordination)
"""
def __init__(self):
self._running: Dict[str, ProcessSession] = {}
self._finished: Dict[str, ProcessSession] = {}
self._lock = threading.Lock()
# Side-channel for check_interval watchers (gateway reads after agent run)
self.pending_watchers: List[Dict[str, Any]] = []
# ----- Spawn -----
def spawn_local(
self,
command: str,
cwd: str = None,
task_id: str = "",
session_key: str = "",
env_vars: dict = None,
use_pty: bool = False,
) -> ProcessSession:
"""
Spawn a background process locally.
Only for TERMINAL_ENV=local. Other backends use spawn_via_env().
Args:
use_pty: If True, use a pseudo-terminal via ptyprocess for interactive
CLI tools (Codex, Claude Code, Python REPL). Falls back to
subprocess.Popen if ptyprocess is not installed.
"""
session = ProcessSession(
id=f"proc_{uuid.uuid4().hex[:12]}",
command=command,
task_id=task_id,
session_key=session_key,
cwd=cwd or os.getcwd(),
started_at=time.time(),
)
if use_pty:
# Try PTY mode for interactive CLI tools
try:
import ptyprocess
pty_proc = ptyprocess.PtyProcess.spawn(
["bash", "-c", command],
cwd=session.cwd,
env=os.environ | (env_vars or {}),
dimensions=(30, 120),
)
session.pid = pty_proc.pid
# Store the pty handle on the session for read/write
session._pty = pty_proc
# PTY reader thread
reader = threading.Thread(
target=self._pty_reader_loop,
args=(session,),
daemon=True,
name=f"proc-pty-reader-{session.id}",
)
session._reader_thread = reader
reader.start()
with self._lock:
self._prune_if_needed()
self._running[session.id] = session
self._write_checkpoint()
return session
except ImportError:
# ptyprocess not installed -- fall back to Popen
print("[ProcessRegistry] ptyprocess not installed, falling back to pipe mode", flush=True)
except Exception as e:
print(f"[ProcessRegistry] PTY spawn failed ({e}), falling back to pipe mode", flush=True)
# Standard Popen path (non-PTY or PTY fallback)
proc = subprocess.Popen(
command,
shell=True,
text=True,
cwd=session.cwd,
env=os.environ | (env_vars or {}),
encoding="utf-8",
errors="replace",
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
stdin=subprocess.PIPE,
preexec_fn=os.setsid,
)
session.process = proc
session.pid = proc.pid
# Start output reader thread
reader = threading.Thread(
target=self._reader_loop,
args=(session,),
daemon=True,
name=f"proc-reader-{session.id}",
)
session._reader_thread = reader
reader.start()
with self._lock:
self._prune_if_needed()
self._running[session.id] = session
self._write_checkpoint()
return session
def spawn_via_env(
self,
env: Any,
command: str,
cwd: str = None,
task_id: str = "",
session_key: str = "",
timeout: int = 10,
) -> ProcessSession:
"""
Spawn a background process through a non-local environment backend.
For Docker/Singularity/Modal/SSH: runs the command inside the sandbox
using the environment's execute() interface. We wrap the command to
capture the in-sandbox PID and redirect output to a log file inside
the sandbox, then poll the log via subsequent execute() calls.
This is less capable than local spawn (no live stdout pipe, no stdin),
but it ensures the command runs in the correct sandbox context.
"""
session = ProcessSession(
id=f"proc_{uuid.uuid4().hex[:12]}",
command=command,
task_id=task_id,
session_key=session_key,
cwd=cwd,
started_at=time.time(),
env_ref=env,
)
# Run the command in the sandbox with output capture
log_path = f"/tmp/hermes_bg_{session.id}.log"
pid_path = f"/tmp/hermes_bg_{session.id}.pid"
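# Escape single quotes so the command survives the bash -c '...' wrapper
escaped_command = command.replace("'", "'\\''")
bg_command = (
f"nohup bash -c '{escaped_command}' > {log_path} 2>&1 & "
f"echo $! > {pid_path} && cat {pid_path}"
)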
try:
result = env.execute(bg_command, timeout=timeout)
output = result.get("output", "").strip()
# Try to extract the PID from the output
for line in output.splitlines():
line = line.strip()
if line.isdigit():
session.pid = int(line)
break
except Exception as e:
session.exited = True
session.exit_code = -1
session.output_buffer = f"Failed to start: {e}"
if not session.exited:
# Start a poller thread that periodically reads the log file
reader = threading.Thread(
target=self._env_poller_loop,
args=(session, env, log_path, pid_path),
daemon=True,
name=f"proc-poller-{session.id}",
)
session._reader_thread = reader
reader.start()
with self._lock:
self._prune_if_needed()
self._running[session.id] = session
self._write_checkpoint()
return session
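Because `spawn_via_env` embeds the user command inside a `bash -c` wrapper, quoting matters whenever the command itself contains quotes. One way to build such a wrapper is with the standard library's `shlex.quote`; this is a sketch, and `build_bg_command` is a hypothetical helper rather than part of the registry.

```python
import shlex


def build_bg_command(command: str, log_path: str, pid_path: str) -> str:
    """Wrap a command so it runs under nohup, logs to a file, and records its PID."""
    quoted_cmd = shlex.quote(command)  # safe as a single shell word
    log = shlex.quote(log_path)
    pid = shlex.quote(pid_path)
    return (
        f"nohup bash -c {quoted_cmd} > {log} 2>&1 & "
        f"echo $! > {pid} && cat {pid}"
    )


# Example with an embedded single quote, which naive '...' wrapping would break.
wrapped = build_bg_command("echo 'it works'", "/tmp/demo.log", "/tmp/demo.pid")
```

Splitting `wrapped` with `shlex.split` yields the original command back as the single argument after `-c`.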
# ----- Reader / Poller Threads -----
def _reader_loop(self, session: ProcessSession):
"""Background thread: read stdout from a local Popen process."""
try:
while True:
chunk = session.process.stdout.read(4096)
if not chunk:
break
with session._lock:
session.output_buffer += chunk
if len(session.output_buffer) > session.max_output_chars:
session.output_buffer = session.output_buffer[-session.max_output_chars:]
except Exception:
pass
# Process exited
try:
session.process.wait(timeout=5)
except Exception:
pass
session.exited = True
session.exit_code = session.process.returncode
self._move_to_finished(session)
def _env_poller_loop(
self, session: ProcessSession, env: Any, log_path: str, pid_path: str
):
"""Background thread: poll a sandbox log file for non-local backends."""
while not session.exited:
time.sleep(2) # Poll every 2 seconds
try:
# Read new output from the log file
result = env.execute(f"cat {log_path} 2>/dev/null", timeout=10)
new_output = result.get("output", "")
if new_output:
with session._lock:
session.output_buffer = new_output
if len(session.output_buffer) > session.max_output_chars:
session.output_buffer = session.output_buffer[-session.max_output_chars:]
# Check if process is still running
check = env.execute(
f"kill -0 $(cat {pid_path} 2>/dev/null) 2>/dev/null; echo $?",
timeout=5,
)
check_output = check.get("output", "").strip()
if check_output and check_output.splitlines()[-1].strip() != "0":
# Process has exited -- get exit code
exit_result = env.execute(
f"wait $(cat {pid_path} 2>/dev/null) 2>/dev/null; echo $?",
timeout=5,
)
exit_str = exit_result.get("output", "").strip()
try:
session.exit_code = int(exit_str.splitlines()[-1].strip())
except (ValueError, IndexError):
session.exit_code = -1
session.exited = True
self._move_to_finished(session)
return
except Exception:
# Environment might be gone (sandbox reaped, etc.)
session.exited = True
session.exit_code = -1
self._move_to_finished(session)
return
def _pty_reader_loop(self, session: ProcessSession):
"""Background thread: read output from a PTY process."""
pty = session._pty
try:
while pty.isalive():
try:
chunk = pty.read(4096)
if chunk:
# ptyprocess returns bytes
text = chunk if isinstance(chunk, str) else chunk.decode("utf-8", errors="replace")
with session._lock:
session.output_buffer += text
if len(session.output_buffer) > session.max_output_chars:
session.output_buffer = session.output_buffer[-session.max_output_chars:]
except EOFError:
break
except Exception:
break
except Exception:
pass
# Process exited
try:
pty.wait()
except Exception:
pass
session.exited = True
session.exit_code = pty.exitstatus if hasattr(pty, 'exitstatus') else -1
self._move_to_finished(session)
def _move_to_finished(self, session: ProcessSession):
"""Move a session from running to finished."""
with self._lock:
self._running.pop(session.id, None)
self._finished[session.id] = session
self._write_checkpoint()
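All three reader loops above trim the buffer the same way: append, then keep only the tail. A minimal standalone sketch of that rolling window, with `append_rolling` as a hypothetical helper name:

```python
def append_rolling(buffer: str, chunk: str, max_chars: int) -> str:
    """Append chunk and keep only the last max_chars characters."""
    buffer += chunk
    if len(buffer) > max_chars:
        buffer = buffer[-max_chars:]
    return buffer


buf = ""
for piece in ["abc", "defg", "hi"]:
    buf = append_rolling(buf, piece, max_chars=5)
```

Older output is discarded from the front, so memory stays bounded no matter how much a process prints.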
# ----- Query Methods -----
def get(self, session_id: str) -> Optional[ProcessSession]:
"""Get a session by ID (running or finished)."""
with self._lock:
return self._running.get(session_id) or self._finished.get(session_id)
def poll(self, session_id: str) -> dict:
"""Check status and get new output for a background process."""
session = self.get(session_id)
if session is None:
return {"status": "not_found", "error": f"No process with ID {session_id}"}
with session._lock:
output_preview = session.output_buffer[-1000:] if session.output_buffer else ""
result = {
"session_id": session.id,
"command": session.command,
"status": "exited" if session.exited else "running",
"pid": session.pid,
"uptime_seconds": int(time.time() - session.started_at),
"output_preview": output_preview,
}
if session.exited:
result["exit_code"] = session.exit_code
if session.detached:
result["detached"] = True
result["note"] = "Process recovered after restart -- output history unavailable"
return result
def read_log(self, session_id: str, offset: int = 0, limit: int = 200) -> dict:
"""Read the full output log with optional pagination by lines."""
session = self.get(session_id)
if session is None:
return {"status": "not_found", "error": f"No process with ID {session_id}"}
with session._lock:
full_output = session.output_buffer
lines = full_output.splitlines()
total_lines = len(lines)
# Default: last N lines
if offset == 0 and limit > 0:
selected = lines[-limit:]
else:
selected = lines[offset:offset + limit]
return {
"session_id": session.id,
"status": "exited" if session.exited else "running",
"output": "\n".join(selected),
"total_lines": total_lines,
"showing": f"{len(selected)} lines",
}
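The line selection in `read_log` has one behavior worth seeing in isolation: an offset of 0 returns the last `limit` lines, not the first. The sketch below lifts that logic into a hypothetical `select_lines` function.

```python
def select_lines(text: str, offset: int = 0, limit: int = 200):
    """Return (selected_lines, total_lines) using read_log's selection rule."""
    lines = text.splitlines()
    if offset == 0 and limit > 0:
        selected = lines[-limit:]  # default: tail of the log
    else:
        selected = lines[offset:offset + limit]  # explicit window
    return selected, len(lines)


log = "\n".join(f"line {i}" for i in range(10))
tail, total = select_lines(log, offset=0, limit=3)
window, _ = select_lines(log, offset=2, limit=3)
```

A caller who wants the first lines of the log therefore cannot ask for them with `offset=0`; any non-zero offset switches to ordinary slicing.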
def wait(self, session_id: str, timeout: int = None) -> dict:
"""
Block until a process exits, timeout, or interrupt.
Args:
session_id: The process to wait for.
timeout: Max seconds to block. Falls back to TERMINAL_TIMEOUT config.
Returns:
dict with status ("exited", "timeout", "interrupted", "not_found")
and output snapshot.
"""
from tools.terminal_tool import _interrupt_event
default_timeout = int(os.getenv("TERMINAL_TIMEOUT", "180"))
max_timeout = default_timeout
requested_timeout = timeout
timeout_note = None
if requested_timeout and requested_timeout > max_timeout:
effective_timeout = max_timeout
timeout_note = (
f"Requested wait of {requested_timeout}s was clamped "
f"to configured limit of {max_timeout}s"
)
else:
effective_timeout = requested_timeout or max_timeout
session = self.get(session_id)
if session is None:
return {"status": "not_found", "error": f"No process with ID {session_id}"}
deadline = time.monotonic() + effective_timeout
while time.monotonic() < deadline:
if session.exited:
result = {
"status": "exited",
"exit_code": session.exit_code,
"output": session.output_buffer[-2000:],
}
if timeout_note:
result["timeout_note"] = timeout_note
return result
if _interrupt_event.is_set():
result = {
"status": "interrupted",
"output": session.output_buffer[-1000:],
"note": "User sent a new message -- wait interrupted",
}
if timeout_note:
result["timeout_note"] = timeout_note
return result
time.sleep(1)
result = {
"status": "timeout",
"output": session.output_buffer[-1000:],
}
if timeout_note:
result["timeout_note"] = timeout_note
else:
result["timeout_note"] = f"Waited {effective_timeout}s, process still running"
return result
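The timeout handling at the top of `wait` reduces to a small clamp. A sketch of that rule, assuming a hypothetical helper `clamp_timeout` and using the note text from the method:

```python
from typing import Optional, Tuple


def clamp_timeout(requested: Optional[int], max_timeout: int) -> Tuple[int, Optional[str]]:
    """Return (effective_timeout, note); note is set only when clamping happened."""
    if requested and requested > max_timeout:
        note = (
            f"Requested wait of {requested}s was clamped "
            f"to configured limit of {max_timeout}s"
        )
        return max_timeout, note
    # None or 0 falls back to the configured limit.
    return requested or max_timeout, None
```

For example, with a limit of 180 seconds a request for 500 is reduced to 180 with a note, while a request for 60 passes through unchanged.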
def kill_process(self, session_id: str) -> dict:
"""Kill a background process."""
session = self.get(session_id)
if session is None:
return {"status": "not_found", "error": f"No process with ID {session_id}"}
if session.exited:
return {
"status": "already_exited",
"exit_code": session.exit_code,
}
# Kill via PTY, Popen (local), or env execute (non-local)
try:
if session._pty:
# PTY process -- terminate via ptyprocess
try:
session._pty.terminate(force=True)
except Exception:
if session.pid:
os.kill(session.pid, signal.SIGTERM)
elif session.process:
# Local process -- kill the process group
try:
os.killpg(os.getpgid(session.process.pid), signal.SIGTERM)
except (ProcessLookupError, PermissionError):
session.process.kill()
elif session.env_ref and session.pid:
# Non-local -- kill inside sandbox
session.env_ref.execute(f"kill {session.pid} 2>/dev/null", timeout=5)
session.exited = True
session.exit_code = -15 # SIGTERM
self._move_to_finished(session)
self._write_checkpoint()
return {"status": "killed", "session_id": session.id}
except Exception as e:
return {"status": "error", "error": str(e)}
def write_stdin(self, session_id: str, data: str) -> dict:
"""Send raw data to a running process's stdin (no newline appended)."""
session = self.get(session_id)
if session is None:
return {"status": "not_found", "error": f"No process with ID {session_id}"}
if session.exited:
return {"status": "already_exited", "error": "Process has already finished"}
# PTY mode -- write through pty handle (expects bytes)
if hasattr(session, '_pty') and session._pty:
try:
pty_data = data.encode("utf-8") if isinstance(data, str) else data
session._pty.write(pty_data)
return {"status": "ok", "bytes_written": len(pty_data)}
except Exception as e:
return {"status": "error", "error": str(e)}
# Popen mode -- write through stdin pipe
if not session.process or not session.process.stdin:
return {"status": "error", "error": "Process stdin not available (non-local backend or stdin closed)"}
try:
session.process.stdin.write(data)
session.process.stdin.flush()
return {"status": "ok", "bytes_written": len(data)}
except Exception as e:
return {"status": "error", "error": str(e)}
def submit_stdin(self, session_id: str, data: str = "") -> dict:
"""Send data + newline to a running process's stdin (like pressing Enter)."""
return self.write_stdin(session_id, data + "\n")
def list_sessions(self, task_id: str = None) -> list:
"""List all running and recently-finished processes."""
with self._lock:
all_sessions = list(self._running.values()) + list(self._finished.values())
if task_id:
all_sessions = [s for s in all_sessions if s.task_id == task_id]
result = []
for s in all_sessions:
entry = {
"session_id": s.id,
"command": s.command[:200],
"cwd": s.cwd,
"pid": s.pid,
"started_at": time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(s.started_at)),
"uptime_seconds": int(time.time() - s.started_at),
"status": "exited" if s.exited else "running",
"output_preview": s.output_buffer[-200:] if s.output_buffer else "",
}
if s.exited:
entry["exit_code"] = s.exit_code
if s.detached:
entry["detached"] = True
result.append(entry)
return result
# ----- Session/Task Queries (for gateway integration) -----
def has_active_processes(self, task_id: str) -> bool:
"""Check if there are active (running) processes for a task_id."""
with self._lock:
return any(
s.task_id == task_id and not s.exited
for s in self._running.values()
)
def has_active_for_session(self, session_key: str) -> bool:
"""Check if there are active processes for a gateway session key."""
with self._lock:
return any(
s.session_key == session_key and not s.exited
for s in self._running.values()
)
def kill_all(self, task_id: str = None) -> int:
"""Kill all running processes, optionally filtered by task_id. Returns count killed."""
with self._lock:
targets = [
s for s in self._running.values()
if (task_id is None or s.task_id == task_id) and not s.exited
]
killed = 0
for session in targets:
result = self.kill_process(session.id)
if result.get("status") in ("killed", "already_exited"):
killed += 1
return killed
# ----- Cleanup / Pruning -----
def _prune_if_needed(self):
"""Remove oldest finished sessions if over MAX_PROCESSES. Must hold _lock."""
# First prune expired finished sessions
now = time.time()
expired = [
sid for sid, s in self._finished.items()
if (now - s.started_at) > FINISHED_TTL_SECONDS
]
for sid in expired:
del self._finished[sid]
# If still over limit, remove oldest finished
total = len(self._running) + len(self._finished)
if total >= MAX_PROCESSES and self._finished:
oldest_id = min(self._finished, key=lambda sid: self._finished[sid].started_at)
del self._finished[oldest_id]
def cleanup_expired(self):
"""Public method to prune expired finished sessions."""
with self._lock:
self._prune_if_needed()
# ----- Checkpoint (crash recovery) -----
def _write_checkpoint(self):
"""Write running process metadata to checkpoint file."""
try:
with self._lock:
entries = []
for s in self._running.values():
if not s.exited:
entries.append({
"session_id": s.id,
"command": s.command,
"pid": s.pid,
"cwd": s.cwd,
"started_at": s.started_at,
"task_id": s.task_id,
"session_key": s.session_key,
})
CHECKPOINT_PATH.parent.mkdir(parents=True, exist_ok=True)
CHECKPOINT_PATH.write_text(
json.dumps(entries, indent=2), encoding="utf-8"
)
except Exception:
pass # Best-effort
def recover_from_checkpoint(self) -> int:
"""
On gateway startup, probe PIDs from checkpoint file.
Returns the number of processes recovered as detached.
"""
if not CHECKPOINT_PATH.exists():
return 0
try:
entries = json.loads(CHECKPOINT_PATH.read_text(encoding="utf-8"))
except Exception:
return 0
recovered = 0
for entry in entries:
pid = entry.get("pid")
if not pid:
continue
# Check if PID is still alive
alive = False
try:
os.kill(pid, 0)
alive = True
except (ProcessLookupError, PermissionError):
pass
if alive:
session = ProcessSession(
id=entry["session_id"],
command=entry.get("command", "unknown"),
task_id=entry.get("task_id", ""),
session_key=entry.get("session_key", ""),
pid=pid,
cwd=entry.get("cwd"),
started_at=entry.get("started_at", time.time()),
detached=True, # Can't read output, but can report status + kill
)
with self._lock:
self._running[session.id] = session
recovered += 1
print(f"[ProcessRegistry] Recovered detached process: {session.command[:60]} (pid={pid})", flush=True)
# Clear the checkpoint (will be rewritten as processes finish)
try:
CHECKPOINT_PATH.write_text("[]", encoding="utf-8")
except Exception:
pass
return recovered
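Recovery relies on probing a PID with signal 0, which checks for existence without delivering a signal. A minimal sketch of such a probe follows; `pid_alive` is a hypothetical helper, and unlike the method above it treats `PermissionError` as "alive", since that error means the process exists but belongs to another user.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # exists, but we may not signal it
    return True
```

Note that a live PID is not proof that it is the same process that was checkpointed, because PIDs can be reused after a restart.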
# Module-level singleton
process_registry = ProcessRegistry()
+512 -202
@@ -28,6 +28,7 @@ Usage:
import json
import os
import signal
import sys
import time
import threading
@@ -39,6 +40,28 @@ import uuid
from pathlib import Path
from typing import Optional, Dict, Any
# ---------------------------------------------------------------------------
# Global interrupt event: set by the agent when a user interrupt arrives.
# The terminal tool polls this during command execution so it can kill
# long-running subprocesses immediately instead of blocking until timeout.
# ---------------------------------------------------------------------------
_interrupt_event = threading.Event()
def set_interrupt_event(active: bool) -> None:
"""Called by the agent to signal or clear the interrupt."""
if active:
_interrupt_event.set()
else:
_interrupt_event.clear()
def is_interrupted() -> bool:
"""Check if an interrupt has been requested."""
return _interrupt_event.is_set()
# Add mini-swe-agent to path if not installed
mini_swe_path = Path(__file__).parent.parent / "mini-swe-agent" / "src"
if mini_swe_path.exists():
@@ -83,9 +106,9 @@ def _get_apptainer_cache_dir() -> Path:
cache_path.mkdir(parents=True, exist_ok=True)
return cache_path
# Use scratch dir parent for cache (one level up from sandboxes)
# Use user-specific subdirectory in scratch for cache
scratch = _get_scratch_dir()
cache_path = scratch.parent / ".apptainer"
cache_path = scratch / ".apptainer"
cache_path.mkdir(parents=True, exist_ok=True)
return cache_path
@@ -214,6 +237,10 @@ _cached_sudo_password: str = ""
# Session-cached dangerous command approvals (pattern -> approved)
_session_approved_patterns: set = set()
# Last approval-required command (for gateway to pick up)
# Set by _check_dangerous_command when in ask mode, read by gateway
_last_pending_approval: dict = {}
# Dangerous command patterns (regex, description)
DANGEROUS_PATTERNS = [
(r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
@@ -385,12 +412,22 @@ def _check_dangerous_command(command: str, env_type: str) -> dict:
# Programmatic use - allow (user opted into local backend)
return {"approved": True, "message": None}
if is_gateway:
# Messaging context - return informative denial, agent should ask user
if is_gateway or os.getenv("HERMES_EXEC_ASK"):
# Messaging context - return approval_required so the gateway can
# prompt the user interactively instead of just blocking
global _last_pending_approval
_last_pending_approval = {
"command": command,
"pattern_key": pattern_key,
"description": description,
}
return {
"approved": False,
"pattern_key": pattern_key,
"message": f"BLOCKED: This command is potentially dangerous ({description}). Tell the user and ask if they want to add this command pattern to their allowlist. They can do this via 'hermes config edit' or by running the command directly on their machine."
"status": "approval_required",
"command": command,
"description": description,
"message": f"⚠️ This command is potentially dangerous ({description}). Asking the user for approval..."
}
# CLI context - prompt user
@@ -599,7 +636,13 @@ class _LocalEnvironment:
self.env = env or {}
def execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict:
"""Execute a command locally with sudo support."""
"""
Execute a command locally with sudo support.
Uses Popen + polling so the global interrupt event can kill the
process early when the user sends a new message, instead of
blocking for the full timeout.
"""
work_dir = cwd or self.cwd or os.getcwd()
effective_timeout = timeout or self.timeout
@@ -607,22 +650,56 @@ class _LocalEnvironment:
exec_command = _transform_sudo_command(command)
try:
result = subprocess.run(
proc = subprocess.Popen(
exec_command,
shell=True,
text=True,
cwd=work_dir,
env=os.environ | self.env,
timeout=effective_timeout,
encoding="utf-8",
errors="replace",
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
stdin=subprocess.DEVNULL, # Prevent hanging on interactive prompts
# Start in a new process group so we can kill the whole tree
preexec_fn=os.setsid,
)
return {"output": result.stdout, "returncode": result.returncode}
except subprocess.TimeoutExpired:
return {"output": f"Command timed out after {effective_timeout}s", "returncode": 124}
deadline = time.monotonic() + effective_timeout
# Poll every 200ms so we notice interrupts quickly
while proc.poll() is None:
if _interrupt_event.is_set():
# User sent a new message — kill the process tree and return
# what we have so far
try:
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
except (ProcessLookupError, PermissionError):
proc.kill()
# Grab any partial output
partial, _ = proc.communicate(timeout=2)
output = partial or ""
return {
"output": output + "\n[Command interrupted — user sent a new message]",
"returncode": 130 # 128 + SIGINT (2): the conventional shell exit code for an interrupted command
}
if time.monotonic() > deadline:
# Timeout — kill process tree
try:
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
except (ProcessLookupError, PermissionError):
proc.kill()
proc.communicate(timeout=2)
return {"output": f"Command timed out after {effective_timeout}s", "returncode": 124}
# Short sleep to avoid busy-waiting
time.sleep(0.2)
# Process finished normally — read all output
stdout, _ = proc.communicate()
return {"output": stdout or "", "returncode": proc.returncode}
except Exception as e:
return {"output": f"Execution error: {str(e)}", "returncode": 1}
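The Popen + polling approach in this hunk can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code from the diff: `run_interruptible`, `_kill_group` and `interrupt_event` are hypothetical names, `start_new_session=True` stands in for `preexec_fn=os.setsid`, and the SIGKILL escalation is an addition that the hunk above does not contain. POSIX only.

```python
import os
import signal
import subprocess
import threading
import time


def _kill_group(proc: subprocess.Popen) -> str:
    """Terminate the child's process group and return any captured output."""
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    except (ProcessLookupError, PermissionError):
        proc.kill()
    try:
        out, _ = proc.communicate(timeout=2)
    except subprocess.TimeoutExpired:
        # Assumption (not in the diff): escalate if SIGTERM is ignored.
        try:
            os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        except (ProcessLookupError, PermissionError):
            proc.kill()
        out, _ = proc.communicate()
    return out or ""


def run_interruptible(command: str, interrupt_event: threading.Event,
                      timeout: float = 60.0, poll_interval: float = 0.2) -> dict:
    """Run a shell command, polling so an external event can stop it early."""
    proc = subprocess.Popen(
        command,
        shell=True,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        stdin=subprocess.DEVNULL,
        start_new_session=True,  # new session => new process group
    )
    deadline = time.monotonic() + timeout
    while proc.poll() is None:
        if interrupt_event.is_set():
            partial = _kill_group(proc)
            return {"output": partial + "\n[interrupted]", "returncode": 130}
        if time.monotonic() > deadline:
            _kill_group(proc)
            return {"output": f"Command timed out after {timeout}s", "returncode": 124}
        time.sleep(poll_interval)
    stdout, _ = proc.communicate()
    return {"output": stdout or "", "returncode": proc.returncode}
```

One caveat of polling without draining the pipe: a command that writes more than the OS pipe buffer holds will block on write, so the loop only ends at the timeout. Reading stdout from a helper thread is one possible way around that.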
@@ -637,15 +714,21 @@ class _LocalEnvironment:
class _SingularityEnvironment:
"""
Custom Singularity/Apptainer environment with better space management.
Persistent Singularity/Apptainer container environment.
- Automatically builds/caches SIF images from docker:// URLs
- Builds sandbox in /scratch (if available) or configurable location
- Binds a large working directory into the container
- Keeps container isolated from host filesystem
Uses `apptainer instance` to create a long-running container that persists
state (files, installs, env changes) across all commands within a task.
The model experiences this as a real Linux VM.
Features:
- Persistent filesystem: files created in one command are visible in the next
- Package installs persist: pip/apt installs survive across tool calls
- Full isolation: --containall gives PID, IPC, and environment isolation
- Writable tmpfs overlay: full root filesystem is writable (RAM-backed)
- Automatic SIF caching: docker:// images converted to SIF once, reused forever
"""
def __init__(self, image: str, cwd: str = "/workspace", timeout: int = 60):
def __init__(self, image: str, cwd: str = "/root", timeout: int = 60):
self.cwd = cwd
self.timeout = timeout
@@ -655,60 +738,60 @@ class _SingularityEnvironment:
# Get or build SIF from docker:// URL (fast if already cached)
self.image = _get_or_build_sif(image, self.executable)
# Get scratch directory for sandbox
self.scratch_dir = _get_scratch_dir()
# Create unique instance name (must be alphanumeric + underscores)
self.instance_id = f"hermes_{uuid.uuid4().hex[:12]}"
self._instance_started = False
# Create unique sandbox directory
self.sandbox_id = f"hermes-{uuid.uuid4().hex[:12]}"
self.sandbox_dir = self.scratch_dir / self.sandbox_id
# Create a working directory that will be bound into the container
self.work_dir = self.scratch_dir / f"{self.sandbox_id}-work"
self.work_dir.mkdir(parents=True, exist_ok=True)
# Build the sandbox
self._build_sandbox()
# Start the persistent instance
self._start_instance()
def _build_sandbox(self):
"""Build a writable sandbox from the container image (SIF or other)."""
def _start_instance(self):
"""Start a persistent apptainer instance.
The instance runs as a background process. All subsequent execute() calls
run commands inside this same instance, so state persists across calls.
"""
cmd = [
self.executable, "instance", "start",
"--writable-tmpfs", # RAM-backed writable overlay on read-only SIF
"--containall", # Full isolation: PID, IPC, environment, filesystem
str(self.image),
self.instance_id,
]
try:
result = subprocess.run(
[self.executable, "build", "--sandbox", str(self.sandbox_dir), self.image],
cmd,
capture_output=True,
text=True,
timeout=300 # 5 min timeout for building
timeout=120, # 2 min for instance startup
)
if result.returncode != 0:
raise RuntimeError(f"Failed to build sandbox: {result.stderr}")
raise RuntimeError(f"Failed to start instance: {result.stderr}")
# Create /workspace directory inside the sandbox for bind mounting
workspace_in_sandbox = self.sandbox_dir / "workspace"
workspace_in_sandbox.mkdir(parents=True, exist_ok=True)
self._instance_started = True
print(f"[Singularity] Instance {self.instance_id} started (persistent container)", flush=True)
except subprocess.TimeoutExpired:
shutil.rmtree(self.sandbox_dir, ignore_errors=True)
raise RuntimeError("Sandbox build timed out")
raise RuntimeError("Instance start timed out")
def execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict:
"""Execute a command in the Singularity container."""
"""Execute a command in the persistent Singularity instance.
All commands run in the same container, so files, installs, and
environment changes persist between calls.
"""
if not self._instance_started:
return {"output": "Instance not started", "returncode": -1}
cmd = [self.executable, "exec"]
# Isolation flags - contain but allow network
cmd.extend(["--contain", "--cleanenv"])
# Bind the working directory into the container at /workspace
# This gives the container access to a large writable space
cmd.extend(["--bind", f"{self.work_dir}:/workspace"])
# Also bind it to /tmp inside container for pip cache etc.
cmd.extend(["--bind", f"{self.work_dir}:/tmp"])
# Set working directory
work_dir = cwd or self.cwd
cmd.extend(["--pwd", work_dir])
# Use writable sandbox
cmd.extend(["--writable", str(self.sandbox_dir)])
# Connect to the running instance
cmd.append(f"instance://{self.instance_id}")
# Transform sudo commands if SUDO_PASSWORD is available
exec_command = _transform_sudo_command(command)
@@ -732,9 +815,19 @@ class _SingularityEnvironment:
return {"output": f"Command timed out after {timeout or self.timeout}s", "returncode": 124}
def cleanup(self):
"""Clean up sandbox and working directory."""
shutil.rmtree(self.sandbox_dir, ignore_errors=True)
shutil.rmtree(self.work_dir, ignore_errors=True)
"""Stop the persistent instance and clean up."""
if self._instance_started:
try:
subprocess.run(
[self.executable, "instance", "stop", self.instance_id],
capture_output=True,
text=True,
timeout=30,
)
print(f"[Singularity] Instance {self.instance_id} stopped", flush=True)
except Exception as e:
print(f"[Singularity] Warning: failed to stop instance {self.instance_id}: {e}", flush=True)
self._instance_started = False
def stop(self):
"""Alias for cleanup."""
@@ -742,7 +835,10 @@ class _SingularityEnvironment:
def __del__(self):
"""Cleanup on destruction."""
self.cleanup()
try:
self.cleanup()
except Exception:
pass
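The instance lifecycle described above (start once, exec many times, stop on cleanup) comes down to three command lines. The sketch below only builds the argument lists, so it runs without Apptainer installed. The function names are hypothetical, and wrapping the command in `bash -c` is an assumption, since the part of `execute()` that appends the command is elided from this diff.

```python
from typing import List


def instance_start_cmd(executable: str, image: str, instance_id: str) -> List[str]:
    """Argument list that starts a persistent instance from a SIF image."""
    return [
        executable, "instance", "start",
        "--writable-tmpfs",  # RAM-backed writable overlay
        "--containall",      # PID, IPC, environment and filesystem isolation
        image,
        instance_id,
    ]


def instance_exec_cmd(executable: str, instance_id: str, command: str,
                      cwd: str = "/root") -> List[str]:
    """Argument list that runs a command inside the running instance."""
    return [
        executable, "exec",
        "--pwd", cwd,
        f"instance://{instance_id}",
        "bash", "-c", command,  # assumption: the shell wrapper is not shown in the diff
    ]


def instance_stop_cmd(executable: str, instance_id: str) -> List[str]:
    """Argument list that stops the instance and discards its tmpfs overlay."""
    return [executable, "instance", "stop", instance_id]
```

Because every `exec` targets the same `instance://` URI, files written by one command are visible to the next until the instance is stopped.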
class _SSHEnvironment:
@@ -957,13 +1053,37 @@ class _ModalEnvironment:
Wraps mini-swe-agent's SwerexModalEnvironment but adds:
- SUDO_PASSWORD support via _transform_sudo_command
- Automatic async-safety patches (applied once, before first use)
Note: stdin handling is not needed for Modal since it uses remote async execution.
The patches replace SwerexModalEnvironment's asyncio.run() calls with a
background thread approach, making it safe to use inside any event loop
(e.g., Atropos). Applied here at the point of use rather than relying on
import-time side effects, so ALL callers get the fix automatically.
"""
# Class-level flag: patches only need to be applied once
_patches_applied = False
def __init__(self, image: str, cwd: str = "/root", timeout: int = 60):
# Ensure async-safety patches are applied before creating any
# SwerexModalEnvironment instance. This is the single authoritative
# place -- no other module needs to call apply_patches() for Modal.
if not _ModalEnvironment._patches_applied:
try:
from environments.patches import apply_patches
apply_patches()
except ImportError:
pass # patches module not available (standalone use)
_ModalEnvironment._patches_applied = True
from minisweagent.environments.extra.swerex_modal import SwerexModalEnvironment
self._inner = SwerexModalEnvironment(image=image, cwd=cwd, timeout=timeout)
# Generous startup timeout: sandbox creation can take 30-60s for cold images,
# and the SWE-ReX runtime needs another 10-30s to boot inside it.
self._inner = SwerexModalEnvironment(
image=image, cwd=cwd, timeout=timeout,
startup_timeout=180.0,
runtime_timeout=3600.0,
)
self.cwd = cwd
self.timeout = timeout
@@ -1002,22 +1122,33 @@ TERMINAL_TOOL_DESCRIPTION = """Execute commands on a secure Linux environment.
**Command Execution:**
- Simple commands: Just provide the 'command' parameter
- Background processes: Set 'background': True for servers/long-running tasks
- Background processes: Set 'background': true to get a session_id for monitoring via the 'process' tool
- Command timeout: Optional 'timeout' parameter in seconds
- Working directory: Optional 'workdir' parameter for per-command cwd
- PTY mode: Set 'pty': true for interactive CLI tools (Codex, Claude Code, etc.)
**Examples:**
- Run command: `{"command": "ls -la"}`
- Background task: `{"command": "source venv/bin/activate && python server.py", "background": True}`
- Background task: `{"command": "pytest -v tests/", "background": true}` -- returns session_id, use process tool to poll/wait/kill
- With workdir: `{"command": "npm install", "workdir": "/home/user/project"}`
- With timeout: `{"command": "long_task.sh", "timeout": 300}`
- Interactive CLI: `{"command": "codex exec 'Add tests'", "background": true, "pty": true}`
**Background Process Workflow:**
1. Start: `terminal(command="...", background=true)` -- returns session_id
2. Monitor: `process(action="poll", session_id="...")` -- check status + new output
3. Wait: `process(action="wait", session_id="...", timeout=600)` -- block until done
4. Interact: `process(action="write/submit", session_id="...", data="y")` -- send stdin
5. Kill: `process(action="kill", session_id="...")` -- terminate
**Best Practices:**
- Run servers/long processes in background
- Monitor disk usage for large tasks
- Use background mode for long-running tasks, then process(wait) to block until completion
- Use workdir to run commands in specific project directories
- Install whatever tools you need with apt-get or pip
- Do not be afraid to run pip with --break-system-packages
- Try to create or use a venv with uv or python -m venv to keep isolation from global system packages
**Things to avoid:**
- Do NOT use interactive tools such as tmux, vim, nano, python repl - you will get stuck.
- Do NOT use interactive tools (vim, nano, python repl) without pty=true -- they will hang without a pseudo-terminal.
- Even git sometimes becomes interactive if the output is large. If you're not sure, pipe to cat.
"""
@@ -1026,9 +1157,49 @@ _active_environments: Dict[str, Any] = {}
_task_workdirs: Dict[str, str] = {} # Maps task_id to working directory
_last_activity: Dict[str, float] = {}
_env_lock = threading.Lock()
_creation_locks: Dict[str, threading.Lock] = {} # Per-task locks for sandbox creation
_creation_locks_lock = threading.Lock() # Protects _creation_locks dict itself
_cleanup_thread = None
_cleanup_running = False
# Per-task environment overrides registry.
# Allows environments (e.g., TerminalBench2Env) to specify a custom Docker/Modal
# image for a specific task_id BEFORE the agent loop starts. When the terminal or
# file tools create a new sandbox for that task_id, they check this registry first
# and fall back to the TERMINAL_MODAL_IMAGE (etc.) env var if no override is set.
#
# This is never exposed to the model -- only infrastructure code calls it.
# Thread-safe because each task_id is unique per rollout.
_task_env_overrides: Dict[str, Dict[str, Any]] = {}
def register_task_env_overrides(task_id: str, overrides: Dict[str, Any]):
"""
Register environment overrides for a specific task/rollout.
Called by Atropos environments before the agent loop to configure
per-task sandbox settings (e.g., a custom Dockerfile for the Modal image).
Supported override keys:
- modal_image: str -- Path to Dockerfile or Docker Hub image name
- docker_image: str -- Docker image name
- cwd: str -- Working directory inside the sandbox
Args:
task_id: The rollout's unique task identifier
overrides: Dict of config keys to override
"""
_task_env_overrides[task_id] = overrides
def clear_task_env_overrides(task_id: str):
"""
Clear environment overrides for a task after rollout completes.
Called during cleanup to avoid stale entries accumulating.
"""
_task_env_overrides.pop(task_id, None)
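A self-contained sketch of how the registry might be used around a rollout. `resolve_image` is a hypothetical helper that mirrors the "override first, then env-var config" lookup order, and the image names are placeholders.

```python
from typing import Any, Dict

_task_env_overrides: Dict[str, Dict[str, Any]] = {}


def register_task_env_overrides(task_id: str, overrides: Dict[str, Any]) -> None:
    _task_env_overrides[task_id] = overrides


def clear_task_env_overrides(task_id: str) -> None:
    _task_env_overrides.pop(task_id, None)


def resolve_image(task_id: str, env_type: str, config: Dict[str, Any]) -> str:
    """Hypothetical helper: per-task override wins, global config is the fallback."""
    if env_type not in ("docker", "singularity", "modal"):
        return ""  # local/ssh backends have no image
    key = f"{env_type}_image"
    overrides = _task_env_overrides.get(task_id, {})
    return overrides.get(key) or config[key]
```

Calling `clear_task_env_overrides` in a `finally` block after the rollout would keep stale entries from accumulating even when the rollout raises.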
# Configuration from environment variables
def _get_env_config() -> Dict[str, Any]:
"""Get terminal environment configuration from environment variables."""
@@ -1037,22 +1208,42 @@ def _get_env_config() -> Dict[str, Any]:
env_type = os.getenv("TERMINAL_ENV", "local")
# Default cwd depends on backend:
# - local/ssh: current working directory (CLI resolves "." before we get here)
# - docker/singularity: /tmp inside the container (singularity bind-mounts /scratch there)
# - modal: /root (ephemeral cloud container, full filesystem access)
if env_type == "modal":
# - local: host's current working directory
# - ssh: remote user's home (agent code is local, execution is remote)
# - docker: / inside the container
# - singularity/modal: /root (ephemeral cloud/container)
if env_type in ("modal", "singularity"):
default_cwd = "/root"
elif env_type in ("docker", "singularity"):
default_cwd = "/tmp"
elif env_type == "docker":
default_cwd = "/"
elif env_type == "ssh":
default_cwd = "~"
else:
default_cwd = os.getcwd()
# Read TERMINAL_CWD but sanity-check it for non-local backends.
# If the CWD looks like a host-local path that can't exist inside a
# container/sandbox, fall back to the backend's own default. This
# catches the case where cli.py (or .env) leaked the host's CWD.
cwd = os.getenv("TERMINAL_CWD", default_cwd)
if env_type in ("modal", "docker", "singularity", "ssh") and cwd:
# Paths containing common host-only prefixes are clearly wrong
# inside a container. Also catch Windows-style paths (C:\...).
host_prefixes = ("/Users/", "/home/", "C:\\", "C:/")
if any(cwd.startswith(p) for p in host_prefixes) and cwd != default_cwd:
if not os.getenv("HERMES_QUIET"):
print(
f"[Terminal] Ignoring TERMINAL_CWD={cwd!r} for {env_type} backend "
f"(host path won't exist in sandbox). Using {default_cwd!r} instead."
)
cwd = default_cwd
return {
"env_type": env_type,
"docker_image": os.getenv("TERMINAL_DOCKER_IMAGE", default_image),
"singularity_image": os.getenv("TERMINAL_SINGULARITY_IMAGE", f"docker://{default_image}"),
"modal_image": os.getenv("TERMINAL_MODAL_IMAGE", default_image),
"cwd": os.getenv("TERMINAL_CWD", default_cwd),
"cwd": cwd,
"timeout": int(os.getenv("TERMINAL_TIMEOUT", "60")),
"lifetime_seconds": int(os.getenv("TERMINAL_LIFETIME_SECONDS", "300")),
# SSH-specific config
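The TERMINAL_CWD sanity check in this hunk can be read as a small pure function. The sketch below restates it that way; `sanitize_cwd` is a hypothetical name, and the logging from the diff is left out.

```python
HOST_PREFIXES = ("/Users/", "/home/", "C:\\", "C:/")
SANDBOXED_BACKENDS = ("modal", "docker", "singularity", "ssh")


def sanitize_cwd(cwd: str, env_type: str, default_cwd: str) -> str:
    """Fall back to the backend default when cwd looks like a host-only path."""
    if env_type in SANDBOXED_BACKENDS and cwd:
        # str.startswith accepts a tuple of candidate prefixes
        if cwd.startswith(HOST_PREFIXES) and cwd != default_cwd:
            return default_cwd
    return cwd
```

Note that the prefix list treats any `/home/` path as host-local, so an explicit `/home/...` working directory on an SSH target would also be replaced by the default.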
@@ -1114,49 +1305,66 @@ def _cleanup_inactive_envs(lifetime_seconds: int = 300):
global _active_environments, _last_activity
current_time = time.time()
tasks_to_cleanup = []
# Check the process registry -- skip cleanup for sandboxes with active
# background processes (their _last_activity gets refreshed to keep them alive).
try:
from tools.process_registry import process_registry
for task_id in list(_last_activity.keys()):
if process_registry.has_active_processes(task_id):
_last_activity[task_id] = current_time # Keep sandbox alive
except ImportError:
pass
# Phase 1: collect stale entries and remove them from tracking dicts while
# holding the lock. Do NOT call env.cleanup() inside the lock -- Modal and
# Docker teardown can block for 10-15s, which would stall every concurrent
# terminal/file tool call waiting on _env_lock.
envs_to_stop = [] # list of (task_id, env) pairs
with _env_lock:
for task_id, last_time in list(_last_activity.items()):
if current_time - last_time > lifetime_seconds:
tasks_to_cleanup.append(task_id)
env = _active_environments.pop(task_id, None)
_last_activity.pop(task_id, None)
_task_workdirs.pop(task_id, None)
if env is not None:
envs_to_stop.append((task_id, env))
for task_id in tasks_to_cleanup:
try:
if task_id in _active_environments:
env = _active_environments[task_id]
# Try various cleanup methods
if hasattr(env, 'cleanup'):
env.cleanup()
elif hasattr(env, 'stop'):
env.stop()
elif hasattr(env, 'terminate'):
env.terminate()
# Also purge per-task creation locks for cleaned-up tasks
with _creation_locks_lock:
for task_id, _ in envs_to_stop:
_creation_locks.pop(task_id, None)
del _active_environments[task_id]
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal Cleanup] Cleaned up inactive environment for task: {task_id}")
# Phase 2: stop the actual sandboxes OUTSIDE the lock so other tool calls
# are not blocked while Modal/Docker sandboxes shut down.
for task_id, env in envs_to_stop:
# Invalidate stale file_ops cache entry (Bug fix: prevents
# ShellFileOperations from referencing a dead sandbox)
try:
from tools.file_tools import clear_file_ops_cache
clear_file_ops_cache(task_id)
except ImportError:
pass
if task_id in _last_activity:
del _last_activity[task_id]
if task_id in _task_workdirs:
del _task_workdirs[task_id]
try:
if hasattr(env, 'cleanup'):
env.cleanup()
elif hasattr(env, 'stop'):
env.stop()
elif hasattr(env, 'terminate'):
env.terminate()
except Exception as e:
error_str = str(e)
if not os.getenv("HERMES_QUIET"):
if "404" in error_str or "not found" in error_str.lower():
print(f"[Terminal Cleanup] Environment for task {task_id} already cleaned up")
else:
print(f"[Terminal Cleanup] Error cleaning up environment for task {task_id}: {e}")
# Always remove from tracking dicts
if task_id in _active_environments:
del _active_environments[task_id]
if task_id in _last_activity:
del _last_activity[task_id]
if task_id in _task_workdirs:
del _task_workdirs[task_id]
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal Cleanup] Cleaned up inactive environment for task: {task_id}")
except Exception as e:
error_str = str(e)
if not os.getenv("HERMES_QUIET"):
if "404" in error_str or "not found" in error_str.lower():
print(f"[Terminal Cleanup] Environment for task {task_id} already cleaned up")
else:
print(f"[Terminal Cleanup] Error cleaning up environment for task {task_id}: {e}")
def _cleanup_thread_worker():
@@ -1246,7 +1454,8 @@ def cleanup_all_environments():
except:
pass
print(f"[Terminal Cleanup] Cleaned {cleaned} environments")
if not os.getenv("HERMES_QUIET") and cleaned > 0:
print(f"[Terminal Cleanup] Cleaned {cleaned} environments")
return cleaned
@@ -1254,37 +1463,58 @@ def cleanup_vm(task_id: str):
"""Manually clean up a specific environment by task_id."""
global _active_environments, _last_activity, _task_workdirs
# Remove from tracking dicts while holding the lock, but defer the
# actual (potentially slow) env.cleanup() call to outside the lock
# so other tool calls aren't blocked.
env = None
with _env_lock:
try:
if task_id in _active_environments:
env = _active_environments[task_id]
if hasattr(env, 'cleanup'):
env.cleanup()
elif hasattr(env, 'stop'):
env.stop()
elif hasattr(env, 'terminate'):
env.terminate()
env = _active_environments.pop(task_id, None)
_task_workdirs.pop(task_id, None)
_last_activity.pop(task_id, None)
del _active_environments[task_id]
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal Cleanup] Manually cleaned up environment for task: {task_id}")
# Clean up per-task creation lock
with _creation_locks_lock:
_creation_locks.pop(task_id, None)
if task_id in _task_workdirs:
del _task_workdirs[task_id]
# Invalidate stale file_ops cache entry
try:
from tools.file_tools import clear_file_ops_cache
clear_file_ops_cache(task_id)
except ImportError:
pass
if task_id in _last_activity:
del _last_activity[task_id]
if env is None:
return
except Exception as e:
if not os.getenv("HERMES_QUIET"):
error_str = str(e)
if "404" in error_str or "not found" in error_str.lower():
print(f"[Terminal Cleanup] Environment for task {task_id} already cleaned up")
else:
print(f"[Terminal Cleanup] Error cleaning up environment for task {task_id}: {e}")
try:
if hasattr(env, 'cleanup'):
env.cleanup()
elif hasattr(env, 'stop'):
env.stop()
elif hasattr(env, 'terminate'):
env.terminate()
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal Cleanup] Manually cleaned up environment for task: {task_id}")
except Exception as e:
if not os.getenv("HERMES_QUIET"):
error_str = str(e)
if "404" in error_str or "not found" in error_str.lower():
print(f"[Terminal Cleanup] Environment for task {task_id} already cleaned up")
else:
print(f"[Terminal Cleanup] Error cleaning up environment for task {task_id}: {e}")
atexit.register(_stop_cleanup_thread)
def _atexit_cleanup():
"""Stop cleanup thread and shut down all remaining sandboxes on exit."""
_stop_cleanup_thread()
if _active_environments:
count = len(_active_environments)
print(f"\n[Terminal Cleanup] Shutting down {count} remaining sandbox(es)...")
cleanup_all_environments()
atexit.register(_atexit_cleanup)
def terminal_tool(
@@ -1292,7 +1522,10 @@ def terminal_tool(
background: bool = False,
timeout: Optional[int] = None,
task_id: Optional[str] = None,
force: bool = False
force: bool = False,
workdir: Optional[str] = None,
check_interval: Optional[int] = None,
pty: bool = False,
) -> str:
"""
Execute a command using mini-swe-agent's execution environments.
@@ -1303,6 +1536,9 @@ def terminal_tool(
timeout: Command timeout in seconds (default: from config)
task_id: Unique identifier for environment isolation (optional)
force: If True, skip dangerous command check (use after user confirms)
workdir: Working directory for this command (optional, uses session cwd if not set)
check_interval: Seconds between auto-checks for background processes (gateway only, min 30)
pty: If True, use pseudo-terminal for interactive CLI tools (local backend only)
Returns:
str: JSON string with output, exit_code, and error fields
@@ -1326,24 +1562,28 @@ def terminal_tool(
# Get configuration
config = _get_env_config()
env_type = config["env_type"]
# Select image based on env type
if env_type == "docker":
image = config["docker_image"]
elif env_type == "singularity":
image = config["singularity_image"]
elif env_type == "modal":
image = config["modal_image"]
else:
image = ""
cwd = config["cwd"]
default_timeout = config["timeout"]
effective_timeout = timeout or default_timeout
# Use task_id for environment isolation
effective_task_id = task_id or "default"
# Check per-task overrides (set by environments like TerminalBench2Env)
# before falling back to global env var config
overrides = _task_env_overrides.get(effective_task_id, {})
# Select image based on env type, with per-task override support
if env_type == "docker":
image = overrides.get("docker_image") or config["docker_image"]
elif env_type == "singularity":
image = overrides.get("singularity_image") or config["singularity_image"]
elif env_type == "modal":
image = overrides.get("modal_image") or config["modal_image"]
else:
image = ""
cwd = overrides.get("cwd") or config["cwd"]
default_timeout = config["timeout"]
effective_timeout = timeout or default_timeout
# For local environment in batch mode, create a unique subdirectory per task
# This prevents parallel tasks from overwriting each other's files
# In CLI mode (HERMES_QUIET), use the cwd directly without subdirectories
@@ -1359,68 +1599,86 @@ def terminal_tool(
# Start cleanup thread
_start_cleanup_thread()
# Get or create environment
# Check under lock, but create OUTSIDE lock so we don't block
# other concurrent rollouts during slow Modal/Docker startup
needs_creation = False
# Get or create environment.
# Use a per-task creation lock so concurrent tool calls for the same
# task_id wait for the first one to finish creating the sandbox,
# instead of each creating their own (wasting Modal resources).
with _env_lock:
if effective_task_id not in _active_environments:
needs_creation = True
else:
if effective_task_id in _active_environments:
_last_activity[effective_task_id] = time.time()
env = _active_environments[effective_task_id]
needs_creation = False
else:
needs_creation = True
if needs_creation:
_check_disk_usage_warning()
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal] Creating new {env_type} environment for task {effective_task_id[:8]}...", flush=True)
try:
ssh_config = None
if env_type == "ssh":
ssh_config = {
"host": config.get("ssh_host", ""),
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
}
# Per-task lock: only one thread creates the sandbox, others wait
with _creation_locks_lock:
if effective_task_id not in _creation_locks:
_creation_locks[effective_task_id] = threading.Lock()
task_lock = _creation_locks[effective_task_id]
new_env = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
timeout=effective_timeout,
ssh_config=ssh_config
)
except ImportError as e:
return json.dumps({
"output": "",
"exit_code": -1,
"error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
"status": "disabled"
}, ensure_ascii=False)
with task_lock:
# Double-check after acquiring the per-task lock
with _env_lock:
if effective_task_id in _active_environments:
_last_activity[effective_task_id] = time.time()
env = _active_environments[effective_task_id]
needs_creation = False
# Store under lock (brief)
with _env_lock:
if effective_task_id not in _active_environments:
_active_environments[effective_task_id] = new_env
else:
# Another thread created it while we were building -- clean up ours
if needs_creation:
if env_type in ("singularity", "local"):
_check_disk_usage_warning()
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal] Creating new {env_type} environment for task {effective_task_id[:8]}...", flush=True)
try:
if hasattr(new_env, 'stop'):
new_env.stop()
except Exception:
pass
ssh_config = None
if env_type == "ssh":
ssh_config = {
"host": config.get("ssh_host", ""),
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
}
_last_activity[effective_task_id] = time.time()
env = _active_environments[effective_task_id]
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal] {env_type} environment ready for task {effective_task_id[:8]}", flush=True)
new_env = _create_environment(
env_type=env_type,
image=image,
cwd=cwd,
timeout=effective_timeout,
ssh_config=ssh_config
)
except ImportError as e:
return json.dumps({
"output": "",
"exit_code": -1,
"error": f"Terminal tool disabled: mini-swe-agent not available ({e})",
"status": "disabled"
}, ensure_ascii=False)
with _env_lock:
_active_environments[effective_task_id] = new_env
_last_activity[effective_task_id] = time.time()
env = new_env
if not os.getenv("HERMES_QUIET"):
print(f"[Terminal] {env_type} environment ready for task {effective_task_id[:8]}", flush=True)
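The per-task creation lock with a double check can be sketched independently of the terminal tool. All names below are hypothetical, and `factory` stands in for `_create_environment(...)`.

```python
import threading
from typing import Any, Callable, Dict

_envs_lock = threading.Lock()
_envs: Dict[str, Any] = {}
_locks_guard = threading.Lock()
_creation_locks: Dict[str, threading.Lock] = {}


def get_or_create_env(task_id: str, factory: Callable[[], Any]) -> Any:
    """Return the environment for task_id, creating it at most once."""
    # Fast path: the environment already exists.
    with _envs_lock:
        env = _envs.get(task_id)
    if env is not None:
        return env

    # One lock per task, so unrelated tasks can create sandboxes in parallel.
    with _locks_guard:
        task_lock = _creation_locks.setdefault(task_id, threading.Lock())

    with task_lock:
        # Double-check: another thread may have finished while we waited.
        with _envs_lock:
            env = _envs.get(task_id)
        if env is None:
            env = factory()  # slow call, made without holding _envs_lock
            with _envs_lock:
                _envs[task_id] = env
    return env
```

The global lock is only held for dictionary reads and writes, so a slow sandbox start for one task does not stall tool calls for other tasks.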
# Check for dangerous commands (only for local/ssh in interactive modes)
# Skip check if force=True (user has confirmed they want to run it)
if not force:
approval = _check_dangerous_command(command, env_type)
if not approval["approved"]:
# Check if this is an approval_required (gateway ask mode)
if approval.get("status") == "approval_required":
return json.dumps({
"output": "",
"exit_code": -1,
"error": approval.get("message", "Waiting for user approval"),
"status": "approval_required",
"command": approval.get("command", command),
"description": approval.get("description", "dangerous command"),
"pattern_key": approval.get("pattern_key", ""),
}, ensure_ascii=False)
# Command was blocked - return informative message
return json.dumps({
"output": "",
@@ -1431,20 +1689,69 @@ def terminal_tool(
# Prepare command for execution
if background:
# Run in background with nohup and redirect output
exec_command = f"nohup {command} > /tmp/bg_output.log 2>&1 &"
# Spawn a tracked background process via the process registry.
# For local backends: uses subprocess.Popen with output buffering.
# For non-local backends: runs inside the sandbox via env.execute().
from tools.process_registry import process_registry
session_key = os.getenv("HERMES_SESSION_KEY", "")
effective_cwd = workdir or cwd
try:
result = env.execute(exec_command, timeout=10)
return json.dumps({
"output": "Background task started successfully",
if env_type == "local":
proc_session = process_registry.spawn_local(
command=command,
cwd=effective_cwd,
task_id=effective_task_id,
session_key=session_key,
env_vars=env.env if hasattr(env, 'env') else None,
use_pty=pty,
)
else:
proc_session = process_registry.spawn_via_env(
env=env,
command=command,
cwd=effective_cwd,
task_id=effective_task_id,
session_key=session_key,
)
result_data = {
"output": "Background process started",
"session_id": proc_session.id,
"pid": proc_session.pid,
"exit_code": 0,
"error": None
}, ensure_ascii=False)
"error": None,
}
# Transparent timeout clamping note
max_timeout = effective_timeout
if timeout and timeout > max_timeout:
result_data["timeout_note"] = (
f"Requested timeout {timeout}s was clamped to "
f"configured limit of {max_timeout}s"
)
# Register check_interval watcher (gateway picks this up after agent run)
if check_interval and background:
effective_interval = max(30, check_interval)
if check_interval < 30:
result_data["check_interval_note"] = (
f"Requested {check_interval}s raised to minimum 30s"
)
process_registry.pending_watchers.append({
"session_id": proc_session.id,
"check_interval": effective_interval,
"session_key": session_key,
"platform": os.getenv("HERMES_SESSION_PLATFORM", ""),
"chat_id": os.getenv("HERMES_SESSION_CHAT_ID", ""),
})
return json.dumps(result_data, ensure_ascii=False)
except Exception as e:
return json.dumps({
"output": "",
"exit_code": -1,
"error": f"Failed to start background task: {str(e)}"
"error": f"Failed to start background process: {str(e)}"
}, ensure_ascii=False)
else:
# Run foreground command with retry logic
@@ -1454,7 +1761,10 @@ def terminal_tool(
while retry_count <= max_retries:
try:
result = env.execute(command, timeout=effective_timeout)
execute_kwargs = {"timeout": effective_timeout}
if workdir:
execute_kwargs["cwd"] = workdir
result = env.execute(command, **execute_kwargs)
except Exception as e:
error_str = str(e).lower()
if "timeout" in error_str:
+106
@@ -0,0 +1,106 @@
#!/usr/bin/env python3
"""
Transcription Tools Module
Provides speech-to-text transcription using OpenAI's Whisper API.
Used by the messaging gateway to automatically transcribe voice messages
sent by users on Telegram, Discord, WhatsApp, and Slack.
Supported models:
- whisper-1 (cheapest, good quality)
- gpt-4o-mini-transcribe (better quality, higher cost)
- gpt-4o-transcribe (best quality, highest cost)
Supported input formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg
Usage:
from tools.transcription_tools import transcribe_audio
result = transcribe_audio("/path/to/audio.ogg")
if result["success"]:
print(result["transcript"])
"""
import os
from pathlib import Path
from typing import Optional
# Default STT model -- cheapest and widely available
DEFAULT_STT_MODEL = "whisper-1"
def transcribe_audio(file_path: str, model: Optional[str] = None) -> dict:
"""
Transcribe an audio file using OpenAI's Whisper API.
This function calls the OpenAI Audio Transcriptions endpoint directly
(not via OpenRouter, since Whisper isn't available there).
Args:
file_path: Absolute path to the audio file to transcribe.
model: Transcription model to use. Defaults to DEFAULT_STT_MODEL ("whisper-1").
Returns:
dict with keys:
- "success" (bool): Whether transcription succeeded
- "transcript" (str): The transcribed text (empty on failure)
- "error" (str, optional): Error message if success is False
"""
# Use HERMES_OPENAI_API_KEY to avoid interference with the OpenAI SDK's
# auto-detection of OPENAI_API_KEY (which would break OpenRouter calls).
# Falls back to OPENAI_API_KEY for backward compatibility.
api_key = os.getenv("HERMES_OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY")
if not api_key:
return {
"success": False,
"transcript": "",
"error": "Neither HERMES_OPENAI_API_KEY nor OPENAI_API_KEY is set",
}
audio_path = Path(file_path)
if not audio_path.is_file():
return {
"success": False,
"transcript": "",
"error": f"Audio file not found: {file_path}",
}
# Use provided model, or fall back to default
if model is None:
model = DEFAULT_STT_MODEL
try:
from openai import OpenAI
client = OpenAI(api_key=api_key)
with open(file_path, "rb") as audio_file:
transcription = client.audio.transcriptions.create(
model=model,
file=audio_file,
response_format="text",
)
# The response is a plain string when response_format="text"
transcript_text = str(transcription).strip()
print(f"[STT] Transcribed {audio_path.name} ({len(transcript_text)} chars)", flush=True)
return {
"success": True,
"transcript": transcript_text,
}
except Exception as e:
print(f"[STT] Transcription error: {e}", flush=True)
return {
"success": False,
"transcript": "",
"error": str(e),
}
def check_stt_requirements() -> bool:
    """Check if OpenAI API key is available for speech-to-text."""
    return bool(os.getenv("HERMES_OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY"))
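The key lookup above appears in both `transcribe_audio` and `check_stt_requirements`. A minimal sketch of how that precedence could be factored out, assuming a hypothetical helper named `resolve_openai_key` (not part of this change) that takes the environment as a parameter so it can be exercised without touching `os.environ`:

```python
import os
from typing import Mapping, Optional


def resolve_openai_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Hermes-specific key if set, else the generic OpenAI key.

    Hypothetical helper for illustration; the module inlines this logic.
    """
    env = os.environ if env is None else env
    # Empty strings count as "not set", matching the `or` chain in the module.
    return env.get("HERMES_OPENAI_API_KEY") or env.get("OPENAI_API_KEY") or None


if __name__ == "__main__":
    sample = {"HERMES_OPENAI_API_KEY": "", "OPENAI_API_KEY": "generic-key"}
    print(resolve_openai_key(sample))
```

Passing the mapping explicitly keeps the precedence rule testable; whether a shared helper is worth adding is a design choice for the author.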
+415
@@ -0,0 +1,415 @@
#!/usr/bin/env python3
"""
Text-to-Speech Tool Module
Supports three TTS providers:
- Edge TTS (default, free, no API key): Microsoft Edge neural voices
- ElevenLabs (premium): High-quality voices, needs ELEVENLABS_API_KEY
- OpenAI TTS: Good quality, needs OPENAI_API_KEY
Output formats:
- Opus (.ogg) for Telegram voice bubbles (requires ffmpeg for Edge TTS)
- MP3 (.mp3) for everything else (CLI, Discord, WhatsApp)
Configuration is loaded from ~/.hermes/config.yaml under the 'tts:' key.
The user chooses the provider and voice; the model just sends text.
Usage:
    from tools.tts_tool import text_to_speech_tool, check_tts_requirements
    result = text_to_speech_tool(text="Hello world")
"""
import asyncio
import datetime
import json
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, Any, Optional
# ---------------------------------------------------------------------------
# Optional imports -- providers degrade gracefully if not installed
# ---------------------------------------------------------------------------
try:
    import edge_tts
    _HAS_EDGE_TTS = True
except ImportError:
    _HAS_EDGE_TTS = False

try:
    from elevenlabs.client import ElevenLabs
    _HAS_ELEVENLABS = True
except ImportError:
    _HAS_ELEVENLABS = False

# openai is a core dependency, but guard anyway
try:
    from openai import OpenAI as OpenAIClient
    _HAS_OPENAI = True
except ImportError:
    _HAS_OPENAI = False
# ===========================================================================
# Defaults
# ===========================================================================
DEFAULT_PROVIDER = "edge"
DEFAULT_EDGE_VOICE = "en-US-AriaNeural"
DEFAULT_ELEVENLABS_VOICE_ID = "pNInz6obpgDQGcFmaJgB" # Adam
DEFAULT_ELEVENLABS_MODEL_ID = "eleven_multilingual_v2"
DEFAULT_OPENAI_MODEL = "gpt-4o-mini-tts"
DEFAULT_OPENAI_VOICE = "alloy"
DEFAULT_OUTPUT_DIR = os.path.expanduser("~/voice-memos")
MAX_TEXT_LENGTH = 4000
# ===========================================================================
# Config loader -- reads tts: section from ~/.hermes/config.yaml
# ===========================================================================
def _load_tts_config() -> Dict[str, Any]:
    """
    Load TTS configuration from ~/.hermes/config.yaml.

    Returns the 'tts' section as a dict, or an empty dict if the section
    is missing or the config cannot be loaded. Callers apply the module
    defaults for any missing fields.
    """
    try:
        from hermes_cli.config import load_config
        config = load_config()
        return config.get("tts", {})
    except Exception:
        return {}


def _get_provider(tts_config: Dict[str, Any]) -> str:
    """Get the configured TTS provider name."""
    return tts_config.get("provider", DEFAULT_PROVIDER).lower().strip()
# ===========================================================================
# ffmpeg Opus conversion (Edge TTS MP3 -> OGG Opus for Telegram)
# ===========================================================================
def _has_ffmpeg() -> bool:
    """Check if ffmpeg is available on the system."""
    return shutil.which("ffmpeg") is not None


def _convert_to_opus(mp3_path: str) -> Optional[str]:
    """
    Convert an MP3 file to OGG Opus format for Telegram voice bubbles.

    Args:
        mp3_path: Path to the input MP3 file.

    Returns:
        Path to the .ogg file, or None if conversion fails.
    """
    if not _has_ffmpeg():
        return None

    ogg_path = mp3_path.rsplit(".", 1)[0] + ".ogg"
    try:
        subprocess.run(
            ["ffmpeg", "-i", mp3_path, "-acodec", "libopus",
             "-ac", "1", "-b:a", "64k", "-vbr", "off", ogg_path, "-y"],
            capture_output=True, timeout=30,
        )
        if os.path.exists(ogg_path) and os.path.getsize(ogg_path) > 0:
            return ogg_path
    except Exception:
        pass
    return None
# ===========================================================================
# Provider: Edge TTS (free)
# ===========================================================================
async def _generate_edge_tts(text: str, output_path: str, tts_config: Dict[str, Any]) -> str:
    """
    Generate audio using Edge TTS.

    Args:
        text: Text to convert.
        output_path: Where to save the MP3 file.
        tts_config: TTS config dict.

    Returns:
        Path to the saved audio file.
    """
    edge_config = tts_config.get("edge", {})
    voice = edge_config.get("voice", DEFAULT_EDGE_VOICE)
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path)
    return output_path
# ===========================================================================
# Provider: ElevenLabs (premium)
# ===========================================================================
def _generate_elevenlabs(text: str, output_path: str, tts_config: Dict[str, Any]) -> str:
    """
    Generate audio using ElevenLabs.

    Args:
        text: Text to convert.
        output_path: Where to save the audio file.
        tts_config: TTS config dict.

    Returns:
        Path to the saved audio file.
    """
    api_key = os.getenv("ELEVENLABS_API_KEY", "")
    if not api_key:
        raise ValueError("ELEVENLABS_API_KEY not set. Get one at https://elevenlabs.io/")

    el_config = tts_config.get("elevenlabs", {})
    voice_id = el_config.get("voice_id", DEFAULT_ELEVENLABS_VOICE_ID)
    model_id = el_config.get("model_id", DEFAULT_ELEVENLABS_MODEL_ID)

    # Determine output format based on file extension
    if output_path.endswith(".ogg"):
        output_format = "opus_48000_64"
    else:
        output_format = "mp3_44100_128"

    client = ElevenLabs(api_key=api_key)
    audio_generator = client.text_to_speech.convert(
        text=text,
        voice_id=voice_id,
        model_id=model_id,
        output_format=output_format,
    )
    # audio_generator yields chunks -- write them all
    with open(output_path, "wb") as f:
        for chunk in audio_generator:
            f.write(chunk)
    return output_path
# ===========================================================================
# Provider: OpenAI TTS
# ===========================================================================
def _generate_openai_tts(text: str, output_path: str, tts_config: Dict[str, Any]) -> str:
    """
    Generate audio using OpenAI TTS.

    Args:
        text: Text to convert.
        output_path: Where to save the audio file.
        tts_config: TTS config dict.

    Returns:
        Path to the saved audio file.
    """
    api_key = os.getenv("HERMES_OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("HERMES_OPENAI_API_KEY not set. Get one at https://platform.openai.com/api-keys")

    oai_config = tts_config.get("openai", {})
    model = oai_config.get("model", DEFAULT_OPENAI_MODEL)
    voice = oai_config.get("voice", DEFAULT_OPENAI_VOICE)

    # Determine response format from extension
    if output_path.endswith(".ogg"):
        response_format = "opus"
    else:
        response_format = "mp3"

    client = OpenAIClient(api_key=api_key)
    response = client.audio.speech.create(
        model=model,
        voice=voice,
        input=text,
        response_format=response_format,
    )
    response.stream_to_file(output_path)
    return output_path
# ===========================================================================
# Main tool function
# ===========================================================================
def text_to_speech_tool(
    text: str,
    output_path: Optional[str] = None,
) -> str:
    """
    Convert text to speech audio.

    Reads provider/voice config from ~/.hermes/config.yaml (tts: section).
    The model sends text; the user configures voice and provider.

    On messaging platforms, the returned MEDIA:<path> tag is intercepted
    by the send pipeline and delivered as a native voice message.
    In CLI mode, the file is saved to ~/voice-memos/.

    Args:
        text: The text to convert to speech.
        output_path: Optional custom save path. Defaults to
            ~/voice-memos/tts_<timestamp>.mp3 (or .ogg when the provider
            produces Opus natively for Telegram).

    Returns:
        str: JSON result. On success it contains success, file_path,
            media_tag, provider, and voice_compatible; on failure it
            contains success and error.
    """
    if not text or not text.strip():
        return json.dumps({"success": False, "error": "Text is required"}, ensure_ascii=False)

    # Truncate very long text with a warning
    if len(text) > MAX_TEXT_LENGTH:
        print(f"⚠️ TTS text too long ({len(text)} chars), truncating to {MAX_TEXT_LENGTH}")
        text = text[:MAX_TEXT_LENGTH]

    tts_config = _load_tts_config()
    provider = _get_provider(tts_config)

    # Detect platform from gateway env var to choose the best output format.
    # Telegram voice bubbles require Opus (.ogg); OpenAI and ElevenLabs can
    # produce Opus natively (no ffmpeg needed). Edge TTS always outputs MP3
    # and needs ffmpeg for conversion.
    platform = os.getenv("HERMES_SESSION_PLATFORM", "").lower()
    want_opus = (platform == "telegram")

    # Determine output path
    if output_path:
        file_path = Path(output_path).expanduser()
    else:
        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        out_dir = Path(DEFAULT_OUTPUT_DIR)
        out_dir.mkdir(parents=True, exist_ok=True)
        # Use .ogg for Telegram with providers that support native Opus output,
        # otherwise fall back to .mp3 (Edge TTS will attempt ffmpeg conversion later).
        if want_opus and provider in ("openai", "elevenlabs"):
            file_path = out_dir / f"tts_{timestamp}.ogg"
        else:
            file_path = out_dir / f"tts_{timestamp}.mp3"

    # Ensure parent directory exists
    file_path.parent.mkdir(parents=True, exist_ok=True)
    file_str = str(file_path)

    try:
        # Generate audio with the configured provider
        if provider == "elevenlabs":
            if not _HAS_ELEVENLABS:
                return json.dumps({
                    "success": False,
                    "error": "ElevenLabs provider selected but 'elevenlabs' package not installed. Run: pip install elevenlabs"
                }, ensure_ascii=False)
            print(f"🔊 Generating speech with ElevenLabs...")
            _generate_elevenlabs(text, file_str, tts_config)
        elif provider == "openai":
            if not _HAS_OPENAI:
                return json.dumps({
                    "success": False,
                    "error": "OpenAI provider selected but 'openai' package not installed."
                }, ensure_ascii=False)
            print(f"🔊 Generating speech with OpenAI TTS...")
            _generate_openai_tts(text, file_str, tts_config)
        else:
            # Default: Edge TTS (free)
            if not _HAS_EDGE_TTS:
                return json.dumps({
                    "success": False,
                    "error": "Edge TTS not available. Run: pip install edge-tts"
                }, ensure_ascii=False)
            print(f"🔊 Generating speech with Edge TTS...")
            # Edge TTS is async. asyncio.run() cannot be called from a thread
            # that already has a running event loop, so detect that case first.
            # Only the loop check is guarded, so a RuntimeError raised during
            # generation is not mistaken for "no running loop" and retried.
            try:
                asyncio.get_running_loop()
                in_event_loop = True
            except RuntimeError:
                in_event_loop = False

            if in_event_loop:
                # Run the coroutine in a worker thread with its own event loop
                import concurrent.futures
                with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
                    pool.submit(
                        lambda: asyncio.run(_generate_edge_tts(text, file_str, tts_config))
                    ).result(timeout=60)
            else:
                asyncio.run(_generate_edge_tts(text, file_str, tts_config))

        # Check the file was actually created
        if not os.path.exists(file_str) or os.path.getsize(file_str) == 0:
            return json.dumps({
                "success": False,
                "error": f"TTS generation produced no output (provider: {provider})"
            }, ensure_ascii=False)

        # Try Opus conversion for Telegram compatibility (Edge TTS only outputs MP3)
        voice_compatible = False
        if provider == "edge" and file_str.endswith(".mp3"):
            opus_path = _convert_to_opus(file_str)
            if opus_path:
                file_str = opus_path
                voice_compatible = True
        elif provider in ("elevenlabs", "openai"):
            # These providers can output Opus natively if the path ends in .ogg
            voice_compatible = file_str.endswith(".ogg")

        file_size = os.path.getsize(file_str)
        print(f"✅ TTS audio saved: {file_str} ({file_size:,} bytes, provider: {provider})")

        # Build response with MEDIA tag for platform delivery
        media_tag = f"MEDIA:{file_str}"
        if voice_compatible:
            media_tag = f"[[audio_as_voice]]\n{media_tag}"

        return json.dumps({
            "success": True,
            "file_path": file_str,
            "media_tag": media_tag,
            "provider": provider,
            "voice_compatible": voice_compatible,
        }, ensure_ascii=False)
    except Exception as e:
        error_msg = f"TTS generation failed ({provider}): {e}"
        print(f"{error_msg}")
        return json.dumps({"success": False, "error": error_msg}, ensure_ascii=False)
# ===========================================================================
# Requirements check
# ===========================================================================
def check_tts_requirements() -> bool:
    """
    Check if at least one TTS provider is available.

    Edge TTS needs no API key and is the default, so if the package
    is installed, TTS is available.

    Returns:
        bool: True if at least one provider can work.
    """
    if _HAS_EDGE_TTS:
        return True
    if _HAS_ELEVENLABS and os.getenv("ELEVENLABS_API_KEY"):
        return True
    if _HAS_OPENAI and (os.getenv("HERMES_OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY")):
        return True
    return False
# ===========================================================================
# Main -- quick diagnostics
# ===========================================================================
if __name__ == "__main__":
    print("🔊 Text-to-Speech Tool Module")
    print("=" * 50)
    print(f"\nProvider availability:")
    print(f" Edge TTS: {'✅ installed' if _HAS_EDGE_TTS else '❌ not installed (pip install edge-tts)'}")
    print(f" ElevenLabs: {'✅ installed' if _HAS_ELEVENLABS else '❌ not installed (pip install elevenlabs)'}")
    print(f" API Key: {'✅ set' if os.getenv('ELEVENLABS_API_KEY') else '❌ not set'}")
    print(f" OpenAI: {'✅ installed' if _HAS_OPENAI else '❌ not installed'}")
    print(f" API Key: {'✅ set' if (os.getenv('HERMES_OPENAI_API_KEY') or os.getenv('OPENAI_API_KEY')) else '❌ not set'}")
    print(f" ffmpeg: {'✅ found' if _has_ffmpeg() else '❌ not found (needed for Telegram Opus)'}")
    print(f"\n Output dir: {DEFAULT_OUTPUT_DIR}")
    config = _load_tts_config()
    provider = _get_provider(config)
    print(f" Configured provider: {provider}")
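The default output extension in `text_to_speech_tool` depends on both the provider and the platform. A minimal sketch of that decision as a pure function, assuming a hypothetical name `choose_extension` (the module inlines this logic rather than exposing such a helper):

```python
# Providers that, per the module's comments, can emit Opus without ffmpeg.
NATIVE_OPUS_PROVIDERS = ("openai", "elevenlabs")


def choose_extension(provider: str, platform: str) -> str:
    """Pick the default file extension for a TTS output file.

    Hypothetical helper for illustration only. Telegram voice bubbles need
    Opus (.ogg); Edge TTS always writes MP3 first and is converted afterwards.
    """
    wants_opus = platform.lower() == "telegram"
    if wants_opus and provider in NATIVE_OPUS_PROVIDERS:
        return ".ogg"
    return ".mp3"


if __name__ == "__main__":
    for provider in ("edge", "openai", "elevenlabs"):
        print(provider, choose_extension(provider, "telegram"))
```

Note that this only covers the default path. When a caller passes `output_path`, the extension of that path decides the format requested from OpenAI and ElevenLabs.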
+33 -21
@@ -248,18 +248,19 @@ async def vision_analyze_tool(
     model: str = DEFAULT_VISION_MODEL
 ) -> str:
     """
-    Analyze an image from a URL using vision AI.
+    Analyze an image from a URL or local file path using vision AI.
-    This tool downloads images from URLs, converts them to base64, and processes
-    them using Gemini 3 Flash Preview via OpenRouter API. The image is downloaded to a
-    temporary location and automatically cleaned up after processing.
+    This tool accepts either an HTTP/HTTPS URL or a local file path. For URLs,
+    it downloads the image first. In both cases, the image is converted to base64
+    and processed using Gemini 3 Flash Preview via OpenRouter API.
     The user_prompt parameter is expected to be pre-formatted by the calling
     function (typically model_tools.py) to include both full description
     requests and specific questions.
     Args:
-        image_url (str): The URL of the image to analyze (must be http:// or https://)
+        image_url (str): The URL or local file path of the image to analyze.
+                         Accepts http://, https:// URLs or absolute/relative file paths.
         user_prompt (str): The pre-formatted prompt for the vision model
         model (str): The vision model to use (default: google/gemini-3-flash-preview)
@@ -274,8 +275,8 @@ async def vision_analyze_tool(
         Exception: If download fails, analysis fails, or API key is not set
     Note:
-        - Temporary images are stored in ./temp_vision_images/
-        - Images are automatically deleted after processing
+        - For URLs, temporary images are stored in ./temp_vision_images/ and cleaned up
+        - For local file paths, the file is used directly and NOT deleted
         - Supports common image formats (JPEG, PNG, GIF, WebP, etc.)
     """
     debug_call_data = {
@@ -292,30 +293,41 @@ async def vision_analyze_tool(
     }
     temp_image_path = None
+    # Track whether we should clean up the file after processing.
+    # Local files (e.g. from the image cache) should NOT be deleted.
+    should_cleanup = True
     try:
-        print(f"🔍 Analyzing image from URL: {image_url[:60]}{'...' if len(image_url) > 60 else ''}", flush=True)
+        print(f"🔍 Analyzing image: {image_url[:60]}{'...' if len(image_url) > 60 else ''}", flush=True)
         print(f"📝 User prompt: {user_prompt[:100]}{'...' if len(user_prompt) > 100 else ''}", flush=True)
-        # Validate image URL
-        if not _validate_image_url(image_url):
-            raise ValueError("Invalid image URL format. Must start with http:// or https://")
         # Check API key availability
         if not os.getenv("OPENROUTER_API_KEY"):
             raise ValueError("OPENROUTER_API_KEY environment variable not set")
-        # Download the image to a temporary location
-        print(f"⬇️ Downloading image from URL...", flush=True)
-        temp_dir = Path("./temp_vision_images")
-        temp_image_path = temp_dir / f"temp_image_{uuid.uuid4()}.jpg"
-        await _download_image(image_url, temp_image_path)
+        # Determine if this is a local file path or a remote URL
+        local_path = Path(image_url)
+        if local_path.is_file():
+            # Local file path (e.g. from platform image cache) -- skip download
+            print(f"📁 Using local image file: {image_url}", flush=True)
+            temp_image_path = local_path
+            should_cleanup = False  # Don't delete cached/local files
+        elif _validate_image_url(image_url):
+            # Remote URL -- download to a temporary location
+            print(f"⬇️ Downloading image from URL...", flush=True)
+            temp_dir = Path("./temp_vision_images")
+            temp_image_path = temp_dir / f"temp_image_{uuid.uuid4()}.jpg"
+            await _download_image(image_url, temp_image_path)
+            should_cleanup = True
+        else:
+            raise ValueError(
+                "Invalid image source. Provide an HTTP/HTTPS URL or a valid local file path."
+            )
         # Get image file size for logging
         image_size_bytes = temp_image_path.stat().st_size
         image_size_kb = image_size_bytes / 1024
-        print(f"✅ Image downloaded successfully ({image_size_kb:.1f} KB)", flush=True)
+        print(f"✅ Image ready ({image_size_kb:.1f} KB)", flush=True)
         # Convert image to base64 data URL
         print(f"🔄 Converting image to base64...", flush=True)
@@ -402,8 +414,8 @@ async def vision_analyze_tool(
         return json.dumps(result, indent=2, ensure_ascii=False)
     finally:
-        # Clean up temporary image file
-        if temp_image_path and temp_image_path.exists():
+        # Clean up temporary image file (but NOT local/cached files)
+        if should_cleanup and temp_image_path and temp_image_path.exists():
             try:
                 temp_image_path.unlink()
                 print(f"🧹 Cleaned up temporary image file", flush=True)
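The local-versus-remote decision introduced in this hunk can be read as a small classifier that also determines whether cleanup is allowed. A minimal sketch, assuming a hypothetical function `classify_image_source` and a plain scheme-prefix check standing in for `_validate_image_url`, whose body is not shown in this diff:

```python
from pathlib import Path


def classify_image_source(source: str) -> str:
    """Return "local" for an existing file, "remote" for an HTTP(S) URL.

    Hypothetical helper for illustration. The prefix check below is an
    assumption; the real _validate_image_url may be stricter.
    """
    # An existing file wins, mirroring the order of checks in the diff.
    if Path(source).is_file():
        return "local"
    if source.startswith(("http://", "https://")):
        return "remote"
    raise ValueError(
        "Invalid image source. Provide an HTTP/HTTPS URL or a valid local file path."
    )


def should_cleanup(source_kind: str) -> bool:
    """Only downloaded (remote) images are temporary and safe to delete."""
    return source_kind == "remote"
```

Checking the filesystem first means a relative path is treated as local without any URL parsing, and only downloaded files are ever deleted.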
+78 -23
@@ -56,8 +56,8 @@ TOOLSETS = {
     },
     "terminal": {
-        "description": "Terminal/command execution tools",
-        "tools": ["terminal"],
+        "description": "Terminal/command execution and process management tools",
+        "tools": ["terminal", "process"],
         "includes": []
     },
@@ -69,7 +69,7 @@ TOOLSETS = {
     "skills": {
         "description": "Access skill documents with specialized instructions and knowledge",
-        "tools": ["skills_categories", "skills_list", "skill_view"],
+        "tools": ["skills_list", "skill_view"],
         "includes": []
     },
@@ -108,11 +108,17 @@ TOOLSETS = {
         "includes": []
     },
+    "tts": {
+        "description": "Text-to-speech: convert text to audio with Edge TTS (free), ElevenLabs, or OpenAI",
+        "tools": ["text_to_speech"],
+        "includes": []
+    },
     # Scenario-specific toolsets
     "debugging": {
         "description": "Debugging and troubleshooting toolkit",
-        "tools": ["terminal"],
+        "tools": ["terminal", "process"],
         "includes": ["web", "file"]  # For searching error messages and solutions, and file operations
     },
@@ -131,8 +137,8 @@ TOOLSETS = {
         "tools": [
             # Web tools
             "web_search", "web_extract",
-            # Terminal
-            "terminal",
+            # Terminal + process management
+            "terminal", "process",
             # File manipulation
             "read_file", "write_file", "patch", "search",
             # Vision
@@ -142,12 +148,14 @@ TOOLSETS = {
             # MoA
             "mixture_of_agents",
             # Skills
-            "skills_categories", "skills_list", "skill_view",
+            "skills_list", "skill_view",
             # Browser
             "browser_navigate", "browser_snapshot", "browser_click",
             "browser_type", "browser_scroll", "browser_back",
             "browser_press", "browser_close", "browser_get_images",
             "browser_vision",
+            # Text-to-speech
+            "text_to_speech",
             # Cronjob management (CLI-only)
             "schedule_cronjob", "list_cronjobs", "remove_cronjob"
         ],
@@ -161,33 +169,49 @@ TOOLSETS = {
     "hermes-telegram": {
         "description": "Telegram bot toolset - full access for personal use (terminal has safety checks)",
         "tools": [
-            # Terminal - enabled with dangerous command approval system
-            "terminal",
+            # Terminal + process management
+            "terminal", "process",
             # File manipulation
             "read_file", "write_file", "patch", "search",
             # Web tools
             "web_search", "web_extract",
             # Vision - analyze images sent by users
             "vision_analyze",
+            # Image generation
+            "image_generate",
+            # Text-to-speech
+            "text_to_speech",
             # Skills - access knowledge base
-            "skills_categories", "skills_list", "skill_view",
+            "skills_list", "skill_view",
             # Cronjob management - let users schedule tasks
-            "schedule_cronjob", "list_cronjobs", "remove_cronjob"
+            "schedule_cronjob", "list_cronjobs", "remove_cronjob",
+            # Cross-channel messaging
+            "send_message"
         ],
         "includes": []
     },
     "hermes-discord": {
-        "description": "Discord bot toolset - limited for public server safety (no terminal, no file access)",
+        "description": "Discord bot toolset - full access (terminal has safety checks via dangerous command approval)",
         "tools": [
-            # Web tools - safe for messaging
-            "web_search",
-            # Vision - analyze images
+            # Terminal + process management
+            "terminal", "process",
+            # File manipulation
+            "read_file", "write_file", "patch", "search",
+            # Web tools
+            "web_search", "web_extract",
+            # Vision - analyze images sent by users
             "vision_analyze",
+            # Image generation
+            "image_generate",
+            # Text-to-speech
+            "text_to_speech",
             # Skills - access knowledge base
-            "skills_categories", "skills_list", "skill_view",
-            # Cronjob - let users schedule reminders
-            "schedule_cronjob", "list_cronjobs", "remove_cronjob"
+            "skills_list", "skill_view",
+            # Cronjob management - let users schedule tasks
+            "schedule_cronjob", "list_cronjobs", "remove_cronjob",
+            # Cross-channel messaging
+            "send_message"
         ],
         "includes": []
     },
@@ -197,16 +221,47 @@ TOOLSETS = {
         "tools": [
             # Web tools
             "web_search", "web_extract",
-            # Terminal - only for trusted personal accounts
-            "terminal",
+            # Terminal + process management
+            "terminal", "process",
             # File manipulation
             "read_file", "write_file", "patch", "search",
             # Vision
             "vision_analyze",
+            # Image generation
+            "image_generate",
+            # Text-to-speech
+            "text_to_speech",
             # Skills
-            "skills_categories", "skills_list", "skill_view",
+            "skills_list", "skill_view",
             # Cronjob management
-            "schedule_cronjob", "list_cronjobs", "remove_cronjob"
+            "schedule_cronjob", "list_cronjobs", "remove_cronjob",
+            # Cross-channel messaging
+            "send_message"
         ],
         "includes": []
     },
+    "hermes-slack": {
+        "description": "Slack bot toolset - full access for workspace use (terminal has safety checks)",
+        "tools": [
+            # Terminal + process management
+            "terminal", "process",
+            # File manipulation
+            "read_file", "write_file", "patch", "search",
+            # Web tools
+            "web_search", "web_extract",
+            # Vision - analyze images sent by users
+            "vision_analyze",
+            # Image generation
+            "image_generate",
+            # Text-to-speech
+            "text_to_speech",
+            # Skills - access knowledge base
+            "skills_list", "skill_view",
+            # Cronjob management - let users schedule tasks
+            "schedule_cronjob", "list_cronjobs", "remove_cronjob",
+            # Cross-channel messaging
+            "send_message"
+        ],
+        "includes": []
+    },
@@ -214,7 +269,7 @@ TOOLSETS = {
     "hermes-gateway": {
         "description": "Gateway toolset - union of all messaging platform tools",
         "tools": [],
-        "includes": ["hermes-telegram", "hermes-discord", "hermes-whatsapp"]
+        "includes": ["hermes-telegram", "hermes-discord", "hermes-whatsapp", "hermes-slack"]
     }
 }
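Toolsets such as `hermes-gateway` and `debugging` carry few or no tools of their own and rely on `includes`. The resolver is not part of this diff, so the following is only a sketch of how such a union could be computed, assuming a hypothetical function `resolve_tools` and a trimmed sample dictionary whose `web` and `file` entries are illustrative stand-ins:

```python
from typing import Dict, List, Optional, Set

# Trimmed sample in the same shape as TOOLSETS; contents are illustrative.
SAMPLE_TOOLSETS: Dict[str, dict] = {
    "web": {"tools": ["web_search", "web_extract"], "includes": []},
    "file": {"tools": ["read_file", "write_file", "patch", "search"], "includes": []},
    "debugging": {"tools": ["terminal", "process"], "includes": ["web", "file"]},
}


def resolve_tools(
    name: str,
    toolsets: Dict[str, dict],
    _seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return the toolset's own tools followed by tools from its includes.

    Hypothetical resolver: order is first-seen, duplicates are dropped, and
    a visited set guards against include cycles.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        return []
    seen.add(name)

    entry = toolsets[name]
    resolved: List[str] = []
    for tool in entry["tools"]:
        if tool not in resolved:
            resolved.append(tool)
    for included in entry["includes"]:
        for tool in resolve_tools(included, toolsets, seen):
            if tool not in resolved:
                resolved.append(tool)
    return resolved


if __name__ == "__main__":
    print(resolve_tools("debugging", SAMPLE_TOOLSETS))
```

Under this scheme, adding `hermes-slack` to the `includes` of `hermes-gateway` is enough for the gateway to pick up any Slack-only tools without listing them twice.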