Compare commits
101 Commits
| SHA1 | Author | Date | |
|---|---|---|---|
| ab6abc2c13 | |||
| aafe86d81a | |||
| 1aa7027be1 | |||
| f961937097 | |||
| 7a427d7b03 | |||
| 66a1942524 | |||
| 1173adbe86 | |||
| a5beb6d8f0 | |||
| 0e3b7b6a39 | |||
| 5e705bc31b | |||
| 55ce601502 | |||
| 8f6ecd5c64 | |||
| a51a767407 | |||
| 2ea4dd30c6 | |||
| 80e578d3e3 | |||
| c52353cf8a | |||
| d76ebf0ec3 | |||
| 4be5070427 | |||
| e140c02d51 | |||
| 88643a1ba9 | |||
| b7b585656b | |||
| 4494c0b033 | |||
| aa6416399e | |||
| b313751acf | |||
| b1d05dfe8b | |||
| f8899af113 | |||
| cf29cba084 | |||
| ec9b868aea | |||
| 3ec6c71e43 | |||
| 4ad0083118 | |||
| 1055d4356a | |||
| 5822711ae6 | |||
| b19f5133c3 | |||
| 471ea81a7d | |||
| b1832faaae | |||
| 3a9a1bbb84 | |||
| d8081790f3 | |||
| 493bf8db7e | |||
| d9eba2a44f | |||
| fc061c2fee | |||
| aaa96713d4 | |||
| 02954c1a10 | |||
| 4355f30422 | |||
| 2f07df3177 | |||
| 672e9752a0 | |||
| df0f684c34 | |||
| 21afa134f0 | |||
| 6bcec1ac25 | |||
| fe331ed9bd | |||
| 746abf5e28 | |||
| 4d2c93a04f | |||
| 3959e3cadb | |||
| ec5fdb8b92 | |||
| c030ac1d85 | |||
| d223f7388d | |||
| 816d1344ee | |||
| 4c0c7f4c6e | |||
| 04b6ecadc4 | |||
| e84d952dc0 | |||
| 388130a122 | |||
| bb59057d5d | |||
| 36a4481152 | |||
| efa753678c | |||
| 7f3a567259 | |||
| defbe0f9e9 | |||
| 18862145e4 | |||
| 35558dadf4 | |||
| ae8059ca24 | |||
| 116984feb7 | |||
| 219af75704 | |||
| d76fa7fc37 | |||
| 7b6d14e62a | |||
| 67d707e851 | |||
| e648863d52 | |||
| a7cc1cf309 | |||
| f24db23458 | |||
| d132e344d7 | |||
| 22f41daded | |||
| 7c7feaa033 | |||
| 2f80bd9f87 | |||
| 23e5e8dde9 | |||
| e99aca98ab | |||
| 7e30e97a59 | |||
| db4dfea7ec | |||
| 17254a7692 | |||
| adf188c439 | |||
| 21958a55d1 | |||
| 947827bba0 | |||
| e4a3ffa9c1 | |||
| 1fa3737134 | |||
| e7844e9c8d | |||
| 1c761ae042 | |||
| 56ca84f243 | |||
| 04101bc59e | |||
| 0a247a50f2 | |||
| 0e2714acea | |||
| 36921a3e98 | |||
| 21c45ba0ac | |||
| 8422196e89 | |||
| a8132d1252 | |||
| 0c392e7a87 |
@@ -5,7 +5,7 @@ Instructions for AI coding assistants and developers working on the hermes-agent
 ## Development Environment
 
 ```bash
-source .venv/bin/activate  # ALWAYS activate before running Python
+source venv/bin/activate  # ALWAYS activate before running Python
 ```
 
 ## Project Structure
@@ -23,6 +23,7 @@ hermes-agent/
 │   ├── prompt_caching.py  # Anthropic prompt caching
 │   ├── auxiliary_client.py  # Auxiliary LLM client (vision, summarization)
 │   ├── model_metadata.py  # Model context lengths, token estimation
+│   ├── models_dev.py  # models.dev registry integration (provider-aware context)
 │   ├── display.py  # KawaiiSpinner, tool preview formatting
 │   ├── skill_commands.py  # Skill slash commands (shared CLI/gateway)
 │   └── trajectory.py  # Trajectory saving helpers
@@ -366,6 +367,9 @@ Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-p
 ### `_last_resolved_tool_names` is a process-global in `model_tools.py`
 `_run_single_child()` in `delegate_tool.py` saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.
 
+### DO NOT hardcode cross-tool references in schema descriptions
+Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `model_tools.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
+
 ### Tests must not write to `~/.hermes/`
 The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
 
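The cross-tool rule in this hunk can be pictured as a small post-processing step. The sketch below is not the code in `get_tool_definitions()`: `add_cross_tool_hints`, `make_tool`, and `HINTS` are hypothetical names, and the OpenAI-style `function` dict shape is an assumption. It appends a hint to a description only when the referenced tool is actually present in the resolved set.

```python
from typing import Any, Dict, List


def make_tool(name: str, description: str) -> Dict[str, Any]:
    """Build a minimal tool definition (assumed OpenAI-style shape)."""
    return {"type": "function", "function": {"name": name, "description": description}}


# Hypothetical hint table: tool name -> {referenced tool: sentence to append}.
HINTS: Dict[str, Dict[str, str]] = {
    "browser_navigate": {"web_search": "For simple lookups, prefer web_search."},
}


def add_cross_tool_hints(
    tool_defs: List[Dict[str, Any]],
    hints: Dict[str, Dict[str, str]],
) -> List[Dict[str, Any]]:
    """Append each hint only if the tool it mentions is in tool_defs."""
    available = {d["function"]["name"] for d in tool_defs}
    for d in tool_defs:
        fn = d["function"]
        for other, text in hints.get(fn["name"], {}).items():
            if other in available:
                fn["description"] = fn["description"].rstrip() + " " + text
    return tool_defs
```

With this shape the static schema never names another tool, so a disabled toolset cannot leave a dangling reference in the prompt.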
@@ -374,7 +378,7 @@ The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HER
 ## Testing
 
 ```bash
-source .venv/bin/activate
+source venv/bin/activate
 python -m pytest tests/ -q  # Full suite (~3000 tests, ~3 min)
 python -m pytest tests/test_model_tools.py -q  # Toolset resolution
 python -m pytest tests/test_cli_init.py -q  # CLI config loading
@@ -146,8 +146,8 @@ git clone https://github.com/NousResearch/hermes-agent.git
 cd hermes-agent
 git submodule update --init mini-swe-agent  # required terminal backend
 curl -LsSf https://astral.sh/uv/install.sh | sh
-uv venv .venv --python 3.11
-source .venv/bin/activate
+uv venv venv --python 3.11
+source venv/bin/activate
 uv pip install -e ".[all,dev]"
 uv pip install -e "./mini-swe-agent"
 python -m pytest tests/ -q
@@ -304,6 +304,8 @@ class HermesACPAgent(acp.Agent):
|
||||
|
||||
if result.get("messages"):
|
||||
state.history = result["messages"]
|
||||
# Persist updated history so sessions survive process restarts.
|
||||
self.session_manager.save_session(session_id)
|
||||
|
||||
final_response = result.get("final_response", "")
|
||||
if final_response and conn:
|
||||
@@ -400,6 +402,7 @@ class HermesACPAgent(acp.Agent):
|
||||
cwd=state.cwd,
|
||||
model=new_model,
|
||||
)
|
||||
self.session_manager.save_session(state.session_id)
|
||||
provider_label = target_provider or getattr(state.agent, "provider", "auto")
|
||||
logger.info("Session %s: model switched to %s", state.session_id, new_model)
|
||||
return f"Model switched to: {new_model}\nProvider: {provider_label}"
|
||||
@@ -444,6 +447,7 @@ class HermesACPAgent(acp.Agent):
 
     def _cmd_reset(self, args: str, state: SessionState) -> str:
         state.history.clear()
+        self.session_manager.save_session(state.session_id)
         return "Conversation history cleared."
 
     def _cmd_compact(self, args: str, state: SessionState) -> str:
@@ -453,6 +457,7 @@ class HermesACPAgent(acp.Agent):
             agent = state.agent
             if hasattr(agent, "compress_context"):
                 agent.compress_context(state.history)
+                self.session_manager.save_session(state.session_id)
                 return f"Context compressed. Messages: {len(state.history)}"
             return "Context compression not available for this agent."
         except Exception as e:
@@ -475,5 +480,6 @@ class HermesACPAgent(acp.Agent):
|
||||
cwd=state.cwd,
|
||||
model=model_id,
|
||||
)
|
||||
self.session_manager.save_session(session_id)
|
||||
logger.info("Session %s: model switched to %s", session_id, model_id)
|
||||
return None
|
||||
|
||||
+262
-34
@@ -1,7 +1,15 @@
-"""ACP session manager — maps ACP sessions to Hermes AIAgent instances."""
+"""ACP session manager — maps ACP sessions to Hermes AIAgent instances.
+
+Sessions are persisted to the shared SessionDB (``~/.hermes/state.db``) so they
+survive process restarts and appear in ``session_search``. When the editor
+reconnects after idle/restart, the ``load_session`` / ``resume_session`` calls
+find the persisted session in the database and restore the full conversation
+history.
+"""
 from __future__ import annotations
 
 import copy
+import json
 import logging
 import uuid
 from dataclasses import dataclass, field
@@ -46,18 +54,26 @@ class SessionState:
|
||||
|
||||
|
||||
class SessionManager:
|
||||
"""Thread-safe manager for ACP sessions backed by Hermes AIAgent instances."""
|
||||
"""Thread-safe manager for ACP sessions backed by Hermes AIAgent instances.
|
||||
|
||||
def __init__(self, agent_factory=None):
|
||||
Sessions are held in-memory for fast access **and** persisted to the
|
||||
shared SessionDB so they survive process restarts and are searchable
|
||||
via ``session_search``.
|
||||
"""
|
||||
|
||||
def __init__(self, agent_factory=None, db=None):
|
||||
"""
|
||||
Args:
|
||||
agent_factory: Optional callable that creates an AIAgent-like object.
|
||||
Used by tests. When omitted, a real AIAgent is created
|
||||
using the current Hermes runtime provider configuration.
|
||||
db: Optional SessionDB instance. When omitted, the default
|
||||
SessionDB (``~/.hermes/state.db``) is lazily created.
|
||||
"""
|
||||
self._sessions: Dict[str, SessionState] = {}
|
||||
self._lock = Lock()
|
||||
self._agent_factory = agent_factory
|
||||
self._db_instance = db # None → lazy-init on first use
|
||||
|
||||
# ---- public API ---------------------------------------------------------
|
||||
|
||||
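The `db` argument in this hunk is created lazily when omitted, and the `_get_db` docstring later in this diff explains that `HERMES_HOME` is resolved at call time rather than from an import-time constant. A minimal sketch of that idea follows. It uses the standard-library `sqlite3` module as a stand-in for `SessionDB`, and `resolve_state_db_path` and `LazyDB` are hypothetical names used only for illustration.

```python
import os
import sqlite3
from pathlib import Path
from typing import Optional


def resolve_state_db_path() -> Path:
    """Resolve the DB path on every call so later env-var changes are seen."""
    home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
    return home / "state.db"


class LazyDB:
    """Hold an injected connection, or open the default one on first use."""

    def __init__(self, conn: Optional[sqlite3.Connection] = None) -> None:
        self._conn = conn  # None means: open lazily in get()

    def get(self) -> Optional[sqlite3.Connection]:
        if self._conn is not None:
            return self._conn
        try:
            path = resolve_state_db_path()
            path.parent.mkdir(parents=True, exist_ok=True)
            self._conn = sqlite3.connect(path)
        except (OSError, sqlite3.Error):
            # Persistence is best-effort: callers treat None as "no DB".
            return None
        return self._conn
```

Because the path is computed inside `get()`, a test fixture that sets `HERMES_HOME` after import still redirects the database to its temporary directory.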
@@ -77,54 +93,67 @@ class SessionManager:
|
||||
with self._lock:
|
||||
self._sessions[session_id] = state
|
||||
_register_task_cwd(session_id, cwd)
|
||||
self._persist(state)
|
||||
logger.info("Created ACP session %s (cwd=%s)", session_id, cwd)
|
||||
return state
|
||||
|
||||
def get_session(self, session_id: str) -> Optional[SessionState]:
|
||||
"""Return the session for *session_id*, or ``None``."""
|
||||
"""Return the session for *session_id*, or ``None``.
|
||||
|
||||
If the session is not in memory but exists in the database (e.g. after
|
||||
a process restart), it is transparently restored.
|
||||
"""
|
||||
with self._lock:
|
||||
return self._sessions.get(session_id)
|
||||
state = self._sessions.get(session_id)
|
||||
if state is not None:
|
||||
return state
|
||||
# Attempt to restore from database.
|
||||
return self._restore(session_id)
|
||||
|
||||
def remove_session(self, session_id: str) -> bool:
|
||||
"""Remove a session. Returns True if it existed."""
|
||||
"""Remove a session from memory and database. Returns True if it existed."""
|
||||
with self._lock:
|
||||
existed = self._sessions.pop(session_id, None) is not None
|
||||
if existed:
|
||||
db_existed = self._delete_persisted(session_id)
|
||||
if existed or db_existed:
|
||||
_clear_task_cwd(session_id)
|
||||
return existed
|
||||
return existed or db_existed
|
||||
|
||||
def fork_session(self, session_id: str, cwd: str = ".") -> Optional[SessionState]:
|
||||
"""Deep-copy a session's history into a new session."""
|
||||
import threading
|
||||
|
||||
with self._lock:
|
||||
original = self._sessions.get(session_id)
|
||||
if original is None:
|
||||
return None
|
||||
original = self.get_session(session_id) # checks DB too
|
||||
if original is None:
|
||||
return None
|
||||
|
||||
new_id = str(uuid.uuid4())
|
||||
agent = self._make_agent(
|
||||
session_id=new_id,
|
||||
cwd=cwd,
|
||||
model=original.model or None,
|
||||
)
|
||||
state = SessionState(
|
||||
session_id=new_id,
|
||||
agent=agent,
|
||||
cwd=cwd,
|
||||
model=getattr(agent, "model", original.model) or original.model,
|
||||
history=copy.deepcopy(original.history),
|
||||
cancel_event=threading.Event(),
|
||||
)
|
||||
new_id = str(uuid.uuid4())
|
||||
agent = self._make_agent(
|
||||
session_id=new_id,
|
||||
cwd=cwd,
|
||||
model=original.model or None,
|
||||
)
|
||||
state = SessionState(
|
||||
session_id=new_id,
|
||||
agent=agent,
|
||||
cwd=cwd,
|
||||
model=getattr(agent, "model", original.model) or original.model,
|
||||
history=copy.deepcopy(original.history),
|
||||
cancel_event=threading.Event(),
|
||||
)
|
||||
with self._lock:
|
||||
self._sessions[new_id] = state
|
||||
_register_task_cwd(new_id, cwd)
|
||||
self._persist(state)
|
||||
logger.info("Forked ACP session %s -> %s", session_id, new_id)
|
||||
return state
|
||||
|
||||
def list_sessions(self) -> List[Dict[str, Any]]:
|
||||
"""Return lightweight info dicts for all sessions."""
|
||||
"""Return lightweight info dicts for all sessions (memory + database)."""
|
||||
# Collect in-memory sessions first.
|
||||
with self._lock:
|
||||
return [
|
||||
seen_ids = set(self._sessions.keys())
|
||||
results = [
|
||||
{
|
||||
"session_id": s.session_id,
|
||||
"cwd": s.cwd,
|
||||
@@ -134,23 +163,220 @@ class SessionManager:
|
||||
for s in self._sessions.values()
|
||||
]
|
||||
|
||||
# Merge any persisted sessions not currently in memory.
|
||||
db = self._get_db()
|
||||
if db is not None:
|
||||
try:
|
||||
rows = db.search_sessions(source="acp", limit=1000)
|
||||
for row in rows:
|
||||
sid = row["id"]
|
||||
if sid in seen_ids:
|
||||
continue
|
||||
# Extract cwd from model_config JSON.
|
||||
cwd = "."
|
||||
mc = row.get("model_config")
|
||||
if mc:
|
||||
try:
|
||||
cwd = json.loads(mc).get("cwd", ".")
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
results.append({
|
||||
"session_id": sid,
|
||||
"cwd": cwd,
|
||||
"model": row.get("model") or "",
|
||||
"history_len": row.get("message_count") or 0,
|
||||
})
|
||||
except Exception:
|
||||
logger.debug("Failed to list ACP sessions from DB", exc_info=True)
|
||||
|
||||
return results
|
||||
|
||||
def update_cwd(self, session_id: str, cwd: str) -> Optional[SessionState]:
|
||||
"""Update the working directory for a session and its tool overrides."""
|
||||
with self._lock:
|
||||
state = self._sessions.get(session_id)
|
||||
if state is None:
|
||||
return None
|
||||
state.cwd = cwd
|
||||
state = self.get_session(session_id) # checks DB too
|
||||
if state is None:
|
||||
return None
|
||||
state.cwd = cwd
|
||||
_register_task_cwd(session_id, cwd)
|
||||
self._persist(state)
|
||||
return state
|
||||
|
||||
def cleanup(self) -> None:
|
||||
"""Remove all sessions and clear task-specific cwd overrides."""
|
||||
"""Remove all sessions (memory and database) and clear task-specific cwd overrides."""
|
||||
with self._lock:
|
||||
session_ids = list(self._sessions.keys())
|
||||
self._sessions.clear()
|
||||
for session_id in session_ids:
|
||||
_clear_task_cwd(session_id)
|
||||
self._delete_persisted(session_id)
|
||||
# Also remove any DB-only ACP sessions not currently in memory.
|
||||
db = self._get_db()
|
||||
if db is not None:
|
||||
try:
|
||||
rows = db.search_sessions(source="acp", limit=10000)
|
||||
for row in rows:
|
||||
sid = row["id"]
|
||||
_clear_task_cwd(sid)
|
||||
db.delete_session(sid)
|
||||
except Exception:
|
||||
logger.debug("Failed to cleanup ACP sessions from DB", exc_info=True)
|
||||
|
||||
def save_session(self, session_id: str) -> None:
|
||||
"""Persist the current state of a session to the database.
|
||||
|
||||
Called by the server after prompt completion, slash commands that
|
||||
mutate history, and model switches.
|
||||
"""
|
||||
with self._lock:
|
||||
state = self._sessions.get(session_id)
|
||||
if state is not None:
|
||||
self._persist(state)
|
||||
|
||||
# ---- persistence via SessionDB ------------------------------------------
|
||||
|
||||
def _get_db(self):
|
||||
"""Lazily initialise and return the SessionDB instance.
|
||||
|
||||
Returns ``None`` if the DB is unavailable (e.g. import error in a
|
||||
minimal test environment).
|
||||
|
||||
Note: we resolve ``HERMES_HOME`` dynamically rather than relying on
|
||||
the module-level ``DEFAULT_DB_PATH`` constant, because that constant
|
||||
is evaluated at import time and won't reflect env-var changes made
|
||||
later (e.g. by the test fixture ``_isolate_hermes_home``).
|
||||
"""
|
||||
if self._db_instance is not None:
|
||||
return self._db_instance
|
||||
try:
|
||||
import os
|
||||
from pathlib import Path
|
||||
from hermes_state import SessionDB
|
||||
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
|
||||
self._db_instance = SessionDB(db_path=hermes_home / "state.db")
|
||||
return self._db_instance
|
||||
except Exception:
|
||||
logger.debug("SessionDB unavailable for ACP persistence", exc_info=True)
|
||||
return None
|
||||
|
||||
def _persist(self, state: SessionState) -> None:
|
||||
"""Write session state to the database.
|
||||
|
||||
Creates the session record if it doesn't exist, then replaces all
|
||||
stored messages with the current in-memory history.
|
||||
"""
|
||||
db = self._get_db()
|
||||
if db is None:
|
||||
return
|
||||
|
||||
# Ensure model is a plain string (not a MagicMock or other proxy).
|
||||
model_str = str(state.model) if state.model else None
|
||||
cwd_json = json.dumps({"cwd": state.cwd})
|
||||
|
||||
try:
|
||||
# Ensure the session record exists.
|
||||
existing = db.get_session(state.session_id)
|
||||
if existing is None:
|
||||
db.create_session(
|
||||
session_id=state.session_id,
|
||||
source="acp",
|
||||
model=model_str,
|
||||
model_config={"cwd": state.cwd},
|
||||
)
|
||||
else:
|
||||
# Update model_config (contains cwd) if changed.
|
||||
try:
|
||||
with db._lock:
|
||||
db._conn.execute(
|
||||
"UPDATE sessions SET model_config = ?, model = COALESCE(?, model) WHERE id = ?",
|
||||
(cwd_json, model_str, state.session_id),
|
||||
)
|
||||
db._conn.commit()
|
||||
except Exception:
|
||||
logger.debug("Failed to update ACP session metadata", exc_info=True)
|
||||
|
||||
# Replace stored messages with current history.
|
||||
db.clear_messages(state.session_id)
|
||||
for msg in state.history:
|
||||
db.append_message(
|
||||
session_id=state.session_id,
|
||||
role=msg.get("role", "user"),
|
||||
content=msg.get("content"),
|
||||
tool_name=msg.get("tool_name") or msg.get("name"),
|
||||
tool_calls=msg.get("tool_calls"),
|
||||
tool_call_id=msg.get("tool_call_id"),
|
||||
)
|
||||
except Exception:
|
||||
logger.warning("Failed to persist ACP session %s", state.session_id, exc_info=True)
|
||||
|
||||
def _restore(self, session_id: str) -> Optional[SessionState]:
|
||||
"""Load a session from the database into memory, recreating the AIAgent."""
|
||||
import threading
|
||||
|
||||
db = self._get_db()
|
||||
if db is None:
|
||||
return None
|
||||
|
||||
try:
|
||||
row = db.get_session(session_id)
|
||||
except Exception:
|
||||
logger.debug("Failed to query DB for ACP session %s", session_id, exc_info=True)
|
||||
return None
|
||||
|
||||
if row is None:
|
||||
return None
|
||||
|
||||
# Only restore ACP sessions.
|
||||
if row.get("source") != "acp":
|
||||
return None
|
||||
|
||||
# Extract cwd from model_config.
|
||||
cwd = "."
|
||||
mc = row.get("model_config")
|
||||
if mc:
|
||||
try:
|
||||
cwd = json.loads(mc).get("cwd", ".")
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
|
||||
model = row.get("model") or None
|
||||
|
||||
# Load conversation history.
|
||||
try:
|
||||
history = db.get_messages_as_conversation(session_id)
|
||||
except Exception:
|
||||
logger.warning("Failed to load messages for ACP session %s", session_id, exc_info=True)
|
||||
history = []
|
||||
|
||||
try:
|
||||
agent = self._make_agent(session_id=session_id, cwd=cwd, model=model)
|
||||
except Exception:
|
||||
logger.warning("Failed to recreate agent for ACP session %s", session_id, exc_info=True)
|
||||
return None
|
||||
|
||||
state = SessionState(
|
||||
session_id=session_id,
|
||||
agent=agent,
|
||||
cwd=cwd,
|
||||
model=model or getattr(agent, "model", "") or "",
|
||||
history=history,
|
||||
cancel_event=threading.Event(),
|
||||
)
|
||||
with self._lock:
|
||||
self._sessions[session_id] = state
|
||||
_register_task_cwd(session_id, cwd)
|
||||
logger.info("Restored ACP session %s from DB (%d messages)", session_id, len(history))
|
||||
return state
|
||||
|
||||
def _delete_persisted(self, session_id: str) -> bool:
|
||||
"""Delete a session from the database. Returns True if it existed."""
|
||||
db = self._get_db()
|
||||
if db is None:
|
||||
return False
|
||||
try:
|
||||
return db.delete_session(session_id)
|
||||
except Exception:
|
||||
logger.debug("Failed to delete ACP session %s from DB", session_id, exc_info=True)
|
||||
return False
|
||||
|
||||
# ---- internal -----------------------------------------------------------
|
||||
|
||||
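The persistence methods in this hunk follow a memory-first lookup with a database fallback, and `_persist` replaces the stored messages wholesale on each save. The sketch below shows that pattern in isolation. It is not the `SessionManager` implementation: `TinySessionStore` is a hypothetical class, the single `messages` table is an assumed schema, and `sqlite3` stands in for `SessionDB`.

```python
import json
import sqlite3
from threading import Lock
from typing import Dict, List, Optional


class TinySessionStore:
    """In-memory session histories backed by a SQLite table."""

    def __init__(self) -> None:
        self._lock = Lock()
        self._mem: Dict[str, List[dict]] = {}
        self._db = sqlite3.connect(":memory:", check_same_thread=False)
        self._db.execute(
            "CREATE TABLE messages (session_id TEXT, seq INTEGER, body TEXT)"
        )

    def save(self, session_id: str, history: List[dict]) -> None:
        """Replace all stored messages with the current history."""
        with self._lock:
            self._mem[session_id] = history
            self._db.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
            self._db.executemany(
                "INSERT INTO messages VALUES (?, ?, ?)",
                [(session_id, i, json.dumps(m)) for i, m in enumerate(history)],
            )
            self._db.commit()

    def get(self, session_id: str) -> Optional[List[dict]]:
        """Return from memory if present, otherwise restore from the table."""
        with self._lock:
            if session_id in self._mem:
                return self._mem[session_id]
            rows = self._db.execute(
                "SELECT body FROM messages WHERE session_id = ? ORDER BY seq",
                (session_id,),
            ).fetchall()
            if not rows:
                return None  # simplification: an empty history is not restorable
            history = [json.loads(r[0]) for r in rows]
            self._mem[session_id] = history
            return history

    def forget_memory(self) -> None:
        """Simulate a process restart by dropping the in-memory cache."""
        with self._lock:
            self._mem.clear()
```

Clearing and rewriting every message on each save is simple and keeps the table consistent with in-memory history, at the cost of write volume that grows with conversation length.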
@@ -194,6 +420,8 @@ class SessionManager:
|
||||
"api_mode": runtime.get("api_mode"),
|
||||
"base_url": runtime.get("base_url"),
|
||||
"api_key": runtime.get("api_key"),
|
||||
"command": runtime.get("command"),
|
||||
"args": list(runtime.get("args") or []),
|
||||
}
|
||||
)
|
||||
except Exception:
|
||||
|
||||
@@ -935,6 +935,26 @@ def convert_messages_to_anthropic(
             if not m["content"]:
                 m["content"] = [{"type": "text", "text": "(tool call removed)"}]
 
+    # Strip orphaned tool_result blocks (no matching tool_use precedes them).
+    # This is the mirror of the above: context compression or session truncation
+    # can remove an assistant message containing a tool_use while leaving the
+    # subsequent tool_result intact. Anthropic rejects these with a 400.
+    tool_use_ids = set()
+    for m in result:
+        if m["role"] == "assistant" and isinstance(m["content"], list):
+            for block in m["content"]:
+                if block.get("type") == "tool_use":
+                    tool_use_ids.add(block.get("id"))
+    for m in result:
+        if m["role"] == "user" and isinstance(m["content"], list):
+            m["content"] = [
+                b
+                for b in m["content"]
+                if b.get("type") != "tool_result" or b.get("tool_use_id") in tool_use_ids
+            ]
+            if not m["content"]:
+                m["content"] = [{"type": "text", "text": "(tool result removed)"}]
+
     # Enforce strict role alternation (Anthropic rejects consecutive same-role messages)
     fixed = []
     for m in result:
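The orphan-stripping pass added in this hunk can be exercised on its own. The sketch below lifts the same two loops into a standalone function; the name `strip_orphaned_tool_results` is hypothetical and does not appear in the source. As in the hunk, IDs are collected from every assistant message in the list, so the check is "a matching `tool_use` exists somewhere", which is slightly looser than "precedes".

```python
from typing import Any, Dict, List


def strip_orphaned_tool_results(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop tool_result blocks whose tool_use_id matches no tool_use block."""
    tool_use_ids = set()
    for m in messages:
        if m["role"] == "assistant" and isinstance(m["content"], list):
            for block in m["content"]:
                if block.get("type") == "tool_use":
                    tool_use_ids.add(block.get("id"))
    for m in messages:
        if m["role"] == "user" and isinstance(m["content"], list):
            m["content"] = [
                b
                for b in m["content"]
                if b.get("type") != "tool_result" or b.get("tool_use_id") in tool_use_ids
            ]
            if not m["content"]:
                # Keep the message non-empty so the turn structure is preserved.
                m["content"] = [{"type": "text", "text": "(tool result removed)"}]
    return messages
```

A user message that held only an orphaned result ends up with a single placeholder text block rather than an empty content list.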
+68
-46
@@ -480,11 +480,11 @@ def _read_codex_access_token() -> Optional[str]:
|
||||
def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
"""Try each API-key provider in PROVIDER_REGISTRY order.
|
||||
|
||||
Returns (client, model) for the first provider whose env var is set,
|
||||
or (None, None) if none are configured.
|
||||
Returns (client, model) for the first provider with usable runtime
|
||||
credentials, or (None, None) if none are configured.
|
||||
"""
|
||||
try:
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
|
||||
except ImportError:
|
||||
logger.debug("Could not import PROVIDER_REGISTRY for API-key fallback")
|
||||
return None, None
|
||||
@@ -492,34 +492,24 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
for provider_id, pconfig in PROVIDER_REGISTRY.items():
|
||||
if pconfig.auth_type != "api_key":
|
||||
continue
|
||||
# Check if any of the provider's env vars are set
|
||||
api_key = ""
|
||||
for env_var in pconfig.api_key_env_vars:
|
||||
val = os.getenv(env_var, "").strip()
|
||||
if val:
|
||||
api_key = val
|
||||
break
|
||||
if not api_key:
|
||||
continue
|
||||
if provider_id == "anthropic":
|
||||
return _try_anthropic()
|
||||
|
||||
# Resolve base URL (with optional env-var override)
|
||||
# Kimi Code keys (sk-kimi-) need api.kimi.com/coding/v1
|
||||
env_url = ""
|
||||
if pconfig.base_url_env_var:
|
||||
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
|
||||
if env_url:
|
||||
base_url = env_url.rstrip("/")
|
||||
elif provider_id == "kimi-coding" and api_key.startswith("sk-kimi-"):
|
||||
base_url = "https://api.kimi.com/coding/v1"
|
||||
else:
|
||||
base_url = pconfig.inference_base_url
|
||||
creds = resolve_api_key_provider_credentials(provider_id)
|
||||
api_key = str(creds.get("api_key", "")).strip()
|
||||
if not api_key:
|
||||
continue
|
||||
|
||||
base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
|
||||
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
|
||||
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
|
||||
extra = {}
|
||||
if "api.kimi.com" in base_url.lower():
|
||||
extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
|
||||
elif "api.githubcopilot.com" in base_url.lower():
|
||||
from hermes_cli.models import copilot_default_headers
|
||||
|
||||
extra["default_headers"] = copilot_default_headers()
|
||||
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
|
||||
|
||||
return None, None
|
||||
@@ -664,10 +654,23 @@ def _try_anthropic() -> Tuple[Optional[Any], Optional[str]]:
|
||||
if not token:
|
||||
return None, None
|
||||
|
||||
# Allow base URL override from config.yaml model.base_url
|
||||
base_url = _ANTHROPIC_DEFAULT_BASE_URL
|
||||
try:
|
||||
from hermes_cli.config import load_config
|
||||
cfg = load_config()
|
||||
model_cfg = cfg.get("model")
|
||||
if isinstance(model_cfg, dict):
|
||||
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
|
||||
if cfg_base_url:
|
||||
base_url = cfg_base_url
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
model = _API_KEY_PROVIDER_AUX_MODELS.get("anthropic", "claude-haiku-4-5-20251001")
|
||||
logger.debug("Auxiliary client: Anthropic native (%s)", model)
|
||||
real_client = build_anthropic_client(token, _ANTHROPIC_DEFAULT_BASE_URL)
|
||||
return AnthropicAuxiliaryClient(real_client, model, token, _ANTHROPIC_DEFAULT_BASE_URL), model
|
||||
logger.debug("Auxiliary client: Anthropic native (%s) at %s", model, base_url)
|
||||
real_client = build_anthropic_client(token, base_url)
|
||||
return AnthropicAuxiliaryClient(real_client, model, token, base_url), model
|
||||
|
||||
|
||||
def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[str]]:
|
||||
@@ -744,6 +747,10 @@ def _to_async_client(sync_client, model: str):
|
||||
base_lower = str(sync_client.base_url).lower()
|
||||
if "openrouter" in base_lower:
|
||||
async_kwargs["default_headers"] = dict(_OR_HEADERS)
|
||||
elif "api.githubcopilot.com" in base_lower:
|
||||
from hermes_cli.models import copilot_default_headers
|
||||
|
||||
async_kwargs["default_headers"] = copilot_default_headers()
|
||||
elif "api.kimi.com" in base_lower:
|
||||
async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
|
||||
return AsyncOpenAI(**async_kwargs), model
|
||||
@@ -885,7 +892,7 @@ def resolve_provider_client(
|
||||
|
||||
# ── API-key providers from PROVIDER_REGISTRY ─────────────────────
|
||||
try:
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY, _resolve_kimi_base_url
|
||||
from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
|
||||
except ImportError:
|
||||
logger.debug("hermes_cli.auth not available for provider %s", provider)
|
||||
return None, None
|
||||
@@ -904,26 +911,18 @@ def resolve_provider_client(
|
||||
final_model = model or default_model
|
||||
return (_to_async_client(client, final_model) if async_mode else (client, final_model))
|
||||
|
||||
# Find the first configured API key
|
||||
api_key = ""
|
||||
for env_var in pconfig.api_key_env_vars:
|
||||
api_key = os.getenv(env_var, "").strip()
|
||||
if api_key:
|
||||
break
|
||||
creds = resolve_api_key_provider_credentials(provider)
|
||||
api_key = str(creds.get("api_key", "")).strip()
|
||||
if not api_key:
|
||||
tried_sources = list(pconfig.api_key_env_vars)
|
||||
if provider == "copilot":
|
||||
tried_sources.append("gh auth token")
|
||||
logger.warning("resolve_provider_client: provider %s has no API "
|
||||
"key configured (tried: %s)",
|
||||
provider, ", ".join(pconfig.api_key_env_vars))
|
||||
provider, ", ".join(tried_sources))
|
||||
return None, None
|
||||
|
||||
# Resolve base URL (env override → provider-specific logic → default)
|
||||
base_url_override = os.getenv(pconfig.base_url_env_var, "").strip() if pconfig.base_url_env_var else ""
|
||||
if provider == "kimi-coding":
|
||||
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, base_url_override)
|
||||
elif base_url_override:
|
||||
base_url = base_url_override
|
||||
else:
|
||||
base_url = pconfig.inference_base_url
|
||||
base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
|
||||
|
||||
default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
|
||||
final_model = model or default_model
|
||||
@@ -932,6 +931,10 @@ def resolve_provider_client(
|
||||
headers = {}
|
||||
if "api.kimi.com" in base_url.lower():
|
||||
headers["User-Agent"] = "KimiCLI/1.0"
|
||||
elif "api.githubcopilot.com" in base_url.lower():
|
||||
from hermes_cli.models import copilot_default_headers
|
||||
|
||||
headers.update(copilot_default_headers())
|
||||
|
||||
client = OpenAI(api_key=api_key, base_url=base_url,
|
||||
**({"default_headers": headers} if headers else {}))
|
||||
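The same base-URL check for Kimi and Copilot headers appears in several hunks of this file. One way to picture the shared logic is a single helper, sketched below under assumptions: `pick_default_headers` is a hypothetical name, and the Copilot headers are passed in as a callable because the contents of `copilot_default_headers()` are not shown in this diff.

```python
from typing import Callable, Dict


def pick_default_headers(
    base_url: str,
    copilot_headers: Callable[[], Dict[str, str]],
) -> Dict[str, str]:
    """Choose provider-specific default headers from the base URL."""
    lowered = base_url.lower()
    if "api.kimi.com" in lowered:
        return {"User-Agent": "KimiCLI/1.0"}
    if "api.githubcopilot.com" in lowered:
        # Copy so callers can mutate the result without side effects.
        return dict(copilot_headers())
    return {}
```

Matching on the resolved base URL rather than the provider ID means an environment override that points a provider at a different host also changes which headers are sent.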
@@ -1188,8 +1191,18 @@ def _get_cached_client(
     cache_key = (provider, async_mode, base_url or "", api_key or "")
     with _client_cache_lock:
         if cache_key in _client_cache:
-            cached_client, cached_default = _client_cache[cache_key]
-            return cached_client, model or cached_default
+            cached_client, cached_default, cached_loop = _client_cache[cache_key]
+            if async_mode:
+                # Async clients are bound to the event loop that created them.
+                # A cached async client whose loop has been closed will raise
+                # "Event loop is closed" when httpx tries to clean up its
+                # transport. Discard the stale client and create a fresh one.
+                if cached_loop is not None and cached_loop.is_closed():
+                    del _client_cache[cache_key]
+                else:
+                    return cached_client, model or cached_default
+            else:
+                return cached_client, model or cached_default
     # Build outside the lock
     client, default_model = resolve_provider_client(
         provider,
@@ -1199,11 +1212,20 @@ def _get_cached_client(
         explicit_api_key=api_key,
     )
     if client is not None:
+        # For async clients, remember which loop they were created on so we
+        # can detect stale entries later.
+        bound_loop = None
+        if async_mode:
+            try:
+                import asyncio as _aio
+                bound_loop = _aio.get_event_loop()
+            except RuntimeError:
+                pass
         with _client_cache_lock:
             if cache_key not in _client_cache:
-                _client_cache[cache_key] = (client, default_model)
+                _client_cache[cache_key] = (client, default_model, bound_loop)
             else:
-                client, default_model = _client_cache[cache_key]
+                client, default_model, _ = _client_cache[cache_key]
     return client, model or default_model
 
 
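The stale-loop check in `_get_cached_client` can be reduced to a small cache that stores each object together with the event loop it was created on. The sketch below is illustrative only: `get_loop_bound` and its module-level `_cache` are hypothetical, and the loop is passed in explicitly instead of being looked up with `asyncio.get_event_loop()`, whose behaviour outside a running loop has changed across Python versions.

```python
import asyncio
from threading import Lock
from typing import Any, Callable, Dict, Optional, Tuple

_cache: Dict[str, Tuple[Any, Optional[asyncio.AbstractEventLoop]]] = {}
_cache_lock = Lock()


def get_loop_bound(
    key: str,
    factory: Callable[[], Any],
    loop: Optional[asyncio.AbstractEventLoop],
) -> Any:
    """Return the cached object unless the loop it was bound to is closed."""
    with _cache_lock:
        entry = _cache.get(key)
        if entry is not None:
            obj, bound = entry
            if bound is None or not bound.is_closed():
                return obj
            del _cache[key]  # stale: its loop is gone
    obj = factory()  # build outside the lock, as the hunk above does
    with _cache_lock:
        if key not in _cache:
            _cache[key] = (obj, loop)
        return _cache[key][0]
```

Building outside the lock means two threads may both construct an object for the same key; the second lock block keeps whichever was stored first, which mirrors the `else` branch in the hunk.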
+23
-10
@@ -46,17 +46,24 @@ class ContextCompressor:
         summary_model_override: str = None,
         base_url: str = "",
         api_key: str = "",
+        config_context_length: int | None = None,
+        provider: str = "",
     ):
         self.model = model
         self.base_url = base_url
         self.api_key = api_key
+        self.provider = provider
         self.threshold_percent = threshold_percent
         self.protect_first_n = protect_first_n
         self.protect_last_n = protect_last_n
         self.summary_target_tokens = summary_target_tokens
         self.quiet_mode = quiet_mode
 
-        self.context_length = get_model_context_length(model, base_url=base_url, api_key=api_key)
+        self.context_length = get_model_context_length(
+            model, base_url=base_url, api_key=api_key,
+            config_context_length=config_context_length,
+            provider=provider,
+        )
         self.threshold_tokens = int(self.context_length * threshold_percent)
         self.compression_count = 0
         self._context_probed = False  # True after a step-down from context error
@@ -253,18 +260,24 @@ Write only the summary body. Do not include any preamble or prefix; the system w
|
||||
"""Pull a compress-end boundary backward to avoid splitting a
|
||||
tool_call / result group.
|
||||
|
||||
If the message just before ``idx`` is an assistant message with
|
||||
tool_calls, those tool results will start at ``idx`` and would be
|
||||
separated from their parent. Move backwards to include the whole
|
||||
group in the summarised region.
|
||||
If the boundary falls in the middle of a tool-result group (i.e.
|
||||
there are consecutive tool messages before ``idx``), walk backward
|
||||
past all of them to find the parent assistant message. If found,
|
||||
move the boundary before the assistant so the entire
|
||||
assistant + tool_results group is included in the summarised region
|
||||
rather than being split (which causes silent data loss when
|
||||
``_sanitize_tool_pairs`` removes the orphaned tail results).
|
||||
"""
|
||||
if idx <= 0 or idx >= len(messages):
|
||||
return idx
|
||||
prev = messages[idx - 1]
|
||||
if prev.get("role") == "assistant" and prev.get("tool_calls"):
|
||||
# The results for this assistant turn sit at idx..idx+k.
|
||||
# Include the assistant message in the summarised region too.
|
||||
idx -= 1
|
||||
# Walk backward past consecutive tool results
|
||||
check = idx - 1
|
||||
while check >= 0 and messages[check].get("role") == "tool":
|
||||
check -= 1
|
||||
# If we landed on the parent assistant with tool_calls, pull the
|
||||
# boundary before it so the whole group gets summarised together.
|
||||
if check >= 0 and messages[check].get("role") == "assistant" and messages[check].get("tool_calls"):
|
||||
idx = check
|
||||
return idx
|
||||
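Reviewer sketch of the backward walk described in the docstring above, written as a standalone function so it can be run in isolation. The name `align_boundary_backward` and the `SAMPLE` transcript are hypothetical; only the walk itself follows the hunk.

```python
from typing import Any, Dict, List


def align_boundary_backward(messages: List[Dict[str, Any]], idx: int) -> int:
    """Hypothetical stand-in for the boundary helper shown in the hunk above."""
    if idx <= 0 or idx >= len(messages):
        return idx
    # Walk backward past consecutive tool results.
    check = idx - 1
    while check >= 0 and messages[check].get("role") == "tool":
        check -= 1
    # If we landed on the parent assistant turn, move the boundary to it so
    # the assistant message and its tool results stay on the same side.
    if (
        check >= 0
        and messages[check].get("role") == "assistant"
        and messages[check].get("tool_calls")
    ):
        idx = check
    return idx


# Assumed sample transcript: one assistant turn with two tool results.
SAMPLE = [
    {"role": "user", "content": "list files"},
    {"role": "assistant", "tool_calls": [{"id": "a"}, {"id": "b"}]},
    {"role": "tool", "tool_call_id": "a", "content": "..."},
    {"role": "tool", "tool_call_id": "b", "content": "..."},
    {"role": "user", "content": "thanks"},
]
```

With this sample, a boundary that falls between the two tool results (index 3) is pulled back to the assistant message at index 1, so the group is never split.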
|
||||
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
|
||||
|
||||
@@ -0,0 +1,447 @@
|
||||
"""OpenAI-compatible shim that forwards Hermes requests to `copilot --acp`.
|
||||
|
||||
This adapter lets Hermes treat the GitHub Copilot ACP server as a chat-style
|
||||
backend. Each request starts a short-lived ACP session, sends the formatted
|
||||
conversation as a single prompt, collects text chunks, and converts the result
|
||||
back into the minimal shape Hermes expects from an OpenAI client.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import queue
|
||||
import shlex
|
||||
import subprocess
|
||||
import threading
|
||||
import time
|
||||
from collections import deque
|
||||
from pathlib import Path
|
||||
from types import SimpleNamespace
|
||||
from typing import Any
|
||||
|
||||
ACP_MARKER_BASE_URL = "acp://copilot"
|
||||
_DEFAULT_TIMEOUT_SECONDS = 900.0
|
||||
|
||||
|
||||
def _resolve_command() -> str:
|
||||
return (
|
||||
os.getenv("HERMES_COPILOT_ACP_COMMAND", "").strip()
|
||||
or os.getenv("COPILOT_CLI_PATH", "").strip()
|
||||
or "copilot"
|
||||
)
|
||||
|
||||
|
||||
def _resolve_args() -> list[str]:
|
||||
raw = os.getenv("HERMES_COPILOT_ACP_ARGS", "").strip()
|
||||
if not raw:
|
||||
return ["--acp", "--stdio"]
|
||||
return shlex.split(raw)
|
||||
|
||||
|
||||
def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
|
||||
return {
|
||||
"jsonrpc": "2.0",
|
||||
"id": message_id,
|
||||
"error": {
|
||||
"code": code,
|
||||
"message": message,
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
def _format_messages_as_prompt(messages: list[dict[str, Any]], model: str | None = None) -> str:
|
||||
sections: list[str] = [
|
||||
"You are being used as the active ACP agent backend for Hermes.",
|
||||
"Use your own ACP capabilities and respond directly in natural language.",
|
||||
"Do not emit OpenAI tool-call JSON.",
|
||||
]
|
||||
if model:
|
||||
sections.append(f"Hermes requested model hint: {model}")
|
||||
|
||||
transcript: list[str] = []
|
||||
for message in messages:
|
||||
if not isinstance(message, dict):
|
||||
continue
|
||||
role = str(message.get("role") or "unknown").strip().lower()
|
||||
if role == "tool":
|
||||
role = "tool"
|
||||
elif role not in {"system", "user", "assistant"}:
|
||||
role = "context"
|
||||
|
||||
content = message.get("content")
|
||||
rendered = _render_message_content(content)
|
||||
if not rendered:
|
||||
continue
|
||||
|
||||
label = {
|
||||
"system": "System",
|
||||
"user": "User",
|
||||
"assistant": "Assistant",
|
||||
"tool": "Tool",
|
||||
"context": "Context",
|
||||
}.get(role, role.title())
|
||||
transcript.append(f"{label}:\n{rendered}")
|
||||
|
||||
if transcript:
|
||||
sections.append("Conversation transcript:\n\n" + "\n\n".join(transcript))
|
||||
|
||||
sections.append("Continue the conversation from the latest user request.")
|
||||
return "\n\n".join(section.strip() for section in sections if section and section.strip())
|
||||
|
||||
|
||||
def _render_message_content(content: Any) -> str:
|
||||
if content is None:
|
||||
return ""
|
||||
if isinstance(content, str):
|
||||
return content.strip()
|
||||
if isinstance(content, dict):
|
||||
if "text" in content:
|
||||
return str(content.get("text") or "").strip()
|
||||
if "content" in content and isinstance(content.get("content"), str):
|
||||
return str(content.get("content") or "").strip()
|
||||
return json.dumps(content, ensure_ascii=True)
|
||||
if isinstance(content, list):
|
||||
parts: list[str] = []
|
||||
for item in content:
|
||||
if isinstance(item, str):
|
||||
parts.append(item)
|
||||
elif isinstance(item, dict):
|
||||
text = item.get("text")
|
||||
if isinstance(text, str) and text.strip():
|
||||
parts.append(text.strip())
|
||||
return "\n".join(parts).strip()
|
||||
return str(content).strip()
|
||||
|
||||
|
||||
def _ensure_path_within_cwd(path_text: str, cwd: str) -> Path:
|
||||
candidate = Path(path_text)
|
||||
if not candidate.is_absolute():
|
||||
raise PermissionError("ACP file-system paths must be absolute.")
|
||||
resolved = candidate.resolve()
|
||||
root = Path(cwd).resolve()
|
||||
try:
|
||||
resolved.relative_to(root)
|
||||
except ValueError as exc:
|
||||
raise PermissionError(f"Path '{resolved}' is outside the session cwd '{root}'.") from exc
|
||||
return resolved
|
||||
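Reviewer sketch of the containment check above, showing why the path is resolved before it is compared: a `..` segment that escapes the root is rejected. The names `ensure_within`, `is_allowed`, and `demo` are hypothetical and the temporary directory stands in for the session cwd.

```python
import os
import tempfile
from pathlib import Path


def ensure_within(path_text: str, root_dir: str) -> Path:
    """Return the resolved path, or raise if it is relative or outside root_dir."""
    candidate = Path(path_text)
    if not candidate.is_absolute():
        raise PermissionError("paths must be absolute")
    resolved = candidate.resolve()
    root = Path(root_dir).resolve()
    try:
        # relative_to raises ValueError when resolved is not under root.
        resolved.relative_to(root)
    except ValueError as exc:
        raise PermissionError(f"{resolved} is outside {root}") from exc
    return resolved


def is_allowed(path_text: str, root_dir: str) -> bool:
    try:
        ensure_within(path_text, root_dir)
        return True
    except PermissionError:
        return False


def demo() -> tuple:
    """Check one path inside the root, one escaping via '..', one relative."""
    with tempfile.TemporaryDirectory() as root:
        inside = os.path.join(root, "src", "main.py")
        escaping = os.path.join(root, "..", "outside.txt")
        return (
            is_allowed(inside, root),
            is_allowed(escaping, root),
            is_allowed("relative.txt", root),
        )
```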
|
||||
|
||||
class _ACPChatCompletions:
|
||||
def __init__(self, client: "CopilotACPClient"):
|
||||
self._client = client
|
||||
|
||||
def create(self, **kwargs: Any) -> Any:
|
||||
return self._client._create_chat_completion(**kwargs)
|
||||
|
||||
|
||||
class _ACPChatNamespace:
|
||||
def __init__(self, client: "CopilotACPClient"):
|
||||
self.completions = _ACPChatCompletions(client)
|
||||
|
||||
|
||||
class CopilotACPClient:
|
||||
"""Minimal OpenAI-client-compatible facade for Copilot ACP."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
api_key: str | None = None,
|
||||
base_url: str | None = None,
|
||||
default_headers: dict[str, str] | None = None,
|
||||
acp_command: str | None = None,
|
||||
acp_args: list[str] | None = None,
|
||||
acp_cwd: str | None = None,
|
||||
command: str | None = None,
|
||||
args: list[str] | None = None,
|
||||
**_: Any,
|
||||
):
|
||||
self.api_key = api_key or "copilot-acp"
|
||||
self.base_url = base_url or ACP_MARKER_BASE_URL
|
||||
self._default_headers = dict(default_headers or {})
|
||||
self._acp_command = acp_command or command or _resolve_command()
|
||||
self._acp_args = list(acp_args or args or _resolve_args())
|
||||
self._acp_cwd = str(Path(acp_cwd or os.getcwd()).resolve())
|
||||
self.chat = _ACPChatNamespace(self)
|
||||
self.is_closed = False
|
||||
self._active_process: subprocess.Popen[str] | None = None
|
||||
self._active_process_lock = threading.Lock()
|
||||
|
||||
def close(self) -> None:
|
||||
proc: subprocess.Popen[str] | None
|
||||
with self._active_process_lock:
|
||||
proc = self._active_process
|
||||
self._active_process = None
|
||||
self.is_closed = True
|
||||
if proc is None:
|
||||
return
|
||||
try:
|
||||
proc.terminate()
|
||||
proc.wait(timeout=2)
|
||||
except Exception:
|
||||
try:
|
||||
proc.kill()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
def _create_chat_completion(
|
||||
self,
|
||||
*,
|
||||
model: str | None = None,
|
||||
messages: list[dict[str, Any]] | None = None,
|
||||
timeout: float | None = None,
|
||||
**_: Any,
|
||||
) -> Any:
|
||||
prompt_text = _format_messages_as_prompt(messages or [], model=model)
|
||||
response_text, reasoning_text = self._run_prompt(
|
||||
prompt_text,
|
||||
timeout_seconds=float(timeout or _DEFAULT_TIMEOUT_SECONDS),
|
||||
)
|
||||
|
||||
usage = SimpleNamespace(
|
||||
prompt_tokens=0,
|
||||
completion_tokens=0,
|
||||
total_tokens=0,
|
||||
prompt_tokens_details=SimpleNamespace(cached_tokens=0),
|
||||
)
|
||||
assistant_message = SimpleNamespace(
|
||||
content=response_text,
|
||||
tool_calls=[],
|
||||
reasoning=reasoning_text or None,
|
||||
reasoning_content=reasoning_text or None,
|
||||
reasoning_details=None,
|
||||
)
|
||||
choice = SimpleNamespace(message=assistant_message, finish_reason="stop")
|
||||
return SimpleNamespace(
|
||||
choices=[choice],
|
||||
usage=usage,
|
||||
model=model or "copilot-acp",
|
||||
)
|
||||
|
||||
def _run_prompt(self, prompt_text: str, *, timeout_seconds: float) -> tuple[str, str]:
|
||||
try:
|
||||
proc = subprocess.Popen(
|
||||
[self._acp_command] + self._acp_args,
|
||||
stdin=subprocess.PIPE,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.PIPE,
|
||||
text=True,
|
||||
bufsize=1,
|
||||
cwd=self._acp_cwd,
|
||||
)
|
||||
except FileNotFoundError as exc:
|
||||
raise RuntimeError(
|
||||
f"Could not start Copilot ACP command '{self._acp_command}'. "
|
||||
"Install GitHub Copilot CLI or set HERMES_COPILOT_ACP_COMMAND/COPILOT_CLI_PATH."
|
||||
) from exc
|
||||
|
||||
if proc.stdin is None or proc.stdout is None:
|
||||
proc.kill()
|
||||
raise RuntimeError("Copilot ACP process did not expose stdin/stdout pipes.")
|
||||
|
||||
self.is_closed = False
|
||||
with self._active_process_lock:
|
||||
self._active_process = proc
|
||||
|
||||
inbox: queue.Queue[dict[str, Any]] = queue.Queue()
|
||||
stderr_tail: deque[str] = deque(maxlen=40)
|
||||
|
||||
def _stdout_reader() -> None:
|
||||
for line in proc.stdout:
|
||||
try:
|
||||
inbox.put(json.loads(line))
|
||||
except Exception:
|
||||
inbox.put({"raw": line.rstrip("\n")})
|
||||
|
||||
def _stderr_reader() -> None:
|
||||
if proc.stderr is None:
|
||||
return
|
||||
for line in proc.stderr:
|
||||
stderr_tail.append(line.rstrip("\n"))
|
||||
|
||||
out_thread = threading.Thread(target=_stdout_reader, daemon=True)
|
||||
err_thread = threading.Thread(target=_stderr_reader, daemon=True)
|
||||
out_thread.start()
|
||||
err_thread.start()
|
||||
|
||||
next_id = 0
|
||||
|
||||
def _request(method: str, params: dict[str, Any], *, text_parts: list[str] | None = None, reasoning_parts: list[str] | None = None) -> Any:
|
||||
nonlocal next_id
|
||||
next_id += 1
|
||||
request_id = next_id
|
||||
payload = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": request_id,
|
||||
"method": method,
|
||||
"params": params,
|
||||
}
|
||||
proc.stdin.write(json.dumps(payload) + "\n")
|
||||
proc.stdin.flush()
|
||||
|
||||
deadline = time.time() + timeout_seconds
|
||||
while time.time() < deadline:
|
||||
if proc.poll() is not None:
|
||||
break
|
||||
try:
|
||||
msg = inbox.get(timeout=0.1)
|
||||
except queue.Empty:
|
||||
continue
|
||||
|
||||
if self._handle_server_message(
|
||||
msg,
|
||||
process=proc,
|
||||
cwd=self._acp_cwd,
|
||||
text_parts=text_parts,
|
||||
reasoning_parts=reasoning_parts,
|
||||
):
|
||||
continue
|
||||
|
||||
if msg.get("id") != request_id:
|
||||
continue
|
||||
if "error" in msg:
|
||||
err = msg.get("error") or {}
|
||||
raise RuntimeError(
|
||||
f"Copilot ACP {method} failed: {err.get('message') or err}"
|
||||
)
|
||||
return msg.get("result")
|
||||
|
||||
stderr_text = "\n".join(stderr_tail).strip()
|
||||
if proc.poll() is not None and stderr_text:
|
||||
raise RuntimeError(f"Copilot ACP process exited early: {stderr_text}")
|
||||
raise TimeoutError(f"Timed out waiting for Copilot ACP response to {method}.")
|
||||
|
||||
try:
|
||||
_request(
|
||||
"initialize",
|
||||
{
|
||||
"protocolVersion": 1,
|
||||
"clientCapabilities": {
|
||||
"fs": {
|
||||
"readTextFile": True,
|
||||
"writeTextFile": True,
|
||||
}
|
||||
},
|
||||
"clientInfo": {
|
||||
"name": "hermes-agent",
|
||||
"title": "Hermes Agent",
|
||||
"version": "0.0.0",
|
||||
},
|
||||
},
|
||||
)
|
||||
session = _request(
|
||||
"session/new",
|
||||
{
|
||||
"cwd": self._acp_cwd,
|
||||
"mcpServers": [],
|
||||
},
|
||||
) or {}
|
||||
session_id = str(session.get("sessionId") or "").strip()
|
||||
if not session_id:
|
||||
raise RuntimeError("Copilot ACP did not return a sessionId.")
|
||||
|
||||
text_parts: list[str] = []
|
||||
reasoning_parts: list[str] = []
|
||||
_request(
|
||||
"session/prompt",
|
||||
{
|
||||
"sessionId": session_id,
|
||||
"prompt": [
|
||||
{
|
||||
"type": "text",
|
||||
"text": prompt_text,
|
||||
}
|
||||
],
|
||||
},
|
||||
text_parts=text_parts,
|
||||
reasoning_parts=reasoning_parts,
|
||||
)
|
||||
return "".join(text_parts), "".join(reasoning_parts)
|
||||
finally:
|
||||
self.close()
|
||||
|
||||
def _handle_server_message(
|
||||
self,
|
||||
msg: dict[str, Any],
|
||||
*,
|
||||
process: subprocess.Popen[str],
|
||||
cwd: str,
|
||||
text_parts: list[str] | None,
|
||||
reasoning_parts: list[str] | None,
|
||||
) -> bool:
|
||||
method = msg.get("method")
|
||||
if not isinstance(method, str):
|
||||
return False
|
||||
|
||||
if method == "session/update":
|
||||
params = msg.get("params") or {}
|
||||
update = params.get("update") or {}
|
||||
kind = str(update.get("sessionUpdate") or "").strip()
|
||||
content = update.get("content") or {}
|
||||
chunk_text = ""
|
||||
if isinstance(content, dict):
|
||||
chunk_text = str(content.get("text") or "")
|
||||
if kind == "agent_message_chunk" and chunk_text and text_parts is not None:
|
||||
text_parts.append(chunk_text)
|
||||
elif kind == "agent_thought_chunk" and chunk_text and reasoning_parts is not None:
|
||||
reasoning_parts.append(chunk_text)
|
||||
return True
|
||||
|
||||
if process.stdin is None:
|
||||
return True
|
||||
|
||||
message_id = msg.get("id")
|
||||
params = msg.get("params") or {}
|
||||
|
||||
if method == "session/request_permission":
|
||||
response = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": message_id,
|
||||
"result": {
|
||||
"outcome": {
|
||||
"outcome": "allow_once",
|
||||
}
|
||||
},
|
||||
}
|
||||
elif method == "fs/read_text_file":
|
||||
try:
|
||||
path = _ensure_path_within_cwd(str(params.get("path") or ""), cwd)
|
||||
content = path.read_text() if path.exists() else ""
|
||||
line = params.get("line")
|
||||
limit = params.get("limit")
|
||||
if isinstance(line, int) and line > 1:
|
||||
lines = content.splitlines(keepends=True)
|
||||
start = line - 1
|
||||
end = start + limit if isinstance(limit, int) and limit > 0 else None
|
||||
content = "".join(lines[start:end])
|
||||
response = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": message_id,
|
||||
"result": {
|
||||
"content": content,
|
||||
},
|
||||
}
|
||||
except Exception as exc:
|
||||
response = _jsonrpc_error(message_id, -32602, str(exc))
|
||||
elif method == "fs/write_text_file":
|
||||
try:
|
||||
path = _ensure_path_within_cwd(str(params.get("path") or ""), cwd)
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text(str(params.get("content") or ""))
|
||||
response = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": message_id,
|
||||
"result": None,
|
||||
}
|
||||
except Exception as exc:
|
||||
response = _jsonrpc_error(message_id, -32602, str(exc))
|
||||
else:
|
||||
response = _jsonrpc_error(
|
||||
message_id,
|
||||
-32601,
|
||||
f"ACP client method '{method}' is not supported by Hermes yet.",
|
||||
)
|
||||
|
||||
process.stdin.write(json.dumps(response) + "\n")
|
||||
process.stdin.flush()
|
||||
return True
|
||||
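Reviewer sketch of the `line` / `limit` handling in the `fs/read_text_file` branch above, isolated as a pure function. The name `slice_text` is hypothetical. It mirrors the handler as written, including the fact that `limit` only takes effect when `line` is greater than 1.

```python
from typing import Optional


def slice_text(content: str, line: Optional[int] = None, limit: Optional[int] = None) -> str:
    """Return content starting at 1-based `line`, at most `limit` lines long."""
    if isinstance(line, int) and line > 1:
        lines = content.splitlines(keepends=True)
        start = line - 1
        # A missing or non-positive limit means "to the end of the file".
        end = start + limit if isinstance(limit, int) and limit > 0 else None
        return "".join(lines[start:end])
    # line absent or <= 1: the whole content is returned and limit is ignored.
    return content
```

Under this logic a request for `line=1, limit=2` returns the entire file rather than its first two lines, which may or may not be intended.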
@@ -612,3 +612,95 @@ def write_tty(text: str) -> None:
|
||||
except OSError:
|
||||
sys.stdout.write(text)
|
||||
sys.stdout.flush()
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# Context pressure display (CLI user-facing warnings)
|
||||
# =========================================================================
|
||||
|
||||
# ANSI color codes for context pressure tiers
|
||||
_CYAN = "\033[36m"
|
||||
_YELLOW = "\033[33m"
|
||||
_BOLD = "\033[1m"
|
||||
_DIM_ANSI = "\033[2m"
|
||||
|
||||
# Bar characters
|
||||
_BAR_FILLED = "▰"
|
||||
_BAR_EMPTY = "▱"
|
||||
_BAR_WIDTH = 20
|
||||
|
||||
|
||||
def format_context_pressure(
|
||||
compaction_progress: float,
|
||||
threshold_tokens: int,
|
||||
threshold_percent: float,
|
||||
compression_enabled: bool = True,
|
||||
) -> str:
|
||||
"""Build a formatted context pressure line for CLI display.
|
||||
|
||||
The bar and percentage show progress toward the compaction threshold,
|
||||
NOT the raw context window. 100% = compaction fires.
|
||||
|
||||
Uses ANSI colors:
|
||||
- cyan at ~60% to compaction = informational
|
||||
- bold yellow at ~85% to compaction = warning
|
||||
|
||||
Args:
|
||||
compaction_progress: How close to compaction (0.0–1.0, 1.0 = fires).
|
||||
threshold_tokens: Compaction threshold in tokens.
|
||||
threshold_percent: Compaction threshold as a fraction of context window.
|
||||
compression_enabled: Whether auto-compression is active.
|
||||
"""
|
||||
pct_int = int(compaction_progress * 100)
|
||||
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
|
||||
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
|
||||
|
||||
threshold_k = f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
|
||||
threshold_pct_int = int(threshold_percent * 100)
|
||||
|
||||
# Tier styling
|
||||
if compaction_progress >= 0.85:
|
||||
color = f"{_BOLD}{_YELLOW}"
|
||||
icon = "⚠"
|
||||
if compression_enabled:
|
||||
hint = "compaction imminent"
|
||||
else:
|
||||
hint = "no auto-compaction"
|
||||
else:
|
||||
color = _CYAN
|
||||
icon = "◐"
|
||||
hint = "approaching compaction"
|
||||
|
||||
return (
|
||||
f" {color}{icon} context {bar} {pct_int}% to compaction{_ANSI_RESET}"
|
||||
f" {_DIM_ANSI}{threshold_k} threshold ({threshold_pct_int}%) · {hint}{_ANSI_RESET}"
|
||||
)
|
||||
|
||||
|
||||
def format_context_pressure_gateway(
|
||||
compaction_progress: float,
|
||||
threshold_percent: float,
|
||||
compression_enabled: bool = True,
|
||||
) -> str:
|
||||
"""Build a plain-text context pressure notification for messaging platforms.
|
||||
|
||||
No ANSI — just Unicode and plain text suitable for Telegram/Discord/etc.
|
||||
The percentage shows progress toward the compaction threshold.
|
||||
"""
|
||||
pct_int = int(compaction_progress * 100)
|
||||
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
|
||||
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
|
||||
|
||||
threshold_pct_int = int(threshold_percent * 100)
|
||||
|
||||
if compaction_progress >= 0.85:
|
||||
icon = "⚠️"
|
||||
if compression_enabled:
|
||||
hint = f"Context compaction is imminent (threshold: {threshold_pct_int}% of window)."
|
||||
else:
|
||||
hint = "Auto-compaction is disabled — context may be truncated."
|
||||
else:
|
||||
icon = "ℹ️"
|
||||
hint = f"Compaction threshold is at {threshold_pct_int}% of context window."
|
||||
|
||||
return f"{icon} Context: {bar} {pct_int}% to compaction\n{hint}"
|
||||
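Reviewer sketch of the bar and tier logic shared by the two formatters above, without ANSI codes. The names `render_bar` and `pressure_tier` are hypothetical, and the lower clamp at zero is an addition that the original functions do not have.

```python
BAR_FILLED = "▰"
BAR_EMPTY = "▱"
BAR_WIDTH = 20


def render_bar(progress: float, width: int = BAR_WIDTH) -> str:
    """Render progress toward the compaction threshold as a fixed-width bar."""
    # Clamp to [0, width]; the upper clamp matches the diff, the lower one is added here.
    filled = max(0, min(int(progress * width), width))
    return BAR_FILLED * filled + BAR_EMPTY * (width - filled)


def pressure_tier(progress: float) -> str:
    """Return 'warning' at or above 85% of the threshold, else 'info'."""
    return "warning" if progress >= 0.85 else "info"
```

Values above 1.0 produce a completely filled bar, so the display stays at a fixed width even after the threshold has been crossed.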
|
||||
+15
-12
@@ -181,22 +181,25 @@ class InsightsEngine:
|
||||
"billing_base_url, billing_mode, estimated_cost_usd, "
|
||||
"actual_cost_usd, cost_status, cost_source")
|
||||
|
||||
# Pre-computed query strings: the f-strings are evaluated once, when the class
# body executes, not on each call, so no user-controlled value can alter the query structure.
|
||||
_GET_SESSIONS_WITH_SOURCE = (
|
||||
f"SELECT {_SESSION_COLS} FROM sessions"
|
||||
" WHERE started_at >= ? AND source = ?"
|
||||
" ORDER BY started_at DESC"
|
||||
)
|
||||
_GET_SESSIONS_ALL = (
|
||||
f"SELECT {_SESSION_COLS} FROM sessions"
|
||||
" WHERE started_at >= ?"
|
||||
" ORDER BY started_at DESC"
|
||||
)
|
||||
|
||||
def _get_sessions(self, cutoff: float, source: str = None) -> List[Dict]:
|
||||
"""Fetch sessions within the time window."""
|
||||
if source:
|
||||
cursor = self._conn.execute(
|
||||
f"""SELECT {self._SESSION_COLS} FROM sessions
|
||||
WHERE started_at >= ? AND source = ?
|
||||
ORDER BY started_at DESC""",
|
||||
(cutoff, source),
|
||||
)
|
||||
cursor = self._conn.execute(self._GET_SESSIONS_WITH_SOURCE, (cutoff, source))
|
||||
else:
|
||||
cursor = self._conn.execute(
|
||||
f"""SELECT {self._SESSION_COLS} FROM sessions
|
||||
WHERE started_at >= ?
|
||||
ORDER BY started_at DESC""",
|
||||
(cutoff,),
|
||||
)
|
||||
cursor = self._conn.execute(self._GET_SESSIONS_ALL, (cutoff,))
|
||||
return [dict(row) for row in cursor.fetchall()]
|
||||
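Reviewer sketch of the pattern this hunk moves to: the column list is interpolated once into module-level constants, and every caller-supplied value travels as a bound parameter. The table schema, the three-column `SESSION_COLS`, and the helper names are assumptions for illustration, not the project's real schema.

```python
import sqlite3

SESSION_COLS = "id, source, started_at"  # hypothetical column list

# Built once; only the trusted constant above is interpolated.
GET_SESSIONS_WITH_SOURCE = (
    f"SELECT {SESSION_COLS} FROM sessions"
    " WHERE started_at >= ? AND source = ?"
    " ORDER BY started_at DESC"
)
GET_SESSIONS_ALL = (
    f"SELECT {SESSION_COLS} FROM sessions"
    " WHERE started_at >= ?"
    " ORDER BY started_at DESC"
)


def get_sessions(conn: sqlite3.Connection, cutoff: float, source: str = None) -> list:
    """Fetch sessions newer than cutoff, optionally filtered by source."""
    if source:
        cursor = conn.execute(GET_SESSIONS_WITH_SOURCE, (cutoff, source))
    else:
        cursor = conn.execute(GET_SESSIONS_ALL, (cutoff,))
    return [dict(row) for row in cursor.fetchall()]


def demo_connection() -> sqlite3.Connection:
    """In-memory database with three assumed rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE sessions (id TEXT, source TEXT, started_at REAL)")
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        [("a", "cli", 10.0), ("b", "telegram", 20.0), ("c", "cli", 30.0)],
    )
    return conn
```

Because `source` is always bound with `?`, a value such as `cli' OR '1'='1` is compared as a literal string and matches no rows.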
|
||||
def _get_tool_usage(self, cutoff: float, source: str = None) -> List[Dict]:
|
||||
|
||||
+467
-106
@@ -19,6 +19,46 @@ from hermes_constants import OPENROUTER_MODELS_URL
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Provider names that can appear as a "provider:" prefix before a model ID.
|
||||
# Only these are stripped — Ollama-style "model:tag" colons (e.g. "qwen3.5:27b")
|
||||
# are preserved so the full model name reaches cache lookups and server queries.
|
||||
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
|
||||
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
|
||||
"zai", "kimi-coding", "minimax", "minimax-cn", "anthropic", "deepseek",
|
||||
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
|
||||
"custom", "local",
|
||||
# Common aliases
|
||||
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
|
||||
"github-models", "kimi", "moonshot", "claude", "deep-seek",
|
||||
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
|
||||
})
|
||||
|
||||
|
||||
_OLLAMA_TAG_PATTERN = re.compile(
|
||||
r"^(\d+\.?\d*b|latest|stable|q\d|fp?\d|instruct|chat|coder|vision|text)",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
|
||||
def _strip_provider_prefix(model: str) -> str:
|
||||
"""Strip a recognised provider prefix from a model string.
|
||||
|
||||
``"local:my-model"`` → ``"my-model"``
|
||||
``"qwen3.5:27b"`` → ``"qwen3.5:27b"`` (unchanged — not a provider prefix)
|
||||
``"qwen:0.5b"`` → ``"qwen:0.5b"`` (unchanged — Ollama model:tag)
|
||||
``"deepseek:latest"``→ ``"deepseek:latest"``(unchanged — Ollama model:tag)
|
||||
"""
|
||||
if ":" not in model or model.startswith("http"):
|
||||
return model
|
||||
prefix, suffix = model.split(":", 1)
|
||||
prefix_lower = prefix.strip().lower()
|
||||
if prefix_lower in _PROVIDER_PREFIXES:
|
||||
# Don't strip if suffix looks like an Ollama tag (e.g. "7b", "latest", "q4_0")
|
||||
if _OLLAMA_TAG_PATTERN.match(suffix.strip()):
|
||||
return model
|
||||
return suffix
|
||||
return model
|
||||
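Reviewer sketch that exercises the docstring examples above against a reduced prefix set. `PROVIDER_PREFIXES` here is a small assumed subset of the real frozenset, and `strip_provider_prefix` is a standalone copy of the logic for illustration; the regular expression is taken verbatim from the hunk.

```python
import re

# Assumed subset of the provider names listed in the diff.
PROVIDER_PREFIXES = frozenset({"openrouter", "local", "custom", "qwen", "deepseek"})

OLLAMA_TAG_PATTERN = re.compile(
    r"^(\d+\.?\d*b|latest|stable|q\d|fp?\d|instruct|chat|coder|vision|text)",
    re.IGNORECASE,
)


def strip_provider_prefix(model: str) -> str:
    """Drop a leading 'provider:' unless the suffix looks like an Ollama tag."""
    if ":" not in model or model.startswith("http"):
        return model
    prefix, suffix = model.split(":", 1)
    if prefix.strip().lower() in PROVIDER_PREFIXES:
        # "qwen:0.5b" and "deepseek:latest" are model:tag pairs, not provider:model.
        if OLLAMA_TAG_PATTERN.match(suffix.strip()):
            return model
        return suffix
    return model
```

One consequence worth noting: the pattern is anchored only at the start, so in this sketch a suffix such as `chat-model` also counts as a tag and `local:chat-model` is returned unchanged.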
|
||||
_model_metadata_cache: Dict[str, Dict[str, Any]] = {}
|
||||
_model_metadata_cache_time: float = 0
|
||||
_MODEL_CACHE_TTL = 3600
|
||||
@@ -27,104 +67,52 @@ _endpoint_model_metadata_cache_time: Dict[str, float] = {}
|
||||
_ENDPOINT_MODEL_CACHE_TTL = 300
|
||||
|
||||
# Descending tiers for context length probing when the model is unknown.
|
||||
# We start high and step down on context-length errors until one works.
|
||||
# We start at 128K (a safe default for most modern models) and step down
|
||||
# on context-length errors until one works.
|
||||
CONTEXT_PROBE_TIERS = [
|
||||
2_000_000,
|
||||
1_000_000,
|
||||
512_000,
|
||||
200_000,
|
||||
128_000,
|
||||
64_000,
|
||||
32_000,
|
||||
16_000,
|
||||
8_000,
|
||||
]
|
||||
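Reviewer sketch of how a descending tier list like the one above can drive the step-down after a context-length error. The helper name `next_lower_tier` is hypothetical and the list uses the post-change tiers (starting at 128K); the project's actual step-down helper is not shown in this hunk.

```python
from typing import Optional

# Tiers as they appear after this change, highest first.
CONTEXT_PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]


def next_lower_tier(current: int) -> Optional[int]:
    """Return the first tier strictly below `current`, or None at the floor."""
    for tier in CONTEXT_PROBE_TIERS:
        if tier < current:
            return tier
    return None
```

A caller would retry with the returned value and stop probing once `None` comes back.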
|
||||
# Default context length when no detection method succeeds.
|
||||
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]
|
||||
|
||||
# Thin fallback defaults — only broad model family patterns.
|
||||
# These fire only when provider is unknown AND models.dev/OpenRouter/Anthropic
|
||||
# all miss. Replaced the previous 80+ entry dict.
|
||||
# For provider-specific context lengths, models.dev is the primary source.
|
||||
DEFAULT_CONTEXT_LENGTHS = {
|
||||
"anthropic/claude-opus-4": 200000,
|
||||
"anthropic/claude-opus-4.5": 200000,
|
||||
"anthropic/claude-opus-4.6": 200000,
|
||||
"anthropic/claude-sonnet-4": 200000,
|
||||
"anthropic/claude-sonnet-4-20250514": 200000,
|
||||
"anthropic/claude-sonnet-4.5": 200000,
|
||||
"anthropic/claude-sonnet-4.6": 200000,
|
||||
"anthropic/claude-haiku-4.5": 200000,
|
||||
# Bare Anthropic model IDs (for native API provider)
|
||||
"claude-opus-4-6": 200000,
|
||||
"claude-sonnet-4-6": 200000,
|
||||
"claude-opus-4-5-20251101": 200000,
|
||||
"claude-sonnet-4-5-20250929": 200000,
|
||||
"claude-opus-4-1-20250805": 200000,
|
||||
"claude-opus-4-20250514": 200000,
|
||||
"claude-sonnet-4-20250514": 200000,
|
||||
"claude-haiku-4-5-20251001": 200000,
|
||||
"openai/gpt-5": 128000,
|
||||
"openai/gpt-4.1": 1047576,
|
||||
"openai/gpt-4.1-mini": 1047576,
|
||||
"openai/gpt-4o": 128000,
|
||||
"openai/gpt-4-turbo": 128000,
|
||||
"openai/gpt-4o-mini": 128000,
|
||||
"google/gemini-3-pro-preview": 1048576,
|
||||
"google/gemini-3-flash": 1048576,
|
||||
"google/gemini-2.5-flash": 1048576,
|
||||
"google/gemini-2.0-flash": 1048576,
|
||||
"google/gemini-2.5-pro": 1048576,
|
||||
"deepseek/deepseek-v3.2": 65536,
|
||||
"meta-llama/llama-3.3-70b-instruct": 131072,
|
||||
"deepseek/deepseek-chat-v3": 65536,
|
||||
"qwen/qwen-2.5-72b-instruct": 32768,
|
||||
"glm-4.7": 202752,
|
||||
"glm-5": 202752,
|
||||
"glm-4.5": 131072,
|
||||
"glm-4.5-flash": 131072,
|
||||
"kimi-for-coding": 262144,
|
||||
"kimi-k2.5": 262144,
|
||||
"kimi-k2-thinking": 262144,
|
||||
"kimi-k2-thinking-turbo": 262144,
|
||||
"kimi-k2-turbo-preview": 262144,
|
||||
"kimi-k2-0905-preview": 131072,
|
||||
"MiniMax-M2.7": 204800,
|
||||
"MiniMax-M2.7-highspeed": 204800,
|
||||
"MiniMax-M2.5": 204800,
|
||||
"MiniMax-M2.5-highspeed": 204800,
|
||||
"MiniMax-M2.1": 204800,
|
||||
# OpenCode Zen models
|
||||
"gpt-5.4-pro": 128000,
|
||||
"gpt-5.4": 128000,
|
||||
"gpt-5.3-codex": 128000,
|
||||
"gpt-5.3-codex-spark": 128000,
|
||||
"gpt-5.2": 128000,
|
||||
"gpt-5.2-codex": 128000,
|
||||
"gpt-5.1": 128000,
|
||||
"gpt-5.1-codex": 128000,
|
||||
"gpt-5.1-codex-max": 128000,
|
||||
"gpt-5.1-codex-mini": 128000,
|
||||
# Anthropic Claude 4.6 (1M context) — bare IDs only to avoid
|
||||
# fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
|
||||
# substring of "anthropic/claude-sonnet-4.6").
|
||||
# OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
|
||||
"claude-opus-4-6": 1000000,
|
||||
"claude-sonnet-4-6": 1000000,
|
||||
"claude-opus-4.6": 1000000,
|
||||
"claude-sonnet-4.6": 1000000,
|
||||
# Catch-all for older Claude models (must sort after specific entries)
|
||||
"claude": 200000,
|
||||
# OpenAI
|
||||
"gpt-4.1": 1047576,
|
||||
"gpt-5": 128000,
|
||||
"gpt-5-codex": 128000,
|
||||
"gpt-5-nano": 128000,
|
||||
# Bare model IDs without provider prefix (avoid duplicates with entries above)
|
||||
"claude-opus-4-5": 200000,
|
||||
"claude-opus-4-1": 200000,
|
||||
"claude-sonnet-4-5": 200000,
|
||||
"claude-sonnet-4": 200000,
|
||||
"claude-haiku-4-5": 200000,
|
||||
"claude-3-5-haiku": 200000,
|
||||
"gemini-3.1-pro": 1048576,
|
||||
"gemini-3-pro": 1048576,
|
||||
"gemini-3-flash": 1048576,
|
||||
"minimax-m2.5": 204800,
|
||||
"minimax-m2.5-free": 204800,
|
||||
"minimax-m2.1": 204800,
|
||||
"glm-4.6": 202752,
|
||||
"kimi-k2": 262144,
|
||||
"qwen3-coder": 32768,
|
||||
"big-pickle": 128000,
|
||||
# Alibaba Cloud / DashScope Qwen models
|
||||
"qwen3.5-plus": 131072,
|
||||
"qwen3-max": 131072,
|
||||
"qwen3-coder-plus": 131072,
|
||||
"qwen3-coder-next": 131072,
|
||||
"qwen-plus-latest": 131072,
|
||||
"qwen3.5-flash": 131072,
|
||||
"qwen-vl-max": 32768,
|
||||
"gpt-4": 128000,
|
||||
# Google
|
||||
"gemini": 1048576,
|
||||
# DeepSeek
|
||||
"deepseek": 128000,
|
||||
# Meta
|
||||
"llama": 131072,
|
||||
# Qwen
|
||||
"qwen": 131072,
|
||||
# MiniMax
|
||||
"minimax": 204800,
|
||||
# GLM
|
||||
"glm": 202752,
|
||||
# Kimi
|
||||
"kimi": 262144,
|
||||
}
|
||||
|
||||
_CONTEXT_LENGTH_KEYS = (
|
||||
@@ -136,6 +124,8 @@ _CONTEXT_LENGTH_KEYS = (
|
||||
"max_input_tokens",
|
||||
"max_sequence_length",
|
||||
"max_seq_len",
|
||||
"n_ctx_train",
|
||||
"n_ctx",
|
||||
)
|
||||
|
||||
_MAX_COMPLETION_KEYS = (
|
||||
@@ -144,6 +134,9 @@ _MAX_COMPLETION_KEYS = (
|
||||
"max_tokens",
|
||||
)
|
||||
|
||||
# Local server hostnames / address patterns
|
||||
_LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1", "0.0.0.0")
|
||||
|
||||
|
||||
def _normalize_base_url(base_url: str) -> str:
|
||||
return (base_url or "").strip().rstrip("/")
|
||||
@@ -176,6 +169,99 @@ def _is_known_provider_base_url(base_url: str) -> bool:
|
||||
return any(known_host in host for known_host in known_hosts)
|
||||
|
||||
|
||||
def is_local_endpoint(base_url: str) -> bool:
|
||||
"""Return True if base_url points to a local machine (localhost / RFC-1918 / WSL)."""
|
||||
normalized = _normalize_base_url(base_url)
|
||||
if not normalized:
|
||||
return False
|
||||
url = normalized if "://" in normalized else f"http://{normalized}"
|
||||
try:
|
||||
parsed = urlparse(url)
|
||||
host = parsed.hostname or ""
|
||||
except Exception:
|
||||
return False
|
||||
if host in _LOCAL_HOSTS:
|
||||
return True
|
||||
# RFC-1918 private ranges and link-local
|
||||
import ipaddress
|
||||
try:
|
||||
addr = ipaddress.ip_address(host)
|
||||
return addr.is_private or addr.is_loopback or addr.is_link_local
|
||||
except ValueError:
|
||||
pass
|
||||
# Bare IP that looks like a private range (e.g. 172.26.x.x for WSL)
|
||||
parts = host.split(".")
|
||||
if len(parts) == 4:
|
||||
try:
|
||||
first, second = int(parts[0]), int(parts[1])
|
||||
if first == 10:
|
||||
return True
|
||||
if first == 172 and 16 <= second <= 31:
|
||||
return True
|
||||
if first == 192 and second == 168:
|
||||
return True
|
||||
except ValueError:
|
||||
pass
|
||||
return False
|
||||
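Reviewer sketch of the same check reduced to the `ipaddress` path. The name `is_local` is hypothetical. It omits the manual dotted-quad fallback from the diff, since `ipaddress` already reports 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 as private for any valid IPv4 literal.

```python
import ipaddress
from urllib.parse import urlparse

LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1", "0.0.0.0")


def is_local(base_url: str) -> bool:
    """Return True if base_url names localhost or a private/loopback/link-local IP."""
    normalized = (base_url or "").strip().rstrip("/")
    if not normalized:
        return False
    # urlparse only fills in hostname when a scheme is present.
    url = normalized if "://" in normalized else f"http://{normalized}"
    try:
        host = urlparse(url).hostname or ""
    except ValueError:
        return False
    if host in LOCAL_HOSTS:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (for example a public DNS name).
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local
```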
|
||||
|
||||
def detect_local_server_type(base_url: str) -> Optional[str]:
|
||||
"""Detect which local server is running at base_url by probing known endpoints.
|
||||
|
||||
Returns one of: "ollama", "lm-studio", "vllm", "llamacpp", or None.
|
||||
"""
|
||||
import httpx
|
||||
|
||||
normalized = _normalize_base_url(base_url)
|
||||
server_url = normalized
|
||||
if server_url.endswith("/v1"):
|
||||
server_url = server_url[:-3]
|
||||
|
||||
try:
|
||||
with httpx.Client(timeout=2.0) as client:
|
||||
# LM Studio exposes /api/v1/models — check first (most specific)
|
||||
try:
|
||||
r = client.get(f"{server_url}/api/v1/models")
|
||||
if r.status_code == 200:
|
||||
return "lm-studio"
|
||||
except Exception:
|
||||
pass
|
||||
# Ollama exposes /api/tags and responds with {"models": [...]}
|
||||
# LM Studio returns {"error": "Unexpected endpoint"} with status 200
|
||||
# on this path, so we must verify the response contains "models".
|
||||
try:
|
||||
r = client.get(f"{server_url}/api/tags")
|
||||
if r.status_code == 200:
|
||||
try:
|
||||
data = r.json()
|
||||
if "models" in data:
|
||||
return "ollama"
|
||||
except Exception:
|
||||
pass
|
||||
except Exception:
|
||||
pass
|
||||
# llama.cpp exposes /props
|
||||
try:
|
||||
r = client.get(f"{server_url}/props")
|
||||
if r.status_code == 200 and "default_generation_settings" in r.text:
|
||||
return "llamacpp"
|
||||
except Exception:
|
||||
pass
|
||||
# vLLM: /version
|
||||
try:
|
||||
r = client.get(f"{server_url}/version")
|
||||
if r.status_code == 200:
|
||||
data = r.json()
|
||||
if "version" in data:
|
||||
return "vllm"
|
||||
except Exception:
|
||||
pass
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def _iter_nested_dicts(value: Any):
|
||||
if isinstance(value, dict):
|
||||
yield value
|
||||
@@ -342,6 +428,25 @@ def fetch_endpoint_model_metadata(
|
||||
entry["pricing"] = pricing
|
||||
_add_model_aliases(cache, model_id, entry)
|
||||
|
||||
# If this is a llama.cpp server, query /props for actual allocated context
|
||||
is_llamacpp = any(
|
||||
m.get("owned_by") == "llamacpp"
|
||||
for m in payload.get("data", []) if isinstance(m, dict)
|
||||
)
|
||||
if is_llamacpp:
|
||||
try:
|
||||
props_url = candidate.rstrip("/").replace("/v1", "") + "/props"
|
||||
props_resp = requests.get(props_url, headers=headers, timeout=5)
|
||||
if props_resp.ok:
|
||||
props = props_resp.json()
|
||||
gen_settings = props.get("default_generation_settings", {})
|
||||
n_ctx = gen_settings.get("n_ctx")
|
||||
model_alias = props.get("model_alias", "")
|
||||
if n_ctx and model_alias and model_alias in cache:
|
||||
cache[model_alias]["context_length"] = n_ctx
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
_endpoint_model_metadata_cache[normalized] = cache
|
||||
_endpoint_model_metadata_cache_time[normalized] = time.time()
|
||||
return cache
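The llama.cpp override in this hunk replaces the advertised context length with the allocated one reported by `/props`. As a rough sketch of just that merge step, assuming the payload keys used in the diff (`default_generation_settings.n_ctx` and `model_alias`); the function name `apply_llamacpp_props` and the sample values are hypothetical:

```python
from typing import Any, Dict


def apply_llamacpp_props(cache: Dict[str, Dict[str, Any]], props: Dict[str, Any]) -> bool:
    """Overwrite context_length for the aliased model; return True if updated."""
    gen_settings = props.get("default_generation_settings", {})
    n_ctx = gen_settings.get("n_ctx")
    model_alias = props.get("model_alias", "")
    # Only touch entries that already exist in the metadata cache.
    if n_ctx and model_alias and model_alias in cache:
        cache[model_alias]["context_length"] = n_ctx
        return True
    return False


# Hypothetical values: the model advertises 131072 tokens but the server
# was started with an 8192-token allocation.
sample_cache = {"my-model": {"context_length": 131072}}
sample_props = {"default_generation_settings": {"n_ctx": 8192}, "model_alias": "my-model"}
```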
|
||||
@@ -362,7 +467,7 @@ def _get_context_cache_path() -> Path:
|
||||
|
||||
|
||||
def _load_context_cache() -> Dict[str, int]:
|
- """Load the model+provider → context_length cache from disk."""
+ """Load the model+provider -> context_length cache from disk."""
|
||||
path = _get_context_cache_path()
|
||||
if not path.exists():
|
||||
return {}
|
||||
@@ -391,7 +496,7 @@ def save_context_length(model: str, base_url: str, length: int) -> None:
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with open(path, "w") as f:
|
||||
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
|
||||
- logger.info("Cached context length %s → %s tokens", key, f"{length:,}")
+ logger.info("Cached context length %s -> %s tokens", key, f"{length:,}")
|
||||
except Exception as e:
|
||||
logger.debug("Failed to save context length cache: %s", e)
|
||||
|
||||
@@ -439,16 +544,219 @@ def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
|
||||
return None
|
||||
|
||||
|
||||
def get_model_context_length(model: str, base_url: str = "", api_key: str = "") -> int:
|
||||
def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
    """Return True if *candidate_id* (from server) matches *lookup_model* (configured).

    Supports two forms:
    - Exact match: "nvidia-nemotron-super-49b-v1" == "nvidia-nemotron-super-49b-v1"
    - Slug match: "nvidia/nvidia-nemotron-super-49b-v1" matches "nvidia-nemotron-super-49b-v1"
      (the part after the last "/" equals lookup_model)

    This covers LM Studio's native API which stores models as "publisher/slug"
    while users typically configure only the slug after the "local:" prefix.
    """
    if candidate_id == lookup_model:
        return True
    # Slug match: basename of candidate equals the lookup name
    if "/" in candidate_id and candidate_id.rsplit("/", 1)[1] == lookup_model:
        return True
    return False
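A few concrete inputs make the exact-or-basename rule easier to check. This is a usage sketch only: `model_id_matches` is a standalone copy of the rule so the block runs on its own, and the model IDs other than the one in the docstring are made up for illustration.

```python
def model_id_matches(candidate_id: str, lookup_model: str) -> bool:
    """Standalone copy of the exact-or-basename rule (illustration only)."""
    if candidate_id == lookup_model:
        return True
    # Only the segment after the last "/" is compared.
    return "/" in candidate_id and candidate_id.rsplit("/", 1)[1] == lookup_model


# (candidate from server, configured name) -> expected result
cases = {
    ("nvidia/nvidia-nemotron-super-49b-v1", "nvidia-nemotron-super-49b-v1"): True,
    ("org/team/some-model", "some-model"): True,      # nested publisher path
    ("org/some-model-extra", "some-model"): False,    # no prefix matching
    ("some-model", "org/some-model"): False,          # the rule is one-directional
}
results = {pair: model_id_matches(*pair) for pair in cases}
```

Because the comparison is one-directional, a configured name that already carries a publisher prefix only matches a server ID that is identical to it.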
|
||||
|
||||
|
||||
def _query_local_context_length(model: str, base_url: str) -> Optional[int]:
|
||||
"""Query a local server for the model's context length."""
|
||||
import httpx
|
||||
|
||||
# Strip recognised provider prefix (e.g., "local:model-name" → "model-name").
|
||||
# Ollama "model:tag" colons (e.g. "qwen3.5:27b") are intentionally preserved.
|
||||
model = _strip_provider_prefix(model)
|
||||
|
||||
# Strip /v1 suffix to get the server root
|
||||
server_url = base_url.rstrip("/")
|
||||
if server_url.endswith("/v1"):
|
||||
server_url = server_url[:-3]
|
||||
|
||||
try:
|
||||
server_type = detect_local_server_type(base_url)
|
||||
except Exception:
|
||||
server_type = None
|
||||
|
||||
try:
|
||||
with httpx.Client(timeout=3.0) as client:
|
||||
# Ollama: /api/show returns model details with context info
|
||||
if server_type == "ollama":
|
||||
resp = client.post(f"{server_url}/api/show", json={"name": model})
|
||||
if resp.status_code == 200:
|
||||
data = resp.json()
|
||||
# Check model_info for context length
|
||||
model_info = data.get("model_info", {})
|
||||
for key, value in model_info.items():
|
||||
if "context_length" in key and isinstance(value, (int, float)):
|
||||
return int(value)
|
||||
# Check parameters string for num_ctx
|
||||
params = data.get("parameters", "")
|
||||
if "num_ctx" in params:
|
||||
for line in params.split("\n"):
|
||||
if "num_ctx" in line:
|
||||
parts = line.strip().split()
|
||||
if len(parts) >= 2:
|
||||
try:
|
||||
return int(parts[-1])
|
||||
except ValueError:
|
||||
pass
|
||||
|
||||
# LM Studio native API: /api/v1/models returns max_context_length.
|
||||
# This is more reliable than the OpenAI-compat /v1/models which
|
||||
# doesn't include context window information for LM Studio servers.
|
||||
# Use _model_id_matches for fuzzy matching: LM Studio stores models as
|
||||
# "publisher/slug" but users configure only "slug" after "local:" prefix.
|
||||
if server_type == "lm-studio":
|
||||
resp = client.get(f"{server_url}/api/v1/models")
|
||||
if resp.status_code == 200:
|
||||
data = resp.json()
|
||||
for m in data.get("models", []):
|
||||
if _model_id_matches(m.get("key", ""), model) or _model_id_matches(m.get("id", ""), model):
|
||||
# Prefer loaded instance context (actual runtime value)
|
||||
for inst in m.get("loaded_instances", []):
|
||||
cfg = inst.get("config", {})
|
||||
ctx = cfg.get("context_length")
|
||||
if ctx and isinstance(ctx, (int, float)):
|
||||
return int(ctx)
|
||||
# Fall back to max_context_length (theoretical model max)
|
||||
ctx = m.get("max_context_length") or m.get("context_length")
|
||||
if ctx and isinstance(ctx, (int, float)):
|
||||
return int(ctx)
|
||||
|
||||
# LM Studio / vLLM / llama.cpp: try /v1/models/{model}
|
||||
resp = client.get(f"{server_url}/v1/models/{model}")
|
||||
if resp.status_code == 200:
|
||||
data = resp.json()
|
||||
# vLLM returns max_model_len
|
||||
ctx = data.get("max_model_len") or data.get("context_length") or data.get("max_tokens")
|
||||
if ctx and isinstance(ctx, (int, float)):
|
||||
return int(ctx)
|
||||
|
||||
# Try /v1/models and find the model in the list.
|
||||
# Use _model_id_matches to handle "publisher/slug" vs bare "slug".
|
||||
resp = client.get(f"{server_url}/v1/models")
|
||||
if resp.status_code == 200:
|
||||
data = resp.json()
|
||||
models_list = data.get("data", [])
|
||||
for m in models_list:
|
||||
if _model_id_matches(m.get("id", ""), model):
|
||||
ctx = m.get("max_model_len") or m.get("context_length") or m.get("max_tokens")
|
||||
if ctx and isinstance(ctx, (int, float)):
|
||||
return int(ctx)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return None
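The Ollama fallback in the function above scans the free-form `parameters` string for a `num_ctx` line. A minimal standalone sketch of that parsing step follows; `parse_num_ctx` is a hypothetical name and the sample strings are invented, not captured server output.

```python
from typing import Optional


def parse_num_ctx(parameters: str) -> Optional[int]:
    """Return the integer after 'num_ctx' in a newline-separated parameter dump."""
    for line in parameters.split("\n"):
        if "num_ctx" in line:
            parts = line.strip().split()
            if len(parts) >= 2:
                try:
                    # The value is taken from the last whitespace-separated token.
                    return int(parts[-1])
                except ValueError:
                    pass  # Malformed value: keep scanning later lines.
    return None
```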
|
||||
|
||||
|
||||
def _normalize_model_version(model: str) -> str:
    """Normalize version separators for matching.

    Nous uses dashes: claude-opus-4-6, claude-sonnet-4-5
    OpenRouter uses dots: claude-opus-4.6, claude-sonnet-4.5
    Normalize both to dashes for comparison.
    """
    return model.replace(".", "-")
|
||||
|
||||
|
||||
def _query_anthropic_context_length(model: str, base_url: str, api_key: str) -> Optional[int]:
    """Query Anthropic's /v1/models endpoint for context length.

    Only works with regular ANTHROPIC_API_KEY (sk-ant-api*).
    OAuth tokens (sk-ant-oat*) from Claude Code return 401.
    """
    if not api_key or api_key.startswith("sk-ant-oat"):
        return None  # OAuth tokens can't access /v1/models
    try:
        base = base_url.rstrip("/")
        if base.endswith("/v1"):
            base = base[:-3]
        url = f"{base}/v1/models?limit=1000"
        headers = {
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        }
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code != 200:
            return None
        data = resp.json()
        for m in data.get("data", []):
            if m.get("id") == model:
                ctx = m.get("max_input_tokens")
                if isinstance(ctx, int) and ctx > 0:
                    return ctx
    except Exception as e:
        logger.debug("Anthropic /v1/models query failed: %s", e)
    return None
|
||||
|
||||
|
||||
def _resolve_nous_context_length(model: str) -> Optional[int]:
|
||||
"""Resolve Nous Portal model context length via OpenRouter metadata.
|
||||
|
||||
Nous model IDs are bare (e.g. 'claude-opus-4-6') while OpenRouter uses
|
||||
prefixed IDs (e.g. 'anthropic/claude-opus-4.6'). Try suffix matching
|
||||
with version normalization (dot↔dash).
|
||||
"""
|
||||
metadata = fetch_model_metadata() # OpenRouter cache
|
||||
# Exact match first
|
||||
if model in metadata:
|
||||
return metadata[model].get("context_length")
|
||||
|
||||
normalized = _normalize_model_version(model).lower()
|
||||
|
||||
for or_id, entry in metadata.items():
|
||||
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
|
||||
if bare.lower() == model.lower() or _normalize_model_version(bare).lower() == normalized:
|
||||
return entry.get("context_length")
|
||||
|
||||
# Partial prefix match for cases like gemini-3-flash → gemini-3-flash-preview
|
||||
# Require match to be at a word boundary (followed by -, :, ., or end of string)
|
||||
model_lower = model.lower()
|
||||
for or_id, entry in metadata.items():
|
||||
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
|
||||
for candidate, query in [(bare.lower(), model_lower), (_normalize_model_version(bare).lower(), normalized)]:
|
||||
if candidate.startswith(query) and (
|
||||
len(candidate) == len(query) or candidate[len(query)] in "-:."
|
||||
):
|
||||
return entry.get("context_length")
|
||||
|
||||
return None
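The three passes in `_resolve_nous_context_length` (exact ID, bare ID with dot/dash normalization, then prefix match at a separator boundary) can be sketched against a small in-memory table. Everything below is illustrative: `resolve_bare_model` and the helper names are hypothetical, the table maps IDs straight to integers instead of metadata dicts, and the context sizes are placeholder values.

```python
from typing import Dict, Optional


def normalize_version(model: str) -> str:
    return model.replace(".", "-")


def bare_id(full_id: str) -> str:
    # Drop a leading "publisher/" segment if present.
    return full_id.split("/", 1)[1] if "/" in full_id else full_id


def resolve_bare_model(model: str, context_by_id: Dict[str, int]) -> Optional[int]:
    if model in context_by_id:
        return context_by_id[model]
    model_lower = model.lower()
    normalized = normalize_version(model).lower()
    # Pass 1: bare ID equality, with and without dot/dash normalization.
    for full_id, ctx in context_by_id.items():
        bare = bare_id(full_id).lower()
        if bare == model_lower or normalize_version(bare) == normalized:
            return ctx
    # Pass 2: prefix match that must end at a separator or the end of the string.
    for full_id, ctx in context_by_id.items():
        bare = bare_id(full_id).lower()
        for candidate, query in ((bare, model_lower), (normalize_version(bare), normalized)):
            if candidate.startswith(query) and (
                len(candidate) == len(query) or candidate[len(query)] in "-:."
            ):
                return ctx
    return None


# Placeholder context sizes, not real provider limits.
SAMPLE = {
    "anthropic/claude-opus-4.6": 1_000_000,
    "google/gemini-3-flash-preview": 1_048_576,
}
```

The boundary check is what keeps a truncated name such as `gemini-3-fl` from matching `gemini-3-flash-preview`.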
|
||||
|
||||
|
||||
def get_model_context_length(
|
||||
model: str,
|
||||
base_url: str = "",
|
||||
api_key: str = "",
|
||||
config_context_length: int | None = None,
|
||||
provider: str = "",
|
||||
) -> int:
|
||||
"""Get the context length for a model.
|
||||
|
||||
Resolution order:
|
||||
0. Explicit config override (model.context_length or custom_providers per-model)
|
||||
1. Persistent cache (previously discovered via probing)
|
||||
2. Active endpoint metadata (/models for explicit custom endpoints)
|
||||
3. OpenRouter API metadata
|
||||
4. Hardcoded DEFAULT_CONTEXT_LENGTHS (fuzzy match for hosted routes only)
|
||||
5. First probe tier (2M) — will be narrowed on first context error
|
||||
3. Local server query (for local endpoints)
|
||||
4. Anthropic /v1/models API (API-key users only, not OAuth)
|
||||
5. OpenRouter live API metadata
|
||||
6. Nous suffix-match via OpenRouter cache
|
||||
7. models.dev registry lookup (provider-aware)
|
||||
8. Thin hardcoded defaults (broad family patterns)
|
||||
9. Default fallback (128K)
|
||||
"""
|
||||
# 0. Explicit config override — user knows best
|
||||
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
|
||||
return config_context_length
|
||||
|
||||
# Normalise provider-prefixed model names (e.g. "local:model-name" →
|
||||
# "model-name") so cache lookups and server queries use the bare ID that
|
||||
# local servers actually know about. Ollama "model:tag" colons are preserved.
|
||||
model = _strip_provider_prefix(model)
|
||||
|
||||
# 1. Check persistent cache (model+provider)
|
||||
if base_url:
|
||||
cached = get_cached_context_length(model, base_url)
|
||||
@@ -458,29 +766,82 @@ def get_model_context_length(model: str, base_url: str = "", api_key: str = "")
|
||||
# 2. Active endpoint metadata for explicit custom routes
|
||||
if _is_custom_endpoint(base_url):
|
||||
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
|
||||
if model in endpoint_metadata:
|
||||
context_length = endpoint_metadata[model].get("context_length")
|
||||
matched = endpoint_metadata.get(model)
|
||||
if not matched:
|
||||
# Single-model servers: if only one model is loaded, use it
|
||||
if len(endpoint_metadata) == 1:
|
||||
matched = next(iter(endpoint_metadata.values()))
|
||||
else:
|
||||
# Fuzzy match: substring in either direction
|
||||
for key, entry in endpoint_metadata.items():
|
||||
if model in key or key in model:
|
||||
matched = entry
|
||||
break
|
||||
if matched:
|
||||
context_length = matched.get("context_length")
|
||||
if isinstance(context_length, int):
|
||||
return context_length
|
||||
if not _is_known_provider_base_url(base_url):
|
||||
# Explicit third-party endpoints should not borrow fuzzy global
|
||||
# defaults from unrelated providers with similarly named models.
|
||||
return CONTEXT_PROBE_TIERS[0]
|
||||
# 3. Try querying local server directly
|
||||
if is_local_endpoint(base_url):
|
||||
local_ctx = _query_local_context_length(model, base_url)
|
||||
if local_ctx and local_ctx > 0:
|
||||
save_context_length(model, base_url, local_ctx)
|
||||
return local_ctx
|
||||
logger.info(
|
||||
"Could not detect context length for model %r at %s — "
|
||||
"defaulting to %s tokens (probe-down). Set model.context_length "
|
||||
"in config.yaml to override.",
|
||||
model, base_url, f"{DEFAULT_FALLBACK_CONTEXT:,}",
|
||||
)
|
||||
return DEFAULT_FALLBACK_CONTEXT
|
||||
|
||||
# 3. OpenRouter API metadata
|
||||
# 4. Anthropic /v1/models API (only for regular API keys, not OAuth)
|
||||
if provider == "anthropic" or (
|
||||
base_url and "api.anthropic.com" in base_url
|
||||
):
|
||||
ctx = _query_anthropic_context_length(model, base_url or "https://api.anthropic.com", api_key)
|
||||
if ctx:
|
||||
return ctx
|
||||
|
||||
# 5. Provider-aware lookups (before generic OpenRouter cache)
|
||||
# These are provider-specific and take priority over the generic OR cache,
|
||||
# since the same model can have different context limits per provider
|
||||
# (e.g. claude-opus-4.6 is 1M on Anthropic but 128K on GitHub Copilot).
|
||||
if provider == "nous":
|
||||
ctx = _resolve_nous_context_length(model)
|
||||
if ctx:
|
||||
return ctx
|
||||
if provider:
|
||||
from agent.models_dev import lookup_models_dev_context
|
||||
ctx = lookup_models_dev_context(provider, model)
|
||||
if ctx:
|
||||
return ctx
|
||||
|
||||
# 6. OpenRouter live API metadata (provider-unaware fallback)
|
||||
metadata = fetch_model_metadata()
|
||||
if model in metadata:
|
||||
return metadata[model].get("context_length", 128000)
|
||||
|
||||
# 4. Hardcoded defaults (fuzzy match — longest key first for specificity)
|
||||
# 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
|
||||
# Only check `default_model in model` (is the key a substring of the input).
|
||||
# The reverse (`model in default_model`) causes shorter names like
|
||||
# "claude-sonnet-4" to incorrectly match "claude-sonnet-4-6" and return 1M.
|
||||
for default_model, length in sorted(
|
||||
DEFAULT_CONTEXT_LENGTHS.items(), key=lambda x: len(x[0]), reverse=True
|
||||
):
|
||||
- if default_model in model or model in default_model:
+ if default_model in model:
|
||||
return length
|
||||
|
||||
# 5. Unknown model — start at highest probe tier
|
||||
return CONTEXT_PROBE_TIERS[0]
|
||||
# 9. Query local server as last resort
|
||||
if base_url and is_local_endpoint(base_url):
|
||||
local_ctx = _query_local_context_length(model, base_url)
|
||||
if local_ctx and local_ctx > 0:
|
||||
save_context_length(model, base_url, local_ctx)
|
||||
return local_ctx
|
||||
|
||||
# 10. Default fallback — 128K
|
||||
return DEFAULT_FALLBACK_CONTEXT
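The comment on the hardcoded-defaults step explains why the reverse substring check was dropped. A small sketch reproduces the difference; the function names and the two-entry table are hypothetical, and the context sizes are placeholders chosen only to show the mismatch.

```python
from typing import Dict, Optional


def match_default(model: str, defaults: Dict[str, int]) -> Optional[int]:
    """One-directional match: a default key must be a substring of the model."""
    for key, length in sorted(defaults.items(), key=lambda kv: len(kv[0]), reverse=True):
        if key in model:
            return length
    return None


def match_default_bidirectional(model: str, defaults: Dict[str, int]) -> Optional[int]:
    """Earlier behaviour, kept here only to show the false positive."""
    for key, length in sorted(defaults.items(), key=lambda kv: len(kv[0]), reverse=True):
        if key in model or model in key:
            return length
    return None


# Placeholder table: the longer key is a newer model with a larger window.
DEFAULTS = {"claude-sonnet-4": 200_000, "claude-sonnet-4-6": 1_000_000}
```

With the bidirectional check, the shorter name `claude-sonnet-4` is a substring of the longer key, which is tried first, so it picks up the larger window by mistake.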
|
||||
|
||||
|
||||
def estimate_tokens_rough(text: str) -> int:
|
||||
|
||||
@@ -0,0 +1,171 @@
|
||||
"""Models.dev registry integration for provider-aware context length detection.
|
||||
|
||||
Fetches model metadata from https://models.dev/api.json — a community-maintained
|
||||
database of 3800+ models across 100+ providers, including per-provider context
|
||||
windows, pricing, and capabilities.
|
||||
|
||||
Data is cached in memory (1hr TTL) and on disk (~/.hermes/models_dev_cache.json)
|
||||
to avoid cold-start network latency.
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
import requests
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MODELS_DEV_URL = "https://models.dev/api.json"
|
||||
_MODELS_DEV_CACHE_TTL = 3600 # 1 hour in-memory
|
||||
|
||||
# In-memory cache
|
||||
_models_dev_cache: Dict[str, Any] = {}
|
||||
_models_dev_cache_time: float = 0
|
||||
|
||||
# Provider ID mapping: Hermes provider names → models.dev provider IDs
|
||||
PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
|
||||
"openrouter": "openrouter",
|
||||
"anthropic": "anthropic",
|
||||
"zai": "zai",
|
||||
"kimi-coding": "kimi-for-coding",
|
||||
"minimax": "minimax",
|
||||
"minimax-cn": "minimax-cn",
|
||||
"deepseek": "deepseek",
|
||||
"alibaba": "alibaba",
|
||||
"copilot": "github-copilot",
|
||||
"ai-gateway": "vercel",
|
||||
"opencode-zen": "opencode",
|
||||
"opencode-go": "opencode-go",
|
||||
"kilocode": "kilo",
|
||||
}
|
||||
|
||||
|
||||
def _get_cache_path() -> Path:
    """Return path to disk cache file."""
    env_val = os.environ.get("HERMES_HOME", "")
    hermes_home = Path(env_val) if env_val else Path.home() / ".hermes"
    return hermes_home / "models_dev_cache.json"


def _load_disk_cache() -> Dict[str, Any]:
    """Load models.dev data from disk cache."""
    try:
        cache_path = _get_cache_path()
        if cache_path.exists():
            with open(cache_path, encoding="utf-8") as f:
                return json.load(f)
    except Exception as e:
        logger.debug("Failed to load models.dev disk cache: %s", e)
    return {}


def _save_disk_cache(data: Dict[str, Any]) -> None:
    """Save models.dev data to disk cache."""
    try:
        cache_path = _get_cache_path()
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(data, f, separators=(",", ":"))
    except Exception as e:
        logger.debug("Failed to save models.dev disk cache: %s", e)
|
||||
|
||||
|
||||
def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
|
||||
"""Fetch models.dev registry. In-memory cache (1hr) + disk fallback.
|
||||
|
||||
Returns the full registry dict keyed by provider ID, or empty dict on failure.
|
||||
"""
|
||||
global _models_dev_cache, _models_dev_cache_time
|
||||
|
||||
# Check in-memory cache
|
||||
if (
|
||||
not force_refresh
|
||||
and _models_dev_cache
|
||||
and (time.time() - _models_dev_cache_time) < _MODELS_DEV_CACHE_TTL
|
||||
):
|
||||
return _models_dev_cache
|
||||
|
||||
# Try network fetch
|
||||
try:
|
||||
response = requests.get(MODELS_DEV_URL, timeout=15)
|
||||
response.raise_for_status()
|
||||
data = response.json()
|
||||
if isinstance(data, dict) and len(data) > 0:
|
||||
_models_dev_cache = data
|
||||
_models_dev_cache_time = time.time()
|
||||
_save_disk_cache(data)
|
||||
logger.debug(
|
||||
"Fetched models.dev registry: %d providers, %d total models",
|
||||
len(data),
|
||||
sum(len(p.get("models", {})) for p in data.values() if isinstance(p, dict)),
|
||||
)
|
||||
return data
|
||||
except Exception as e:
|
||||
logger.debug("Failed to fetch models.dev: %s", e)
|
||||
|
||||
# Fall back to disk cache — use a short TTL (5 min) so we retry
|
||||
# the network fetch soon instead of serving stale data for a full hour.
|
||||
if not _models_dev_cache:
|
||||
_models_dev_cache = _load_disk_cache()
|
||||
if _models_dev_cache:
|
||||
_models_dev_cache_time = time.time() - _MODELS_DEV_CACHE_TTL + 300
|
||||
logger.debug("Loaded models.dev from disk cache (%d providers)", len(_models_dev_cache))
|
||||
|
||||
return _models_dev_cache
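The disk-fallback branch back-dates the cache timestamp so the data expires after five minutes instead of a full hour. The arithmetic can be checked in isolation; the helper names below are hypothetical and the constants mirror the values in the diff (3600-second TTL, 300-second retry window).

```python
CACHE_TTL = 3600    # seconds, same as _MODELS_DEV_CACHE_TTL in the diff
RETRY_AFTER = 300   # seconds until the disk-loaded data counts as stale


def backdated_cache_time(now: float) -> float:
    """Timestamp that makes a TTL-based freshness check expire in RETRY_AFTER seconds."""
    return now - CACHE_TTL + RETRY_AFTER


def is_fresh(cache_time: float, now: float) -> bool:
    """Same comparison the in-memory cache check uses."""
    return (now - cache_time) < CACHE_TTL


t0 = 1_000_000.0
stamp = backdated_cache_time(t0)
```

Reusing the existing TTL comparison this way avoids a second expiry field for the fallback case.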
|
||||
|
||||
|
||||
def lookup_models_dev_context(provider: str, model: str) -> Optional[int]:
    """Look up context_length for a provider+model combo in models.dev.

    Returns the context window in tokens, or None if not found.
    Handles case-insensitive matching and filters out context=0 entries.
    """
    mdev_provider_id = PROVIDER_TO_MODELS_DEV.get(provider)
    if not mdev_provider_id:
        return None

    data = fetch_models_dev()
    provider_data = data.get(mdev_provider_id)
    if not isinstance(provider_data, dict):
        return None

    models = provider_data.get("models", {})
    if not isinstance(models, dict):
        return None

    # Exact match
    entry = models.get(model)
    if entry:
        ctx = _extract_context(entry)
        if ctx:
            return ctx

    # Case-insensitive match
    model_lower = model.lower()
    for mid, mdata in models.items():
        if mid.lower() == model_lower:
            ctx = _extract_context(mdata)
            if ctx:
                return ctx

    return None
|
||||
|
||||
|
||||
def _extract_context(entry: Dict[str, Any]) -> Optional[int]:
    """Extract context_length from a models.dev model entry.

    Returns None for invalid/zero values (some audio/image models have context=0).
    """
    if not isinstance(entry, dict):
        return None
    limit = entry.get("limit")
    if not isinstance(limit, dict):
        return None
    ctx = limit.get("context")
    if isinstance(ctx, (int, float)) and ctx > 0:
        return int(ctx)
    return None
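To see the lookup and extraction rules together, here is a sketch against a tiny in-memory registry. The nesting (`provider -> "models" -> id -> "limit" -> "context"`) follows the keys the code above reads; the function name `context_for`, the model IDs, and the numbers are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical registry fragment in the shape the lookup code expects.
REGISTRY: Dict[str, Any] = {
    "github-copilot": {
        "models": {
            "Example-Model": {"limit": {"context": 128000}},
            "example-audio": {"limit": {"context": 0}},  # zero means "no usable value"
        }
    }
}


def context_for(registry: Dict[str, Any], provider_id: str, model: str) -> Optional[int]:
    """Case-insensitive lookup that ignores missing or non-positive context values."""
    provider_data = registry.get(provider_id)
    if not isinstance(provider_data, dict):
        return None
    models = provider_data.get("models", {})
    if not isinstance(models, dict):
        return None
    wanted = model.lower()
    for model_id, entry in models.items():
        if model_id.lower() != wanted:
            continue
        limit = entry.get("limit") if isinstance(entry, dict) else None
        ctx = limit.get("context") if isinstance(limit, dict) else None
        if isinstance(ctx, (int, float)) and ctx > 0:
            return int(ctx)
    return None
```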
|
||||
+42
-23
@@ -206,11 +206,11 @@ PLATFORM_HINTS = {
|
||||
"contextually appropriate."
|
||||
),
|
||||
"cron": (
|
||||
"You are running as a scheduled cron job. Your final response is automatically "
|
||||
"delivered to the job's configured destination, so do not use send_message to "
|
||||
"send to that same target again. If you want the user to receive something in "
|
||||
"the scheduled destination, put it directly in your final response. Use "
|
||||
"send_message only for additional or different targets."
|
||||
"You are running as a scheduled cron job. There is no user present — you "
|
||||
"cannot ask questions, request clarification, or wait for follow-up. Execute "
|
||||
"the task fully and autonomously, making reasonable decisions where needed. "
|
||||
"Your final response is automatically delivered to the job's configured "
|
||||
"destination — put the primary content directly in your response."
|
||||
),
|
||||
"cli": (
|
||||
"You are a CLI AI Agent. Try not to use markdown but simple text "
|
||||
@@ -429,11 +429,42 @@ def _truncate_content(content: str, filename: str, max_chars: int = CONTEXT_FILE
|
||||
return head + marker + tail
|
||||
|
||||
|
||||
def build_context_files_prompt(cwd: Optional[str] = None) -> str:
|
||||
def load_soul_md() -> Optional[str]:
|
||||
"""Load SOUL.md from HERMES_HOME and return its content, or None.
|
||||
|
||||
Used as the agent identity (slot #1 in the system prompt). When this
|
||||
returns content, ``build_context_files_prompt`` should be called with
|
||||
``skip_soul=True`` so SOUL.md isn't injected twice.
|
||||
"""
|
||||
try:
|
||||
from hermes_cli.config import ensure_hermes_home
|
||||
ensure_hermes_home()
|
||||
except Exception as e:
|
||||
logger.debug("Could not ensure HERMES_HOME before loading SOUL.md: %s", e)
|
||||
|
||||
soul_path = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "SOUL.md"
|
||||
if not soul_path.exists():
|
||||
return None
|
||||
try:
|
||||
content = soul_path.read_text(encoding="utf-8").strip()
|
||||
if not content:
|
||||
return None
|
||||
content = _scan_context_content(content, "SOUL.md")
|
||||
content = _truncate_content(content, "SOUL.md")
|
||||
return content
|
||||
except Exception as e:
|
||||
logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
|
||||
return None
|
||||
|
||||
|
||||
def build_context_files_prompt(cwd: Optional[str] = None, skip_soul: bool = False) -> str:
|
||||
"""Discover and load context files for the system prompt.
|
||||
|
||||
Discovery: AGENTS.md (recursive), .cursorrules / .cursor/rules/*.mdc,
|
||||
and SOUL.md from HERMES_HOME only. Each capped at 20,000 chars.
|
||||
|
||||
When *skip_soul* is True, SOUL.md is not included here (it was already
|
||||
loaded via ``load_soul_md()`` for the identity slot).
|
||||
"""
|
||||
if cwd is None:
|
||||
cwd = os.getcwd()
|
||||
@@ -523,23 +554,11 @@ def build_context_files_prompt(cwd: Optional[str] = None) -> str:
|
||||
hermes_md_content = _truncate_content(hermes_md_content, ".hermes.md")
|
||||
sections.append(hermes_md_content)
|
||||
|
||||
# SOUL.md from HERMES_HOME only
|
||||
try:
|
||||
from hermes_cli.config import ensure_hermes_home
|
||||
ensure_hermes_home()
|
||||
except Exception as e:
|
||||
logger.debug("Could not ensure HERMES_HOME before loading SOUL.md: %s", e)
|
||||
|
||||
soul_path = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "SOUL.md"
|
||||
if soul_path.exists():
|
||||
try:
|
||||
content = soul_path.read_text(encoding="utf-8").strip()
|
||||
if content:
|
||||
content = _scan_context_content(content, "SOUL.md")
|
||||
content = _truncate_content(content, "SOUL.md")
|
||||
sections.append(content)
|
||||
except Exception as e:
|
||||
logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
|
||||
# SOUL.md from HERMES_HOME only — skip when already loaded as identity
|
||||
if not skip_soul:
|
||||
soul_content = load_soul_md()
|
||||
if soul_content:
|
||||
sections.append(soul_content)
|
||||
|
||||
if not sections:
|
||||
return ""
|
||||
|
||||
@@ -125,6 +125,8 @@ def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any
|
||||
"base_url": primary.get("base_url"),
|
||||
"provider": primary.get("provider"),
|
||||
"api_mode": primary.get("api_mode"),
|
||||
"command": primary.get("command"),
|
||||
"args": list(primary.get("args") or []),
|
||||
},
|
||||
"label": None,
|
||||
"signature": (
|
||||
@@ -132,6 +134,8 @@ def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any
|
||||
primary.get("provider"),
|
||||
primary.get("base_url"),
|
||||
primary.get("api_mode"),
|
||||
primary.get("command"),
|
||||
tuple(primary.get("args") or ()),
|
||||
),
|
||||
}
|
||||
|
||||
@@ -156,6 +160,8 @@ def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any
|
||||
"base_url": primary.get("base_url"),
|
||||
"provider": primary.get("provider"),
|
||||
"api_mode": primary.get("api_mode"),
|
||||
"command": primary.get("command"),
|
||||
"args": list(primary.get("args") or []),
|
||||
},
|
||||
"label": None,
|
||||
"signature": (
|
||||
@@ -163,6 +169,8 @@ def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any
|
||||
primary.get("provider"),
|
||||
primary.get("base_url"),
|
||||
primary.get("api_mode"),
|
||||
primary.get("command"),
|
||||
tuple(primary.get("args") or ()),
|
||||
),
|
||||
}
|
||||
|
||||
@@ -173,6 +181,8 @@ def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any
|
||||
"base_url": runtime.get("base_url"),
|
||||
"provider": runtime.get("provider"),
|
||||
"api_mode": runtime.get("api_mode"),
|
||||
"command": runtime.get("command"),
|
||||
"args": list(runtime.get("args") or []),
|
||||
},
|
||||
"label": f"smart route → {route.get('model')} ({runtime.get('provider')})",
|
||||
"signature": (
|
||||
@@ -180,5 +190,7 @@ def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any
|
||||
runtime.get("provider"),
|
||||
runtime.get("base_url"),
|
||||
runtime.get("api_mode"),
|
||||
runtime.get("command"),
|
||||
tuple(runtime.get("args") or ()),
|
||||
),
|
||||
}
|
||||
|
||||
@@ -973,6 +973,8 @@ def save_config_value(key_path: str, value: any) -> bool:
|
||||
return False
|
||||
|
||||
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# HermesCLI Class
|
||||
# ============================================================================
|
||||
@@ -1046,6 +1048,14 @@ class HermesCLI:
|
||||
_config_model = _model_config.get("default", "") if isinstance(_model_config, dict) else (_model_config or "")
|
||||
_FALLBACK_MODEL = "anthropic/claude-opus-4.6"
|
||||
self.model = model or _config_model or _FALLBACK_MODEL
|
||||
# Auto-detect model from local server if still on fallback
|
||||
if self.model == _FALLBACK_MODEL:
|
||||
_base_url = _model_config.get("base_url", "") if isinstance(_model_config, dict) else ""
|
||||
if "localhost" in _base_url or "127.0.0.1" in _base_url:
|
||||
from hermes_cli.runtime_provider import _auto_detect_local_model
|
||||
_detected = _auto_detect_local_model(_base_url)
|
||||
if _detected:
|
||||
self.model = _detected
|
||||
# Track whether model was explicitly chosen by the user or fell back
|
||||
# to the global default. Provider-specific normalisation may override
|
||||
# the default silently but should warn when overriding an explicit choice.
|
||||
@@ -1069,6 +1079,8 @@ class HermesCLI:
|
||||
self._provider_source: Optional[str] = None
|
||||
self.provider = self.requested_provider
|
||||
self.api_mode = "chat_completions"
|
||||
self.acp_command: Optional[str] = None
|
||||
self.acp_args: list[str] = []
|
||||
self.base_url = (
|
||||
base_url
|
||||
or os.getenv("OPENAI_BASE_URL")
|
||||
@@ -1249,6 +1261,8 @@ class HermesCLI:
|
||||
def _get_status_bar_snapshot(self) -> Dict[str, Any]:
|
||||
model_name = self.model or "unknown"
|
||||
model_short = model_name.split("/")[-1] if "/" in model_name else model_name
|
||||
if model_short.endswith(".gguf"):
|
||||
model_short = model_short[:-5]
|
||||
if len(model_short) > 26:
|
||||
model_short = f"{model_short[:23]}..."
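The status-bar shortening shown in this hunk (drop the publisher prefix, drop a `.gguf` suffix, cap at 26 characters) can be read as one small pure function. This is a sketch with a hypothetical name, `short_model_name`, and invented inputs.

```python
def short_model_name(model_name: str, max_len: int = 26) -> str:
    """Shorten a model ID for display: basename, no .gguf, ellipsis when too long."""
    short = model_name.split("/")[-1] if "/" in model_name else model_name
    if short.endswith(".gguf"):
        short = short[:-5]
    if len(short) > max_len:
        # Reserve three characters for the ellipsis so the result is max_len long.
        short = f"{short[:max_len - 3]}..."
    return short
```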
|
||||
|
||||
@@ -1385,27 +1399,35 @@ class HermesCLI:
|
||||
return [("class:status-bar", f" {self._build_status_bar_text()} ")]
|
||||
|
||||
def _normalize_model_for_provider(self, resolved_provider: str) -> bool:
|
||||
"""Strip provider prefixes and swap the default model for Codex.
|
||||
|
||||
When the resolved provider is ``openai-codex``:
|
||||
|
||||
1. Strip any ``provider/`` prefix (the Codex Responses API only
|
||||
accepts bare model slugs like ``gpt-5.4``, not ``openai/gpt-5.4``).
|
||||
2. If the active model is still the *untouched default* (user never
|
||||
explicitly chose a model), replace it with a Codex-compatible
|
||||
default so the first session doesn't immediately error.
|
||||
|
||||
If the user explicitly chose a model — *any* model — we trust them
|
||||
and let the API be the judge. No allowlists, no slug checks.
|
||||
|
||||
Returns True when the active model was changed.
|
||||
"""
|
||||
if resolved_provider != "openai-codex":
|
||||
return False
|
||||
|
||||
"""Normalize provider-specific model IDs and routing."""
|
||||
current_model = (self.model or "").strip()
|
||||
changed = False
|
||||
|
||||
if resolved_provider == "copilot":
|
||||
try:
|
||||
from hermes_cli.models import copilot_model_api_mode, normalize_copilot_model_id
|
||||
|
||||
canonical = normalize_copilot_model_id(current_model, api_key=self.api_key)
|
||||
if canonical and canonical != current_model:
|
||||
if not self._model_is_default:
|
||||
self.console.print(
|
||||
f"[yellow]⚠️ Normalized Copilot model '{current_model}' to '{canonical}'.[/]"
|
||||
)
|
||||
self.model = canonical
|
||||
current_model = canonical
|
||||
changed = True
|
||||
|
||||
resolved_mode = copilot_model_api_mode(current_model, api_key=self.api_key)
|
||||
if resolved_mode != self.api_mode:
|
||||
self.api_mode = resolved_mode
|
||||
changed = True
|
||||
except Exception:
|
||||
pass
|
||||
return changed
|
||||
|
||||
if resolved_provider != "openai-codex":
|
||||
return False
|
||||
|
||||
# 1. Strip provider prefix ("openai/gpt-5.4" → "gpt-5.4")
|
||||
if "/" in current_model:
|
||||
slug = current_model.split("/", 1)[1]
|
||||
@@ -1502,9 +1524,11 @@ class HermesCLI:
|
||||
# Track whether we're inside a reasoning/thinking block.
|
||||
# These tags are model-generated (system prompt tells the model
|
||||
# to use them) and get stripped from final_response. We must
|
||||
# suppress them during streaming too.
|
||||
_OPEN_TAGS = ("<REASONING_SCRATCHPAD>", "<think>", "<reasoning>", "<THINKING>")
|
||||
_CLOSE_TAGS = ("</REASONING_SCRATCHPAD>", "</think>", "</reasoning>", "</THINKING>")
|
||||
# suppress them during streaming too — unless show_reasoning is
|
||||
# enabled, in which case we route the inner content to the
|
||||
# reasoning display box instead of discarding it.
|
||||
_OPEN_TAGS = ("<REASONING_SCRATCHPAD>", "<think>", "<reasoning>", "<THINKING>", "<thinking>")
|
||||
_CLOSE_TAGS = ("</REASONING_SCRATCHPAD>", "</think>", "</reasoning>", "</THINKING>", "</thinking>")
|
||||
|
||||
# Append to a pre-filter buffer first
|
||||
self._stream_prefilt = getattr(self, "_stream_prefilt", "") + text
|
||||
@@ -1544,6 +1568,12 @@ class HermesCLI:
|
||||
idx = self._stream_prefilt.find(tag)
|
||||
if idx != -1:
|
||||
self._in_reasoning_block = False
|
||||
# When show_reasoning is on, route inner content to
|
||||
# the reasoning display box instead of discarding.
|
||||
if self.show_reasoning:
|
||||
inner = self._stream_prefilt[:idx]
|
||||
if inner:
|
||||
self._stream_reasoning_delta(inner)
|
||||
after = self._stream_prefilt[idx + len(tag):]
|
||||
self._stream_prefilt = ""
|
||||
# Process remaining text after close tag through full
|
||||
@@ -1551,10 +1581,15 @@ class HermesCLI:
|
||||
if after:
|
||||
self._stream_delta(after)
|
||||
return
|
||||
# Still inside reasoning block — keep only the tail that could
|
||||
# be a partial close tag prefix (save memory on long blocks).
|
||||
# When show_reasoning is on, stream reasoning content live
|
||||
# instead of silently accumulating. Keep only the tail that
|
||||
# could be a partial close tag prefix.
|
||||
max_tag_len = max(len(t) for t in _CLOSE_TAGS)
|
||||
if len(self._stream_prefilt) > max_tag_len:
|
||||
if self.show_reasoning:
|
||||
# Route the safe prefix to reasoning display
|
||||
safe_reasoning = self._stream_prefilt[:-max_tag_len]
|
||||
self._stream_reasoning_delta(safe_reasoning)
|
||||
self._stream_prefilt = self._stream_prefilt[-max_tag_len:]
|
||||
return
|
||||
|
||||
@@ -1681,6 +1716,8 @@ class HermesCLI:
|
||||
base_url = runtime.get("base_url")
|
||||
resolved_provider = runtime.get("provider", "openrouter")
|
||||
resolved_api_mode = runtime.get("api_mode", self.api_mode)
|
||||
resolved_acp_command = runtime.get("command")
|
||||
resolved_acp_args = list(runtime.get("args") or [])
|
||||
if not isinstance(api_key, str) or not api_key:
|
||||
self.console.print("[bold red]Provider resolver returned an empty API key.[/]")
|
||||
return False
|
||||
@@ -1692,9 +1729,13 @@ class HermesCLI:
|
||||
routing_changed = (
|
||||
resolved_provider != self.provider
|
||||
or resolved_api_mode != self.api_mode
|
||||
or resolved_acp_command != self.acp_command
|
||||
or resolved_acp_args != self.acp_args
|
||||
)
|
||||
self.provider = resolved_provider
|
||||
self.api_mode = resolved_api_mode
|
||||
self.acp_command = resolved_acp_command
|
||||
self.acp_args = resolved_acp_args
|
||||
self._provider_source = runtime.get("source")
|
||||
self.api_key = api_key
|
||||
self.base_url = base_url
|
||||
@@ -1724,6 +1765,8 @@ class HermesCLI:
|
||||
"base_url": self.base_url,
|
||||
"provider": self.provider,
|
||||
"api_mode": self.api_mode,
|
||||
"command": self.acp_command,
|
||||
"args": list(self.acp_args or []),
|
||||
},
|
||||
)
|
||||
|
||||
@@ -1792,6 +1835,8 @@ class HermesCLI:
|
||||
"base_url": self.base_url,
|
||||
"provider": self.provider,
|
||||
"api_mode": self.api_mode,
|
||||
"command": self.acp_command,
|
||||
"args": list(self.acp_args or []),
|
||||
}
|
||||
effective_model = model_override or self.model
|
||||
self.agent = AIAgent(
|
||||
@@ -1800,6 +1845,8 @@ class HermesCLI:
|
||||
base_url=runtime.get("base_url"),
|
||||
provider=runtime.get("provider"),
|
||||
api_mode=runtime.get("api_mode"),
|
||||
acp_command=runtime.get("command"),
|
||||
acp_args=runtime.get("args"),
|
||||
max_iterations=self.max_turns,
|
||||
enabled_toolsets=self.enabled_toolsets,
|
||||
verbose_logging=self.verbose,
|
||||
@@ -1836,6 +1883,8 @@ class HermesCLI:
|
||||
runtime.get("provider"),
|
||||
runtime.get("base_url"),
|
||||
runtime.get("api_mode"),
|
||||
runtime.get("command"),
|
||||
tuple(runtime.get("args") or ()),
|
||||
)
|
||||
|
||||
if self._pending_title and self._session_db:
|
||||
@@ -2697,6 +2746,7 @@ class HermesCLI:
|
||||
if self.agent:
|
||||
self.agent.session_id = self.session_id
|
||||
self.agent.session_start = self.session_start
|
||||
self.agent.reset_session_state()
|
||||
if hasattr(self.agent, "_last_flushed_db_idx"):
|
||||
self.agent._last_flushed_db_idx = 0
|
||||
if hasattr(self.agent, "_todo_store"):
|
||||
@@ -2856,6 +2906,14 @@ class HermesCLI:
|
||||
for mid, desc in curated:
|
||||
current_marker = " ← current" if (is_active and mid == self.model) else ""
|
||||
print(f" {mid}{current_marker}")
|
||||
elif p["id"] == "custom":
|
||||
from hermes_cli.models import _get_custom_base_url
|
||||
custom_url = _get_custom_base_url() or os.getenv("OPENAI_BASE_URL", "")
|
||||
if custom_url:
|
||||
print(f" endpoint: {custom_url}")
|
||||
if is_active:
|
||||
print(f" model: {self.model} ← current")
|
||||
print(f" (use /model custom:<model-name>)")
|
||||
else:
|
||||
print(f" (use /model {p['id']}:<model-name>)")
|
||||
print()
|
||||
@@ -3459,8 +3517,17 @@ class HermesCLI:
|
||||
# Parse provider:model syntax (e.g. "openrouter:anthropic/claude-sonnet-4.5")
|
||||
current_provider = self.provider or self.requested_provider or "openrouter"
|
||||
target_provider, new_model = parse_model_input(raw_input, current_provider)
|
||||
# Auto-detect provider when no explicit provider:model syntax was used
|
||||
if target_provider == current_provider:
|
||||
# Auto-detect provider when no explicit provider:model syntax was used.
|
||||
# Skip auto-detection for custom providers — the model name might
|
||||
# coincidentally match a known provider's catalog, but the user
|
||||
# intends to use it on their custom endpoint. Require explicit
|
||||
# provider:model syntax (e.g. /model openai-codex:gpt-5.2-codex)
|
||||
# to switch away from a custom endpoint.
|
||||
_base = self.base_url or ""
|
||||
is_custom = current_provider == "custom" or (
|
||||
"localhost" in _base or "127.0.0.1" in _base
|
||||
)
|
||||
if target_provider == current_provider and not is_custom:
|
||||
from hermes_cli.models import detect_provider_for_model
|
||||
detected = detect_provider_for_model(new_model, current_provider)
|
||||
if detected:
|
||||
@@ -3528,6 +3595,13 @@ class HermesCLI:
|
||||
if message:
|
||||
print(f" Reason: {message}")
|
||||
print(" Note: Model will revert on restart. Use a verified model to save to config.")
|
||||
|
||||
# Helpful hint when staying on a custom endpoint
|
||||
if is_custom and not provider_changed:
|
||||
endpoint = self.base_url or "custom endpoint"
|
||||
print(f" Endpoint: {endpoint}")
|
||||
print(f" Tip: To switch providers, use /model provider:model")
|
||||
print(f" e.g. /model openai-codex:gpt-5.2-codex")
|
||||
else:
|
||||
self._show_model_and_providers()
|
||||
elif canonical == "provider":
|
||||
@@ -3604,6 +3678,18 @@ class HermesCLI:
|
||||
self._handle_stop_command()
|
||||
elif canonical == "background":
|
||||
self._handle_background_command(cmd_original)
|
||||
elif canonical == "queue":
|
||||
if not self._agent_running:
|
||||
_cprint(" /queue only works while Hermes is busy. Just type your message normally.")
|
||||
else:
|
||||
# Extract prompt after "/queue " or "/q "
|
||||
parts = cmd_original.split(None, 1)
|
||||
payload = parts[1].strip() if len(parts) > 1 else ""
|
||||
if not payload:
|
||||
_cprint(" Usage: /queue <prompt>")
|
||||
else:
|
||||
self._pending_input.put(payload)
|
||||
_cprint(f" Queued for the next turn: {payload[:80]}{'...' if len(payload) > 80 else ''}")
|
||||
elif canonical == "skin":
|
||||
self._handle_skin_command(cmd_original)
|
||||
elif canonical == "voice":
|
||||
@@ -3765,6 +3851,8 @@ class HermesCLI:
|
||||
base_url=turn_route["runtime"].get("base_url"),
|
||||
provider=turn_route["runtime"].get("provider"),
|
||||
api_mode=turn_route["runtime"].get("api_mode"),
|
||||
acp_command=turn_route["runtime"].get("command"),
|
||||
acp_args=turn_route["runtime"].get("args"),
|
||||
max_iterations=self.max_turns,
|
||||
enabled_toolsets=self.enabled_toolsets,
|
||||
quiet_mode=True,
|
||||
@@ -3890,7 +3978,7 @@ class HermesCLI:
|
||||
parts = cmd.strip().split(None, 1)
|
||||
sub = parts[1].lower().strip() if len(parts) > 1 else "status"
|
||||
|
||||
_DEFAULT_CDP = "ws://localhost:9222"
|
||||
_DEFAULT_CDP = "http://localhost:9222"
|
||||
current = os.environ.get("BROWSER_CDP_URL", "").strip()
|
||||
|
||||
if sub.startswith("connect"):
|
||||
@@ -5851,7 +5939,12 @@ class HermesCLI:
|
||||
|
||||
@kb.add('tab', eager=True)
|
||||
def handle_tab(event):
|
||||
"""Tab: accept completion and re-trigger if we just completed a provider.
|
||||
"""Tab: accept completion, auto-suggestion, or start completions.
|
||||
|
||||
Priority:
|
||||
1. Completion menu open → accept selected completion
|
||||
2. Ghost text suggestion available → accept auto-suggestion
|
||||
3. Otherwise → start completion menu
|
||||
|
||||
After accepting a provider like 'anthropic:', the completion menu
|
||||
closes and complete_while_typing doesn't fire (no keystroke).
|
||||
@@ -5860,6 +5953,7 @@ class HermesCLI:
|
||||
"""
|
||||
buf = event.current_buffer
|
||||
if buf.complete_state:
|
||||
# Completion menu is open — accept the selection
|
||||
completion = buf.complete_state.current_completion
|
||||
if completion is None:
|
||||
# Menu open but nothing selected — select first then grab it
|
||||
@@ -5873,8 +5967,11 @@ class HermesCLI:
|
||||
text = buf.document.text_before_cursor
|
||||
if text.startswith("/model ") and text.endswith(":"):
|
||||
buf.start_completion()
|
||||
elif buf.suggestion and buf.suggestion.text:
|
||||
# No completion menu, but there's a ghost text auto-suggestion — accept it
|
||||
buf.insert_text(buf.suggestion.text)
|
||||
else:
|
||||
# No menu open — start completions from scratch
|
||||
# No menu and no suggestion — start completions from scratch
|
||||
buf.start_completion()
|
||||
|
||||
# --- Clarify tool: arrow-key navigation for multiple-choice questions ---
|
||||
|
||||
+49
-4
@@ -34,6 +34,7 @@ HERMES_DIR = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
|
||||
CRON_DIR = HERMES_DIR / "cron"
|
||||
JOBS_FILE = CRON_DIR / "jobs.json"
|
||||
OUTPUT_DIR = CRON_DIR / "output"
|
||||
ONESHOT_GRACE_SECONDS = 120
|
||||
|
||||
|
||||
def _normalize_skill_list(skill: Optional[str] = None, skills: Optional[Any] = None) -> List[str]:
|
||||
@@ -220,6 +221,33 @@ def _ensure_aware(dt: datetime) -> datetime:
|
||||
return dt.astimezone(target_tz)
|
||||
|
||||
|
||||
def _recoverable_oneshot_run_at(
|
||||
schedule: Dict[str, Any],
|
||||
now: datetime,
|
||||
*,
|
||||
last_run_at: Optional[str] = None,
|
||||
) -> Optional[str]:
|
||||
"""Return a one-shot run time if it is still eligible to fire.
|
||||
|
||||
One-shot jobs get a small grace window so jobs created a few seconds after
|
||||
their requested minute still run on the next tick. Once a one-shot has
|
||||
already run, it is never eligible again.
|
||||
"""
|
||||
if schedule.get("kind") != "once":
|
||||
return None
|
||||
if last_run_at:
|
||||
return None
|
||||
|
||||
run_at = schedule.get("run_at")
|
||||
if not run_at:
|
||||
return None
|
||||
|
||||
run_at_dt = _ensure_aware(datetime.fromisoformat(run_at))
|
||||
if run_at_dt >= now - timedelta(seconds=ONESHOT_GRACE_SECONDS):
|
||||
return run_at
|
||||
return None
|
||||
|
||||
|
||||
def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
|
||||
"""
|
||||
Compute the next run time for a schedule.
|
||||
@@ -229,9 +257,7 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None
|
||||
now = _hermes_now()
|
||||
|
||||
if schedule["kind"] == "once":
|
||||
run_at = _ensure_aware(datetime.fromisoformat(schedule["run_at"]))
|
||||
# If in the future, return it; if in the past, no more runs
|
||||
return schedule["run_at"] if run_at > now else None
|
||||
return _recoverable_oneshot_run_at(schedule, now, last_run_at=last_run_at)
|
||||
|
||||
elif schedule["kind"] == "interval":
|
||||
minutes = schedule["minutes"]
|
||||
@@ -555,7 +581,26 @@ def get_due_jobs() -> List[Dict[str, Any]]:
|
||||
|
||||
next_run = job.get("next_run_at")
|
||||
if not next_run:
|
||||
continue
|
||||
recovered_next = _recoverable_oneshot_run_at(
|
||||
job.get("schedule", {}),
|
||||
now,
|
||||
last_run_at=job.get("last_run_at"),
|
||||
)
|
||||
if not recovered_next:
|
||||
continue
|
||||
|
||||
job["next_run_at"] = recovered_next
|
||||
next_run = recovered_next
|
||||
logger.info(
|
||||
"Job '%s' had no next_run_at; recovering one-shot run at %s",
|
||||
job.get("name", job["id"]),
|
||||
recovered_next,
|
||||
)
|
||||
for rj in raw_jobs:
|
||||
if rj["id"] == job["id"]:
|
||||
rj["next_run_at"] = recovered_next
|
||||
needs_save = True
|
||||
break
|
||||
|
||||
next_run_dt = _ensure_aware(datetime.fromisoformat(next_run))
|
||||
if next_run_dt <= now:
|
||||
|
||||
+22
-2
@@ -136,6 +136,10 @@ def _deliver_result(job: dict, content: str) -> None:
|
||||
"slack": Platform.SLACK,
|
||||
"whatsapp": Platform.WHATSAPP,
|
||||
"signal": Platform.SIGNAL,
|
||||
"matrix": Platform.MATRIX,
|
||||
"mattermost": Platform.MATTERMOST,
|
||||
"homeassistant": Platform.HOMEASSISTANT,
|
||||
"dingtalk": Platform.DINGTALK,
|
||||
"email": Platform.EMAIL,
|
||||
"sms": Platform.SMS,
|
||||
}
|
||||
@@ -207,11 +211,14 @@ def _build_job_prompt(job: dict) -> str:
|
||||
from tools.skills_tool import skill_view
|
||||
|
||||
parts = []
|
||||
skipped: list[str] = []
|
||||
for skill_name in skill_names:
|
||||
loaded = json.loads(skill_view(skill_name))
|
||||
if not loaded.get("success"):
|
||||
error = loaded.get("error") or f"Failed to load skill '{skill_name}'"
|
||||
raise RuntimeError(error)
|
||||
logger.warning("Cron job '%s': skill not found, skipping — %s", job.get("name", job.get("id")), error)
|
||||
skipped.append(skill_name)
|
||||
continue
|
||||
|
||||
content = str(loaded.get("content") or "").strip()
|
||||
if parts:
|
||||
@@ -224,6 +231,15 @@ def _build_job_prompt(job: dict) -> str:
|
||||
]
|
||||
)
|
||||
|
||||
if skipped:
|
||||
notice = (
|
||||
f"[SYSTEM: The following skill(s) were listed for this job but could not be found "
|
||||
f"and were skipped: {', '.join(skipped)}. "
|
||||
f"Start your response with a brief notice so the user is aware, e.g.: "
|
||||
f"'⚠️ Skill(s) not found and skipped: {', '.join(skipped)}']"
|
||||
)
|
||||
parts.insert(0, notice)
|
||||
|
||||
if prompt:
|
||||
parts.extend(["", f"The user has provided the following instruction alongside the skill invocation: {prompt}"])
|
||||
return "\n".join(parts)
|
||||
@@ -359,6 +375,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
|
||||
"base_url": runtime.get("base_url"),
|
||||
"provider": runtime.get("provider"),
|
||||
"api_mode": runtime.get("api_mode"),
|
||||
"command": runtime.get("command"),
|
||||
"args": list(runtime.get("args") or []),
|
||||
},
|
||||
)
|
||||
|
||||
@@ -368,6 +386,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
|
||||
base_url=turn_route["runtime"].get("base_url"),
|
||||
provider=turn_route["runtime"].get("provider"),
|
||||
api_mode=turn_route["runtime"].get("api_mode"),
|
||||
acp_command=turn_route["runtime"].get("command"),
|
||||
acp_args=turn_route["runtime"].get("args"),
|
||||
max_iterations=max_iterations,
|
||||
reasoning_config=reasoning_config,
|
||||
prefill_messages=prefill_messages,
|
||||
@@ -375,7 +395,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
|
||||
providers_ignored=pr.get("ignore"),
|
||||
providers_order=pr.get("order"),
|
||||
provider_sort=pr.get("sort"),
|
||||
disabled_toolsets=["cronjob"],
|
||||
disabled_toolsets=["cronjob", "messaging", "clarify"],
|
||||
quiet_mode=True,
|
||||
platform="cron",
|
||||
session_id=f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}",
|
||||
|
||||
+87
-7
@@ -32,6 +32,15 @@ def _coerce_bool(value: Any, default: bool = True) -> bool:
|
||||
return bool(value)
|
||||
|
||||
|
||||
def _normalize_unauthorized_dm_behavior(value: Any, default: str = "pair") -> str:
|
||||
"""Normalize unauthorized DM behavior to a supported value."""
|
||||
if isinstance(value, str):
|
||||
normalized = value.strip().lower()
|
||||
if normalized in {"pair", "ignore"}:
|
||||
return normalized
|
||||
return default
|
||||
|
||||
|
||||
class Platform(Enum):
|
||||
"""Supported messaging platforms."""
|
||||
LOCAL = "local"
|
||||
@@ -47,6 +56,7 @@ class Platform(Enum):
|
||||
SMS = "sms"
|
||||
DINGTALK = "dingtalk"
|
||||
API_SERVER = "api_server"
|
||||
WEBHOOK = "webhook"
|
||||
|
||||
|
||||
@dataclass
|
||||
@@ -215,6 +225,9 @@ class GatewayConfig:
|
||||
# Session isolation in shared chats
|
||||
group_sessions_per_user: bool = True # Isolate group/channel sessions per participant when user IDs are available
|
||||
|
||||
# Unauthorized DM policy
|
||||
unauthorized_dm_behavior: str = "pair" # "pair" or "ignore"
|
||||
|
||||
# Streaming configuration
|
||||
streaming: StreamingConfig = field(default_factory=StreamingConfig)
|
||||
|
||||
@@ -242,6 +255,9 @@ class GatewayConfig:
|
||||
# API Server uses enabled flag only (no token needed)
|
||||
elif platform == Platform.API_SERVER:
|
||||
connected.append(platform)
|
||||
# Webhook uses enabled flag only (secrets are per-route)
|
||||
elif platform == Platform.WEBHOOK:
|
||||
connected.append(platform)
|
||||
return connected
|
||||
|
||||
def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
|
||||
@@ -289,6 +305,7 @@ class GatewayConfig:
|
||||
"always_log_local": self.always_log_local,
|
||||
"stt_enabled": self.stt_enabled,
|
||||
"group_sessions_per_user": self.group_sessions_per_user,
|
||||
"unauthorized_dm_behavior": self.unauthorized_dm_behavior,
|
||||
"streaming": self.streaming.to_dict(),
|
||||
}
|
||||
|
||||
@@ -331,6 +348,10 @@ class GatewayConfig:
|
||||
stt_enabled = data.get("stt", {}).get("enabled") if isinstance(data.get("stt"), dict) else None
|
||||
|
||||
group_sessions_per_user = data.get("group_sessions_per_user")
|
||||
unauthorized_dm_behavior = _normalize_unauthorized_dm_behavior(
|
||||
data.get("unauthorized_dm_behavior"),
|
||||
"pair",
|
||||
)
|
||||
|
||||
return cls(
|
||||
platforms=platforms,
|
||||
@@ -343,9 +364,21 @@ class GatewayConfig:
|
||||
always_log_local=data.get("always_log_local", True),
|
||||
stt_enabled=_coerce_bool(stt_enabled, True),
|
||||
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
|
||||
unauthorized_dm_behavior=unauthorized_dm_behavior,
|
||||
streaming=StreamingConfig.from_dict(data.get("streaming", {})),
|
||||
)
|
||||
|
||||
def get_unauthorized_dm_behavior(self, platform: Optional[Platform] = None) -> str:
|
||||
"""Return the effective unauthorized-DM behavior for a platform."""
|
||||
if platform:
|
||||
platform_cfg = self.platforms.get(platform)
|
||||
if platform_cfg and "unauthorized_dm_behavior" in platform_cfg.extra:
|
||||
return _normalize_unauthorized_dm_behavior(
|
||||
platform_cfg.extra.get("unauthorized_dm_behavior"),
|
||||
self.unauthorized_dm_behavior,
|
||||
)
|
||||
return self.unauthorized_dm_behavior
|
||||
|
||||
|
||||
def load_gateway_config() -> GatewayConfig:
|
||||
"""
|
||||
@@ -416,6 +449,44 @@ def load_gateway_config() -> GatewayConfig:
|
||||
if "always_log_local" in yaml_cfg:
|
||||
gw_data["always_log_local"] = yaml_cfg["always_log_local"]
|
||||
|
||||
if "unauthorized_dm_behavior" in yaml_cfg:
|
||||
gw_data["unauthorized_dm_behavior"] = _normalize_unauthorized_dm_behavior(
|
||||
yaml_cfg.get("unauthorized_dm_behavior"),
|
||||
"pair",
|
||||
)
|
||||
|
||||
# Bridge per-platform settings from config.yaml into gw_data
|
||||
platforms_data = gw_data.setdefault("platforms", {})
|
||||
if not isinstance(platforms_data, dict):
|
||||
platforms_data = {}
|
||||
gw_data["platforms"] = platforms_data
|
||||
for plat in Platform:
|
||||
if plat == Platform.LOCAL:
|
||||
continue
|
||||
platform_cfg = yaml_cfg.get(plat.value)
|
||||
if not isinstance(platform_cfg, dict):
|
||||
continue
|
||||
# Collect bridgeable keys from this platform section
|
||||
bridged = {}
|
||||
if "unauthorized_dm_behavior" in platform_cfg:
|
||||
bridged["unauthorized_dm_behavior"] = _normalize_unauthorized_dm_behavior(
|
||||
platform_cfg.get("unauthorized_dm_behavior"),
|
||||
gw_data.get("unauthorized_dm_behavior", "pair"),
|
||||
)
|
||||
if "reply_prefix" in platform_cfg:
|
||||
bridged["reply_prefix"] = platform_cfg["reply_prefix"]
|
||||
if not bridged:
|
||||
continue
|
||||
plat_data = platforms_data.setdefault(plat.value, {})
|
||||
if not isinstance(plat_data, dict):
|
||||
plat_data = {}
|
||||
platforms_data[plat.value] = plat_data
|
||||
extra = plat_data.setdefault("extra", {})
|
||||
if not isinstance(extra, dict):
|
||||
extra = {}
|
||||
plat_data["extra"] = extra
|
||||
extra.update(bridged)
|
||||
|
||||
# Discord settings → env vars (env vars take precedence)
|
||||
discord_cfg = yaml_cfg.get("discord", {})
|
||||
if isinstance(discord_cfg, dict):
|
||||
@@ -428,13 +499,6 @@ def load_gateway_config() -> GatewayConfig:
|
||||
os.environ["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)
|
||||
if "auto_thread" in discord_cfg and not os.getenv("DISCORD_AUTO_THREAD"):
|
||||
os.environ["DISCORD_AUTO_THREAD"] = str(discord_cfg["auto_thread"]).lower()
|
||||
|
||||
# Bridge whatsapp settings from config.yaml into platform config
|
||||
whatsapp_cfg = yaml_cfg.get("whatsapp", {})
|
||||
if isinstance(whatsapp_cfg, dict) and "reply_prefix" in whatsapp_cfg:
|
||||
if Platform.WHATSAPP not in config.platforms:
|
||||
config.platforms[Platform.WHATSAPP] = PlatformConfig()
|
||||
config.platforms[Platform.WHATSAPP].extra["reply_prefix"] = whatsapp_cfg["reply_prefix"]
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
@@ -674,6 +738,22 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
|
||||
if api_server_host:
|
||||
config.platforms[Platform.API_SERVER].extra["host"] = api_server_host
|
||||
|
||||
# Webhook platform
|
||||
webhook_enabled = os.getenv("WEBHOOK_ENABLED", "").lower() in ("true", "1", "yes")
|
||||
webhook_port = os.getenv("WEBHOOK_PORT")
|
||||
webhook_secret = os.getenv("WEBHOOK_SECRET", "")
|
||||
if webhook_enabled:
|
||||
if Platform.WEBHOOK not in config.platforms:
|
||||
config.platforms[Platform.WEBHOOK] = PlatformConfig()
|
||||
config.platforms[Platform.WEBHOOK].enabled = True
|
||||
if webhook_port:
|
||||
try:
|
||||
config.platforms[Platform.WEBHOOK].extra["port"] = int(webhook_port)
|
||||
except ValueError:
|
||||
pass
|
||||
if webhook_secret:
|
||||
config.platforms[Platform.WEBHOOK].extra["secret"] = webhook_secret
|
||||
|
||||
# Session settings
|
||||
idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
|
||||
if idle_minutes:
|
||||
|
||||
@@ -1099,6 +1099,22 @@ class BasePlatformAdapter(ABC):
|
||||
print(f"[{self.name}] Error handling message: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
# Send the error to the user so they aren't left with radio silence
|
||||
try:
|
||||
error_type = type(e).__name__
|
||||
error_detail = str(e)[:300] if str(e) else "no details available"
|
||||
_thread_metadata = {"thread_id": event.source.thread_id} if event.source.thread_id else None
|
||||
await self.send(
|
||||
chat_id=event.source.chat_id,
|
||||
content=(
|
||||
f"Sorry, I encountered an error ({error_type}).\n"
|
||||
f"{error_detail}\n"
|
||||
"Try again or use /reset to start a fresh session."
|
||||
),
|
||||
metadata=_thread_metadata,
|
||||
)
|
||||
except Exception:
|
||||
pass # Last resort — don't let error reporting crash the handler
|
||||
finally:
|
||||
# Stop typing indicator
|
||||
typing_task.cancel()
|
||||
|
||||
@@ -179,6 +179,11 @@ class SignalAdapter(BasePlatformAdapter):
|
||||
# Normalize account for self-message filtering
|
||||
self._account_normalized = self.account.strip()
|
||||
|
||||
# Track recently sent message timestamps to prevent echo-back loops
|
||||
# in Note to Self / self-chat mode (mirrors WhatsApp recentlySentIds)
|
||||
self._recent_sent_timestamps: set = set()
|
||||
self._max_recent_timestamps = 50
|
||||
|
||||
logger.info("Signal adapter initialized: url=%s account=%s groups=%s",
|
||||
self.http_url, _redact_phone(self.account),
|
||||
"enabled" if self.group_allow_from else "disabled")
|
||||
@@ -353,10 +358,26 @@ class SignalAdapter(BasePlatformAdapter):
|
||||
# Unwrap nested envelope if present
|
||||
envelope_data = envelope.get("envelope", envelope)
|
||||
|
||||
# Filter syncMessage envelopes (sent transcripts, read receipts, etc.)
|
||||
# signal-cli may set syncMessage to null vs omitting it, so check key existence
|
||||
# Handle syncMessage: extract "Note to Self" messages (sent to own account)
|
||||
# while still filtering other sync events (read receipts, typing, etc.)
|
||||
is_note_to_self = False
|
||||
if "syncMessage" in envelope_data:
|
||||
return
|
||||
sync_msg = envelope_data.get("syncMessage")
|
||||
if sync_msg and isinstance(sync_msg, dict):
|
||||
sent_msg = sync_msg.get("sentMessage")
|
||||
if sent_msg and isinstance(sent_msg, dict):
|
||||
dest = sent_msg.get("destinationNumber") or sent_msg.get("destination")
|
||||
sent_ts = sent_msg.get("timestamp")
|
||||
if dest == self._account_normalized:
|
||||
# Check if this is an echo of our own outbound reply
|
||||
if sent_ts and sent_ts in self._recent_sent_timestamps:
|
||||
self._recent_sent_timestamps.discard(sent_ts)
|
||||
return
|
||||
# Genuine user Note to Self — promote to dataMessage
|
||||
is_note_to_self = True
|
||||
envelope_data = {**envelope_data, "dataMessage": sent_msg}
|
||||
if not is_note_to_self:
|
||||
return
|
||||
|
||||
# Extract sender info
|
||||
sender = (
|
||||
@@ -371,8 +392,8 @@ class SignalAdapter(BasePlatformAdapter):
|
||||
logger.debug("Signal: ignoring envelope with no sender")
|
||||
return
|
||||
|
||||
# Self-message filtering — prevent reply loops
|
||||
if self._account_normalized and sender == self._account_normalized:
|
||||
# Self-message filtering — prevent reply loops (but allow Note to Self)
|
||||
if self._account_normalized and sender == self._account_normalized and not is_note_to_self:
|
||||
return
|
||||
|
||||
# Filter stories
|
||||
@@ -577,9 +598,18 @@ class SignalAdapter(BasePlatformAdapter):
|
||||
result = await self._rpc("send", params)
|
||||
|
||||
if result is not None:
|
||||
self._track_sent_timestamp(result)
|
||||
return SendResult(success=True)
|
||||
return SendResult(success=False, error="RPC send failed")
|
||||
|
||||
def _track_sent_timestamp(self, rpc_result) -> None:
|
||||
"""Record outbound message timestamp for echo-back filtering."""
|
||||
ts = rpc_result.get("timestamp") if isinstance(rpc_result, dict) else None
|
||||
if ts:
|
||||
self._recent_sent_timestamps.add(ts)
|
||||
if len(self._recent_sent_timestamps) > self._max_recent_timestamps:
|
||||
self._recent_sent_timestamps.pop()
|
||||
|
||||
async def send_typing(self, chat_id: str, metadata=None) -> None:
|
||||
"""Send a typing indicator."""
|
||||
params: Dict[str, Any] = {
|
||||
@@ -635,6 +665,7 @@ class SignalAdapter(BasePlatformAdapter):
|
||||
|
||||
result = await self._rpc("send", params)
|
||||
if result is not None:
|
||||
self._track_sent_timestamp(result)
|
||||
return SendResult(success=True)
|
||||
return SendResult(success=False, error="RPC send with attachment failed")
|
||||
|
||||
@@ -665,6 +696,7 @@ class SignalAdapter(BasePlatformAdapter):
|
||||
|
||||
result = await self._rpc("send", params)
|
||||
if result is not None:
|
||||
self._track_sent_timestamp(result)
|
||||
return SendResult(success=True)
|
||||
return SendResult(success=False, error="RPC send document failed")
|
||||
|
||||
|
||||
@@ -0,0 +1,557 @@
|
||||
"""Generic webhook platform adapter.
|
||||
|
||||
Runs an aiohttp HTTP server that receives webhook POSTs from external
|
||||
services (GitHub, GitLab, JIRA, Stripe, etc.), validates HMAC signatures,
|
||||
transforms payloads into agent prompts, and routes responses back to the
|
||||
source or to another configured platform.
|
||||
|
||||
Configuration lives in config.yaml under platforms.webhook.extra.routes.
|
||||
Each route defines:
|
||||
- events: which event types to accept (header-based filtering)
|
||||
- secret: HMAC secret for signature validation (REQUIRED)
|
||||
- prompt: template string formatted with the webhook payload
|
||||
- skills: optional list of skills to load for the agent
|
||||
- deliver: where to send the response (github_comment, telegram, etc.)
|
||||
- deliver_extra: additional delivery config (repo, pr_number, chat_id)
|
||||
|
||||
Security:
|
||||
- HMAC secret is required per route (validated at startup)
|
||||
- Rate limiting per route (fixed-window, configurable)
|
||||
- Idempotency cache prevents duplicate agent runs on webhook retries
|
||||
- Body size limits checked before reading payload
|
||||
- Set secret to "INSECURE_NO_AUTH" to skip validation (testing only)
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import hashlib
|
||||
import hmac
|
||||
import json
|
||||
import logging
|
||||
import re
|
||||
import subprocess
|
||||
import time
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
try:
|
||||
from aiohttp import web
|
||||
|
||||
AIOHTTP_AVAILABLE = True
|
||||
except ImportError:
|
||||
AIOHTTP_AVAILABLE = False
|
||||
web = None # type: ignore[assignment]
|
||||
|
||||
from gateway.config import Platform, PlatformConfig
|
||||
from gateway.platforms.base import (
|
||||
BasePlatformAdapter,
|
||||
MessageEvent,
|
||||
MessageType,
|
||||
SendResult,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
DEFAULT_HOST = "0.0.0.0"
|
||||
DEFAULT_PORT = 8644
|
||||
_INSECURE_NO_AUTH = "INSECURE_NO_AUTH"
|
||||
|
||||
|
||||
def check_webhook_requirements() -> bool:
|
||||
"""Check if webhook adapter dependencies are available."""
|
||||
return AIOHTTP_AVAILABLE
|
||||
|
||||
|
||||
class WebhookAdapter(BasePlatformAdapter):
|
||||
"""Generic webhook receiver that triggers agent runs from HTTP POSTs."""
|
||||
|
||||
def __init__(self, config: PlatformConfig):
|
||||
super().__init__(config, Platform.WEBHOOK)
|
||||
self._host: str = config.extra.get("host", DEFAULT_HOST)
|
||||
self._port: int = int(config.extra.get("port", DEFAULT_PORT))
|
||||
self._global_secret: str = config.extra.get("secret", "")
|
||||
self._routes: Dict[str, dict] = config.extra.get("routes", {})
|
||||
self._runner = None
|
||||
|
||||
# Delivery info keyed by session chat_id — consumed by send()
|
||||
self._delivery_info: Dict[str, dict] = {}
|
||||
|
||||
# Reference to gateway runner for cross-platform delivery (set externally)
|
||||
self.gateway_runner = None
|
||||
|
||||
# Idempotency: TTL cache of recently processed delivery IDs.
|
||||
# Prevents duplicate agent runs when webhook providers retry.
|
||||
self._seen_deliveries: Dict[str, float] = {}
|
||||
self._idempotency_ttl: int = 3600 # 1 hour
|
||||
|
||||
# Rate limiting: per-route timestamps in a fixed window.
|
||||
self._rate_counts: Dict[str, List[float]] = {}
|
||||
self._rate_limit: int = int(config.extra.get("rate_limit", 30)) # per minute
|
||||
|
||||
# Body size limit (auth-before-body pattern)
|
||||
self._max_body_bytes: int = int(
|
||||
config.extra.get("max_body_bytes", 1_048_576)
|
||||
) # 1MB
|
||||
|
||||
    # ------------------------------------------------------------------
    # Lifecycle
    # ------------------------------------------------------------------

    async def connect(self) -> bool:
        # Validate routes at startup: a secret is required per route
        for name, route in self._routes.items():
            secret = route.get("secret", self._global_secret)
            if not secret:
                raise ValueError(
                    f"[webhook] Route '{name}' has no HMAC secret. "
                    f"Set 'secret' on the route or globally. "
                    f"For testing without auth, set secret to '{_INSECURE_NO_AUTH}'."
                )

        app = web.Application()
        app.router.add_get("/health", self._handle_health)
        app.router.add_post("/webhooks/{route_name}", self._handle_webhook)

        self._runner = web.AppRunner(app)
        await self._runner.setup()
        site = web.TCPSite(self._runner, self._host, self._port)
        await site.start()
        self._mark_connected()

        route_names = ", ".join(self._routes.keys()) or "(none configured)"
        logger.info(
            "[webhook] Listening on %s:%d — routes: %s",
            self._host,
            self._port,
            route_names,
        )
        return True

    async def disconnect(self) -> None:
        if self._runner:
            await self._runner.cleanup()
            self._runner = None
        self._mark_disconnected()
        logger.info("[webhook] Disconnected")

    async def send(
        self,
        chat_id: str,
        content: str,
        reply_to: Optional[str] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> SendResult:
        """Deliver the agent's response to the configured destination.

        chat_id is ``webhook:{route}:{delivery_id}``; we pop the delivery
        info stored during webhook receipt so it doesn't leak memory.
        """
        delivery = self._delivery_info.pop(chat_id, {})
        deliver_type = delivery.get("deliver", "log")

        if deliver_type == "log":
            logger.info("[webhook] Response for %s: %s", chat_id, content[:200])
            return SendResult(success=True)

        if deliver_type == "github_comment":
            return await self._deliver_github_comment(content, delivery)

        # Cross-platform delivery (telegram, discord, etc.)
        if self.gateway_runner and deliver_type in (
            "telegram",
            "discord",
            "slack",
            "signal",
            "sms",
        ):
            return await self._deliver_cross_platform(
                deliver_type, content, delivery
            )

        logger.warning("[webhook] Unknown deliver type: %s", deliver_type)
        return SendResult(
            success=False, error=f"Unknown deliver type: {deliver_type}"
        )

    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
        return {"name": chat_id, "type": "webhook"}

    # ------------------------------------------------------------------
    # HTTP handlers
    # ------------------------------------------------------------------

    async def _handle_health(self, request: "web.Request") -> "web.Response":
        """GET /health: simple health check."""
        return web.json_response({"status": "ok", "platform": "webhook"})

    async def _handle_webhook(self, request: "web.Request") -> "web.Response":
        """POST /webhooks/{route_name}: receive and process a webhook event."""
        route_name = request.match_info.get("route_name", "")
        route_config = self._routes.get(route_name)

        if not route_config:
            return web.json_response(
                {"error": f"Unknown route: {route_name}"}, status=404
            )

        # ── Auth-before-body ─────────────────────────────────────
        # Check Content-Length before reading the full payload.
        content_length = request.content_length or 0
        if content_length > self._max_body_bytes:
            return web.json_response(
                {"error": "Payload too large"}, status=413
            )

        # ── Rate limiting ────────────────────────────────────────
        now = time.time()
        window = self._rate_counts.setdefault(route_name, [])
        window[:] = [t for t in window if now - t < 60]
        if len(window) >= self._rate_limit:
            return web.json_response(
                {"error": "Rate limit exceeded"}, status=429
            )
        window.append(now)

        # Read body
        try:
            raw_body = await request.read()
        except Exception as e:
            logger.error("[webhook] Failed to read body: %s", e)
            return web.json_response({"error": "Bad request"}, status=400)

        # Validate HMAC signature (skip for INSECURE_NO_AUTH testing mode)
        secret = route_config.get("secret", self._global_secret)
        if secret and secret != _INSECURE_NO_AUTH:
            if not self._validate_signature(request, raw_body, secret):
                logger.warning(
                    "[webhook] Invalid signature for route %s", route_name
                )
                return web.json_response(
                    {"error": "Invalid signature"}, status=401
                )

        # Parse payload
        try:
            payload = json.loads(raw_body)
        except json.JSONDecodeError:
            # Try form-encoded as fallback
            try:
                import urllib.parse

                payload = dict(
                    urllib.parse.parse_qsl(raw_body.decode("utf-8"))
                )
            except Exception:
                return web.json_response(
                    {"error": "Cannot parse body"}, status=400
                )

        # Check event type filter.
        # The payload lookup is guarded because a JSON body may be a list or
        # scalar, in which case it has no .get().
        event_type = (
            request.headers.get("X-GitHub-Event", "")
            or request.headers.get("X-GitLab-Event", "")
            or (payload.get("event_type", "") if isinstance(payload, dict) else "")
            or "unknown"
        )
        allowed_events = route_config.get("events", [])
        if allowed_events and event_type not in allowed_events:
            logger.debug(
                "[webhook] Ignoring event %s for route %s (allowed: %s)",
                event_type,
                route_name,
                allowed_events,
            )
            return web.json_response(
                {"status": "ignored", "event": event_type}
            )

        # Format prompt from template
        prompt_template = route_config.get("prompt", "")
        prompt = self._render_prompt(
            prompt_template, payload, event_type, route_name
        )

        # Inject skill content if configured.
        # We call build_skill_invocation_message() directly rather than
        # using /skill-name slash commands, because the gateway's command
        # parser would intercept those and break the flow.
        skills = route_config.get("skills", [])
        if skills:
            try:
                from agent.skill_commands import (
                    build_skill_invocation_message,
                    get_skill_commands,
                )

                skill_cmds = get_skill_commands()
                for skill_name in skills:
                    cmd_key = f"/{skill_name}"
                    if cmd_key in skill_cmds:
                        skill_content = build_skill_invocation_message(
                            cmd_key, user_instruction=prompt
                        )
                        if skill_content:
                            prompt = skill_content
                            break  # Load the first matching skill
                    else:
                        logger.warning(
                            "[webhook] Skill '%s' not found", skill_name
                        )
            except Exception as e:
                logger.warning("[webhook] Skill loading failed: %s", e)

        # Build a unique delivery ID
        delivery_id = request.headers.get(
            "X-GitHub-Delivery",
            request.headers.get("X-Request-ID", str(int(time.time() * 1000))),
        )

        # ── Idempotency ─────────────────────────────────────────
        # Skip duplicate deliveries (webhook retries).
        now = time.time()
        # Prune expired entries
        self._seen_deliveries = {
            k: v
            for k, v in self._seen_deliveries.items()
            if now - v < self._idempotency_ttl
        }
        if delivery_id in self._seen_deliveries:
            logger.info(
                "[webhook] Skipping duplicate delivery %s", delivery_id
            )
            return web.json_response(
                {"status": "duplicate", "delivery_id": delivery_id},
                status=200,
            )
        self._seen_deliveries[delivery_id] = now

        # Use delivery_id in session key so concurrent webhooks on the
        # same route get independent agent runs (not queued/interrupted).
        session_chat_id = f"webhook:{route_name}:{delivery_id}"

        # Store delivery info for send(); consumed (popped) on delivery
        deliver_config = {
            "deliver": route_config.get("deliver", "log"),
            "deliver_extra": self._render_delivery_extra(
                route_config.get("deliver_extra", {}), payload
            ),
            "payload": payload,
        }
        self._delivery_info[session_chat_id] = deliver_config

        # Build source and event
        source = self.build_source(
            chat_id=session_chat_id,
            chat_name=f"webhook/{route_name}",
            chat_type="webhook",
            user_id=f"webhook:{route_name}",
            user_name=route_name,
        )
        event = MessageEvent(
            text=prompt,
            message_type=MessageType.TEXT,
            source=source,
            raw_message=payload,
            message_id=delivery_id,
        )

        logger.info(
            "[webhook] %s event=%s route=%s prompt_len=%d delivery=%s",
            request.method,
            event_type,
            route_name,
            len(prompt),
            delivery_id,
        )

        # Non-blocking: return 202 Accepted immediately
        asyncio.create_task(self.handle_message(event))

        return web.json_response(
            {
                "status": "accepted",
                "route": route_name,
                "event": event_type,
                "delivery_id": delivery_id,
            },
            status=202,
        )

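The per-route rate limit in `_handle_webhook` keeps a list of request timestamps and drops those older than 60 seconds on each request, which makes it a sliding window. A standalone sketch of that idea follows; the `SlidingWindowLimiter` class and its `allow` method are hypothetical names introduced for illustration, and `now` is injectable only to make the behaviour easy to check.

```python
import time
from typing import Dict, List, Optional


class SlidingWindowLimiter:
    """Hypothetical helper: at most `limit` hits per route in any window."""

    def __init__(self, limit: int = 30, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits: Dict[str, List[float]] = {}

    def allow(self, route: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        hits = self._hits.setdefault(route, [])
        # Drop timestamps that have slid out of the window (in place).
        hits[:] = [t for t in hits if now - t < self.window_seconds]
        if len(hits) >= self.limit:
            return False  # the caller would answer 429 here
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow("github", now=t) for t in (0.0, 1.0, 2.0, 60.5)]
```

Rejected requests are not recorded in this sketch, matching the handler above, so a client that keeps hammering the route regains capacity as soon as its accepted requests age out.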
    # ------------------------------------------------------------------
    # Signature validation
    # ------------------------------------------------------------------

    def _validate_signature(
        self, request: "web.Request", body: bytes, secret: str
    ) -> bool:
        """Validate webhook signature (GitHub, GitLab, generic HMAC-SHA256)."""
        # GitHub: X-Hub-Signature-256 = sha256=<hex>
        gh_sig = request.headers.get("X-Hub-Signature-256", "")
        if gh_sig:
            expected = "sha256=" + hmac.new(
                secret.encode(), body, hashlib.sha256
            ).hexdigest()
            return hmac.compare_digest(gh_sig, expected)

        # GitLab: X-Gitlab-Token = <plain secret>
        gl_token = request.headers.get("X-Gitlab-Token", "")
        if gl_token:
            return hmac.compare_digest(gl_token, secret)

        # Generic: X-Webhook-Signature = <hex HMAC-SHA256>
        generic_sig = request.headers.get("X-Webhook-Signature", "")
        if generic_sig:
            expected = hmac.new(
                secret.encode(), body, hashlib.sha256
            ).hexdigest()
            return hmac.compare_digest(generic_sig, expected)

        # No recognised signature header but secret is configured → reject
        logger.debug(
            "[webhook] Secret configured but no signature header found"
        )
        return False

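For anyone testing a route by hand, the sender side of the generic and GitHub-style checks can be sketched with the standard library alone. The function names `sign_body` and `verify` are hypothetical and not part of the adapter; the header formats are the ones named in the comments of `_validate_signature`.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body, as sent in X-Webhook-Signature."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, header_value: str, github_style: bool = False) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_body(secret, body)
    if github_style:
        # X-Hub-Signature-256 carries a "sha256=" prefix before the hex digest.
        expected = "sha256=" + expected
    return hmac.compare_digest(header_value, expected)


body = b'{"event_type": "ping"}'
generic_header = sign_body("s3cret", body)
github_header = "sha256=" + sign_body("s3cret", body)
```

The signature covers the exact bytes on the wire, so a sender has to sign the serialized body it actually transmits; re-serializing the JSON after signing would change the digest.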
    # ------------------------------------------------------------------
    # Prompt rendering
    # ------------------------------------------------------------------

    def _render_prompt(
        self,
        template: str,
        payload: dict,
        event_type: str,
        route_name: str,
    ) -> str:
        """Render a prompt template with the webhook payload.

        Supports dot-notation access into nested dicts:
        ``{pull_request.title}`` → ``payload["pull_request"]["title"]``
        """
        if not template:
            truncated = json.dumps(payload, indent=2)[:4000]
            return (
                f"Webhook event '{event_type}' on route "
                f"'{route_name}':\n\n```json\n{truncated}\n```"
            )

        def _resolve(match: re.Match) -> str:
            key = match.group(1)
            value: Any = payload
            for part in key.split("."):
                if isinstance(value, dict):
                    value = value.get(part, f"{{{key}}}")
                else:
                    return f"{{{key}}}"
            if isinstance(value, (dict, list)):
                return json.dumps(value, indent=2)[:2000]
            return str(value)

        return re.sub(r"\{([a-zA-Z0-9_.]+)\}", _resolve, template)

    def _render_delivery_extra(
        self, extra: dict, payload: dict
    ) -> dict:
        """Render delivery_extra template values with payload data."""
        rendered: Dict[str, Any] = {}
        for key, value in extra.items():
            if isinstance(value, str):
                rendered[key] = self._render_prompt(value, payload, "", "")
            else:
                rendered[key] = value
        return rendered

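The dot-notation placeholders described in the `_render_prompt` docstring can be tried out with a small standalone function. This is a sketch under stated assumptions: `render` is a hypothetical free function, and it uses an explicit membership test where the method above uses `dict.get` with a default, so that unknown placeholders are left exactly as written.

```python
import json
import re
from typing import Any


def render(template: str, payload: dict) -> str:
    """Substitute {a.b.c} placeholders with values from a nested dict."""

    def _resolve(match: re.Match) -> str:
        value: Any = payload
        for part in match.group(1).split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                return match.group(0)  # unknown key: keep the placeholder text
        if isinstance(value, (dict, list)):
            # Nested structures are embedded as (truncated) JSON.
            return json.dumps(value, indent=2)[:2000]
        return str(value)

    return re.sub(r"\{([a-zA-Z0-9_.]+)\}", _resolve, template)


payload = {"pull_request": {"title": "Fix typo", "number": 7}, "sender": {"login": "octo"}}
text = render("Review PR #{pull_request.number} '{pull_request.title}' by {sender.login}", payload)
```

Because the regular expression only matches word characters and dots, braces around anything else (for example a JSON snippet in the template) pass through untouched.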
    # ------------------------------------------------------------------
    # Response delivery
    # ------------------------------------------------------------------

    async def _deliver_github_comment(
        self, content: str, delivery: dict
    ) -> SendResult:
        """Post agent response as a GitHub PR/issue comment via ``gh`` CLI."""
        extra = delivery.get("deliver_extra", {})
        repo = extra.get("repo", "")
        pr_number = extra.get("pr_number", "")

        if not repo or not pr_number:
            logger.error(
                "[webhook] github_comment delivery missing repo or pr_number"
            )
            return SendResult(
                success=False, error="Missing repo or pr_number"
            )

        try:
            result = subprocess.run(
                [
                    "gh",
                    "pr",
                    "comment",
                    str(pr_number),
                    "--repo",
                    repo,
                    "--body",
                    content,
                ],
                capture_output=True,
                text=True,
                timeout=30,
            )
            if result.returncode == 0:
                logger.info(
                    "[webhook] Posted comment on %s#%s", repo, pr_number
                )
                return SendResult(success=True)
            else:
                logger.error(
                    "[webhook] gh pr comment failed: %s", result.stderr
                )
                return SendResult(success=False, error=result.stderr)
        except FileNotFoundError:
            logger.error(
                "[webhook] 'gh' CLI not found — install GitHub CLI for "
                "github_comment delivery"
            )
            return SendResult(
                success=False, error="gh CLI not installed"
            )
        except Exception as e:
            logger.error("[webhook] github_comment delivery error: %s", e)
            return SendResult(success=False, error=str(e))

    async def _deliver_cross_platform(
        self, platform_name: str, content: str, delivery: dict
    ) -> SendResult:
        """Route response to another platform (telegram, discord, etc.)."""
        if not self.gateway_runner:
            return SendResult(
                success=False,
                error="No gateway runner for cross-platform delivery",
            )

        try:
            target_platform = Platform(platform_name)
        except ValueError:
            return SendResult(
                success=False, error=f"Unknown platform: {platform_name}"
            )

        adapter = self.gateway_runner.adapters.get(target_platform)
        if not adapter:
            return SendResult(
                success=False,
                error=f"Platform {platform_name} not connected",
            )

        # Use home channel if no specific chat_id in deliver_extra
        extra = delivery.get("deliver_extra", {})
        chat_id = extra.get("chat_id", "")
        if not chat_id:
            home = self.gateway_runner.config.get_home_channel(target_platform)
            if home:
                chat_id = home.chat_id
            else:
                return SendResult(
                    success=False,
                    error=f"No chat_id or home channel for {platform_name}",
                )

        return await adapter.send(chat_id, content)
@@ -182,9 +182,31 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
# Ensure session directory exists
|
||||
self._session_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Check if bridge is already running and connected
|
||||
import aiohttp
|
||||
import asyncio
|
||||
try:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.get(
|
||||
f"http://127.0.0.1:{self._bridge_port}/health",
|
||||
timeout=aiohttp.ClientTimeout(total=2)
|
||||
) as resp:
|
||||
if resp.status == 200:
|
||||
data = await resp.json()
|
||||
bridge_status = data.get("status", "unknown")
|
||||
if bridge_status == "connected":
|
||||
print(f"[{self.name}] Using existing bridge (status: {bridge_status})")
|
||||
self._running = True
|
||||
self._bridge_process = None # Not managed by us
|
||||
asyncio.create_task(self._poll_messages())
|
||||
return True
|
||||
else:
|
||||
print(f"[{self.name}] Bridge found but not connected (status: {bridge_status}), restarting")
|
||||
except Exception:
|
||||
pass # Bridge not running, start a new one
|
||||
|
||||
# Kill any orphaned bridge from a previous gateway run
|
||||
_kill_port_process(self._bridge_port)
|
||||
import asyncio
|
||||
await asyncio.sleep(1)
|
||||
|
||||
# Start the bridge process in its own process group.
|
||||
@@ -232,7 +254,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
try:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.get(
- f"http://localhost:{self._bridge_port}/health",
+ f"http://127.0.0.1:{self._bridge_port}/health",
|
||||
timeout=aiohttp.ClientTimeout(total=2)
|
||||
) as resp:
|
||||
if resp.status == 200:
|
||||
@@ -264,7 +286,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
try:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.get(
- f"http://localhost:{self._bridge_port}/health",
+ f"http://127.0.0.1:{self._bridge_port}/health",
|
||||
timeout=aiohttp.ClientTimeout(total=2)
|
||||
) as resp:
|
||||
if resp.status == 200:
|
||||
@@ -326,9 +348,9 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
self._bridge_process.kill()
|
||||
except Exception as e:
|
||||
print(f"[{self.name}] Error stopping bridge: {e}")
|
||||
-
- # Also kill any orphaned bridge processes on our port
- _kill_port_process(self._bridge_port)
+ else:
+     # Bridge was not started by us, don't kill it
+     print(f"[{self.name}] Disconnecting (external bridge left running)")
|
||||
self._running = False
|
||||
self._bridge_process = None
|
||||
@@ -358,7 +380,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
payload["replyTo"] = reply_to
|
||||
|
||||
async with session.post(
- f"http://localhost:{self._bridge_port}/send",
+ f"http://127.0.0.1:{self._bridge_port}/send",
|
||||
json=payload,
|
||||
timeout=aiohttp.ClientTimeout(total=30)
|
||||
) as resp:
|
||||
@@ -394,7 +416,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
import aiohttp
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.post(
- f"http://localhost:{self._bridge_port}/edit",
+ f"http://127.0.0.1:{self._bridge_port}/edit",
|
||||
json={
|
||||
"chatId": chat_id,
|
||||
"messageId": message_id,
|
||||
@@ -439,7 +461,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.post(
- f"http://localhost:{self._bridge_port}/send-media",
+ f"http://127.0.0.1:{self._bridge_port}/send-media",
|
||||
json=payload,
|
||||
timeout=aiohttp.ClientTimeout(total=120),
|
||||
) as resp:
|
||||
@@ -515,7 +537,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
|
||||
async with aiohttp.ClientSession() as session:
|
||||
await session.post(
- f"http://localhost:{self._bridge_port}/typing",
+ f"http://127.0.0.1:{self._bridge_port}/typing",
|
||||
json={"chatId": chat_id},
|
||||
timeout=aiohttp.ClientTimeout(total=5)
|
||||
)
|
||||
@@ -532,7 +554,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.get(
- f"http://localhost:{self._bridge_port}/chat/{chat_id}",
+ f"http://127.0.0.1:{self._bridge_port}/chat/{chat_id}",
|
||||
timeout=aiohttp.ClientTimeout(total=10)
|
||||
) as resp:
|
||||
if resp.status == 200:
|
||||
@@ -559,7 +581,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
try:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.get(
- f"http://localhost:{self._bridge_port}/messages",
+ f"http://127.0.0.1:{self._bridge_port}/messages",
|
||||
timeout=aiohttp.ClientTimeout(total=30)
|
||||
) as resp:
|
||||
if resp.status == 200:
|
||||
@@ -621,6 +643,11 @@ class WhatsAppAdapter(BasePlatformAdapter):
|
||||
print(f"[{self.name}] Failed to cache image: {e}", flush=True)
|
||||
cached_urls.append(url)
|
||||
media_types.append("image/jpeg")
|
||||
elif msg_type == MessageType.PHOTO and os.path.isabs(url):
|
||||
# Local file path — bridge already downloaded the image
|
||||
cached_urls.append(url)
|
||||
media_types.append("image/jpeg")
|
||||
print(f"[{self.name}] Using bridge-cached image: {url}", flush=True)
|
||||
elif msg_type == MessageType.VOICE and url.startswith(("http://", "https://")):
|
||||
try:
|
||||
cached_path = await cache_audio_from_url(url, ext=".ogg")
|
||||
|
||||
+237
-35
@@ -222,6 +222,12 @@ from gateway.platforms.base import BasePlatformAdapter, MessageEvent, MessageTyp
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Sentinel placed into _running_agents immediately when a session starts
|
||||
# processing, *before* any await. Prevents a second message for the same
|
||||
# session from bypassing the "already running" guard during the async gap
|
||||
# between the guard check and actual agent creation.
|
||||
_AGENT_PENDING_SENTINEL = object()
|
||||
|
||||
|
||||
def _resolve_runtime_agent_kwargs() -> dict:
|
||||
"""Resolve provider credentials for gateway-created AIAgent instances."""
|
||||
@@ -242,6 +248,8 @@ def _resolve_runtime_agent_kwargs() -> dict:
|
||||
"base_url": runtime.get("base_url"),
|
||||
"provider": runtime.get("provider"),
|
||||
"api_mode": runtime.get("api_mode"),
|
||||
"command": runtime.get("command"),
|
||||
"args": list(runtime.get("args") or []),
|
||||
}
|
||||
|
||||
|
||||
@@ -611,6 +619,8 @@ class GatewayRunner:
|
||||
"base_url": runtime_kwargs.get("base_url"),
|
||||
"provider": runtime_kwargs.get("provider"),
|
||||
"api_mode": runtime_kwargs.get("api_mode"),
|
||||
"command": runtime_kwargs.get("command"),
|
||||
"args": list(runtime_kwargs.get("args") or []),
|
||||
}
|
||||
return resolve_turn_route(user_message, getattr(self, "_smart_model_routing", {}), primary)
|
||||
|
||||
@@ -1046,6 +1056,8 @@ class GatewayRunner:
|
||||
self._running = False
|
||||
|
||||
for session_key, agent in list(self._running_agents.items()):
|
||||
if agent is _AGENT_PENDING_SENTINEL:
|
||||
continue
|
||||
try:
|
||||
agent.interrupt("Gateway shutting down")
|
||||
logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
|
||||
@@ -1179,6 +1191,15 @@ class GatewayRunner:
|
||||
return None
|
||||
return APIServerAdapter(config)
|
||||
|
||||
elif platform == Platform.WEBHOOK:
|
||||
from gateway.platforms.webhook import WebhookAdapter, check_webhook_requirements
|
||||
if not check_webhook_requirements():
|
||||
logger.warning("Webhook: aiohttp not installed")
|
||||
return None
|
||||
adapter = WebhookAdapter(config)
|
||||
adapter.gateway_runner = self # For cross-platform delivery
|
||||
return adapter
|
||||
|
||||
return None
|
||||
|
||||
def _is_user_authorized(self, source: SessionSource) -> bool:
|
||||
@@ -1195,7 +1216,9 @@ class GatewayRunner:
|
||||
# Home Assistant events are system-generated (state changes), not
|
||||
# user-initiated messages. The HASS_TOKEN already authenticates the
|
||||
# connection, so HA events are always authorized.
|
||||
- if source.platform == Platform.HOMEASSISTANT:
+ # Webhook events are authenticated via HMAC signature validation in
+ # the adapter itself, so no user allowlist applies.
+ if source.platform in (Platform.HOMEASSISTANT, Platform.WEBHOOK):
|
||||
return True
|
||||
|
||||
user_id = source.user_id
|
||||
@@ -1257,6 +1280,13 @@ class GatewayRunner:
|
||||
if "@" in user_id:
|
||||
check_ids.add(user_id.split("@")[0])
|
||||
return bool(check_ids & allowed_ids)
|
||||
|
||||
def _get_unauthorized_dm_behavior(self, platform: Optional[Platform]) -> str:
|
||||
"""Return how unauthorized DMs should be handled for a platform."""
|
||||
config = getattr(self, "config", None)
|
||||
if config and hasattr(config, "get_unauthorized_dm_behavior"):
|
||||
return config.get_unauthorized_dm_behavior(platform)
|
||||
return "pair"
|
||||
|
||||
async def _handle_message(self, event: MessageEvent) -> Optional[str]:
|
||||
"""
|
||||
@@ -1277,7 +1307,7 @@ class GatewayRunner:
|
||||
if not self._is_user_authorized(source):
|
||||
logger.warning("Unauthorized user: %s (%s) on %s", source.user_id, source.user_name, source.platform.value)
|
||||
# In DMs: offer pairing code. In groups: silently ignore.
|
||||
- if source.chat_type == "dm":
+ if source.chat_type == "dm" and self._get_unauthorized_dm_behavior(source.platform) == "pair":
|
||||
platform_name = source.platform.value if source.platform else "unknown"
|
||||
code = self.pairing_store.generate_code(
|
||||
platform_name, source.user_id, source.user_name or ""
|
||||
@@ -1314,6 +1344,48 @@ class GatewayRunner:
|
||||
if event.get_command() == "status":
|
||||
return await self._handle_status_command(event)
|
||||
|
||||
# /reset and /new must bypass the running-agent guard so they
|
||||
# actually dispatch as commands instead of being queued as user
|
||||
# text (which would be fed back to the agent with the same
|
||||
# broken history — #2170). Interrupt the agent first, then
|
||||
# clear the adapter's pending queue so the stale "/reset" text
|
||||
# doesn't get re-processed as a user message after the
|
||||
# interrupt completes.
|
||||
from hermes_cli.commands import resolve_command as _resolve_cmd_inner
|
||||
_evt_cmd = event.get_command()
|
||||
_cmd_def_inner = _resolve_cmd_inner(_evt_cmd) if _evt_cmd else None
|
||||
if _cmd_def_inner and _cmd_def_inner.name == "new":
|
||||
running_agent = self._running_agents.get(_quick_key)
|
||||
if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
|
||||
running_agent.interrupt("Session reset requested")
|
||||
# Clear any pending messages so the old text doesn't replay
|
||||
adapter = self.adapters.get(source.platform)
|
||||
if adapter and hasattr(adapter, 'get_pending_message'):
|
||||
adapter.get_pending_message(_quick_key) # consume and discard
|
||||
self._pending_messages.pop(_quick_key, None)
|
||||
# Clean up the running agent entry so the reset handler
|
||||
# doesn't think an agent is still active.
|
||||
if _quick_key in self._running_agents:
|
||||
del self._running_agents[_quick_key]
|
||||
return await self._handle_reset_command(event)
|
||||
|
||||
# /queue <prompt> — queue without interrupting
|
||||
if event.get_command() in ("queue", "q"):
|
||||
queued_text = event.get_command_args().strip()
|
||||
if not queued_text:
|
||||
return "Usage: /queue <prompt>"
|
||||
adapter = self.adapters.get(source.platform)
|
||||
if adapter:
|
||||
from gateway.platforms.base import MessageEvent as _ME, MessageType as _MT
|
||||
queued_event = _ME(
|
||||
text=queued_text,
|
||||
message_type=_MT.TEXT,
|
||||
source=event.source,
|
||||
message_id=event.message_id,
|
||||
)
|
||||
adapter._pending_messages[_quick_key] = queued_event
|
||||
return "Queued for the next turn."
|
||||
|
||||
if event.message_type == MessageType.PHOTO:
|
||||
logger.debug("PRIORITY photo follow-up for session %s — queueing without interrupt", _quick_key[:20])
|
||||
adapter = self.adapters.get(source.platform)
|
||||
@@ -1335,7 +1407,18 @@ class GatewayRunner:
|
||||
adapter._pending_messages[_quick_key] = event
|
||||
return None
|
||||
|
||||
- running_agent = self._running_agents[_quick_key]
+ running_agent = self._running_agents.get(_quick_key)
+ if running_agent is _AGENT_PENDING_SENTINEL:
+     # Agent is being set up but not ready yet.
+     if event.get_command() == "stop":
+         # Nothing to interrupt: the agent hasn't started yet.
+         return "⏳ The agent is still starting up — nothing to stop yet."
+     # Queue the message so it will be picked up after the
+     # agent starts.
+     adapter = self.adapters.get(source.platform)
+     if adapter:
+         adapter._pending_messages[_quick_key] = event
+     return None
|
||||
logger.debug("PRIORITY interrupt for session %s", _quick_key[:20])
|
||||
running_agent.interrupt(event.text)
|
||||
if _quick_key in self._pending_messages:
|
||||
@@ -1343,7 +1426,7 @@ class GatewayRunner:
|
||||
else:
|
||||
self._pending_messages[_quick_key] = event.text
|
||||
return None
|
||||
|
||||
|
||||
# Check for commands
|
||||
command = event.get_command()
|
||||
|
||||
@@ -1430,6 +1513,12 @@ class GatewayRunner:
|
||||
if canonical == "reload-mcp":
|
||||
return await self._handle_reload_mcp_command(event)
|
||||
|
||||
if canonical == "approve":
|
||||
return await self._handle_approve_command(event)
|
||||
|
||||
if canonical == "deny":
|
||||
return await self._handle_deny_command(event)
|
||||
|
||||
if canonical == "update":
|
||||
return await self._handle_update_command(event)
|
||||
|
||||
@@ -1507,33 +1596,32 @@ class GatewayRunner:
|
||||
except Exception as e:
|
||||
logger.debug("Skill command check failed (non-fatal): %s", e)
|
||||
|
||||
-     # Check for pending exec approval responses
-     session_key_preview = self._session_key_for_source(source)
-     if session_key_preview in self._pending_approvals:
-         user_text = event.text.strip().lower()
-         if user_text in ("yes", "y", "approve", "ok", "go", "do it"):
-             approval = self._pending_approvals.pop(session_key_preview)
-             cmd = approval["command"]
-             pattern_keys = approval.get("pattern_keys", [])
-             if not pattern_keys:
-                 pk = approval.get("pattern_key", "")
-                 pattern_keys = [pk] if pk else []
-             logger.info("User approved dangerous command: %s...", cmd[:60])
-             from tools.terminal_tool import terminal_tool
-             from tools.approval import approve_session
-             for pk in pattern_keys:
-                 approve_session(session_key_preview, pk)
-             result = terminal_tool(command=cmd, force=True)
-             return f"✅ Command approved and executed.\n\n```\n{result[:3500]}\n```"
-         elif user_text in ("no", "n", "deny", "cancel", "nope"):
-             self._pending_approvals.pop(session_key_preview)
-             return "❌ Command denied."
-         elif user_text in ("full", "show", "view", "show full", "view full"):
-             # Show full command without consuming the approval
-             cmd = self._pending_approvals[session_key_preview]["command"]
-             return f"Full command:\n\n```\n{cmd}\n```\n\nReply yes/no to approve or deny."
-         # If it's not clearly an approval/denial, fall through to normal processing
-
+     # Pending exec approvals are handled by /approve and /deny commands above.
+     # No bare text matching: "yes" in normal conversation must not trigger
+     # execution of a dangerous command.
+
+     # ── Claim this session before any await ───────────────────────
+     # Between here and _run_agent registering the real AIAgent, there
+     # are numerous await points (hooks, vision enrichment, STT,
+     # session hygiene compression). Without this sentinel a second
+     # message arriving during any of those yields would pass the
+     # "already running" guard and spin up a duplicate agent for the
+     # same session, corrupting the transcript.
+     self._running_agents[_quick_key] = _AGENT_PENDING_SENTINEL
+
+     try:
+         return await self._handle_message_with_agent(event, source, _quick_key)
+     finally:
+         # If _run_agent replaced the sentinel with a real agent and
+         # then cleaned it up, this is a no-op. If we exited early
+         # (exception, command fallthrough, etc.) the sentinel must
+         # not linger or the session would be permanently locked out.
+         if self._running_agents.get(_quick_key) is _AGENT_PENDING_SENTINEL:
+             del self._running_agents[_quick_key]
+
+ async def _handle_message_with_agent(self, event, source, _quick_key: str):
+     """Inner handler that runs under the _running_agents sentinel guard."""
+
# Get or create session
|
||||
session_entry = self.session_store.get_or_create_session(source)
|
||||
session_key = session_entry.session_key
|
||||
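The reason the sentinel has to be written before the first `await` can be shown with a toy model. This is only a sketch: `Runner`, `handle` and `_PENDING` are hypothetical stand-ins for `GatewayRunner`, `_handle_message` and `_AGENT_PENDING_SENTINEL`, and `asyncio.sleep(0)` stands in for the real await points (hooks, enrichment, and so on).

```python
import asyncio
from typing import Dict, List

_PENDING = object()  # hypothetical stand-in for _AGENT_PENDING_SENTINEL


class Runner:
    def __init__(self) -> None:
        self.running: Dict[str, object] = {}
        self.started = 0

    async def handle(self, key: str) -> str:
        if key in self.running:
            return "queued"  # the guard sees the claim made below
        # Claim the session synchronously, before yielding to the event loop.
        self.running[key] = _PENDING
        try:
            await asyncio.sleep(0)  # another message may be handled here
            self.started += 1
            return "started"
        finally:
            # Never leave the placeholder behind, or the session stays locked.
            if self.running.get(key) is _PENDING:
                del self.running[key]


async def main() -> List[str]:
    runner = Runner()
    # Two messages for the same session arrive back to back.
    return list(await asyncio.gather(runner.handle("s1"), runner.handle("s1")))


results = asyncio.run(main())
```

If the claim were moved after the `await`, both coroutines would pass the guard and both would start, which is the duplicate-agent race the comments in this hunk describe.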
@@ -2048,9 +2136,22 @@ class GatewayRunner:
        # Check if the agent encountered a dangerous command needing approval
        try:
            from tools.approval import pop_pending
            import time as _time
            pending = pop_pending(session_key)
            if pending:
                pending["timestamp"] = _time.time()
                self._pending_approvals[session_key] = pending
                # Append structured instructions so the user knows how to respond
                cmd_preview = pending.get("command", "")
                if len(cmd_preview) > 200:
                    cmd_preview = cmd_preview[:200] + "..."
                approval_hint = (
                    f"\n\n⚠️ **Dangerous command requires approval:**\n"
                    f"```\n{cmd_preview}\n```\n"
                    f"Reply `/approve` to execute, `/approve session` to approve this pattern "
                    f"for the session, or `/deny` to cancel."
                )
                response = (response or "") + approval_hint
        except Exception as e:
            logger.debug("Failed to check pending approvals: %s", e)

@@ -2284,8 +2385,10 @@ class GatewayRunner:
         session_entry = self.session_store.get_or_create_session(source)
         session_key = session_entry.session_key

-        if session_key in self._running_agents:
-            agent = self._running_agents[session_key]
+        agent = self._running_agents.get(session_key)
+        if agent is _AGENT_PENDING_SENTINEL:
+            return "⏳ The agent is still starting up — nothing to stop yet."
+        if agent:
             agent.interrupt()
             return "⚡ Stopping the current task... The agent will finish its current step and respond."
         else:
@@ -2373,8 +2476,14 @@ class GatewayRunner:
             lines = [
                 f"🤖 **Current model:** `{current}`",
                 f"**Provider:** {provider_label}",
-                "",
             ]
+            # Show custom endpoint URL when using a custom provider
+            if current_provider == "custom":
+                from hermes_cli.models import _get_custom_base_url
+                custom_url = _get_custom_base_url() or os.getenv("OPENAI_BASE_URL", "")
+                if custom_url:
+                    lines.append(f"**Endpoint:** `{custom_url}`")
+            lines.append("")
             curated = curated_models_for_provider(current_provider)
             if curated:
                 lines.append(f"**Available models ({provider_label}):**")
@@ -2384,7 +2493,7 @@ class GatewayRunner:
                     lines.append(f"• `{mid}`{label}{marker}")
                 lines.append("")
             lines.append("To change: `/model model-name`")
-            lines.append("Switch provider: `/model provider:model-name`")
+            lines.append("Switch provider: `/model provider-name` or `/model provider:model-name`")
             return "\n".join(lines)

         # Parse provider:model syntax
@@ -3685,6 +3794,78 @@ class GatewayRunner:
            logger.warning("MCP reload failed: %s", e)
            return f"❌ MCP reload failed: {e}"

    # ------------------------------------------------------------------
    # /approve & /deny — explicit dangerous-command approval
    # ------------------------------------------------------------------

    _APPROVAL_TIMEOUT_SECONDS = 300  # 5 minutes

    async def _handle_approve_command(self, event: MessageEvent) -> str:
        """Handle /approve command — execute a pending dangerous command.

        Usage:
            /approve — approve and execute the pending command
            /approve session — approve and remember for this session
            /approve always — approve this pattern permanently
        """
        source = event.source
        session_key = self._session_key_for_source(source)

        if session_key not in self._pending_approvals:
            return "No pending command to approve."

        import time as _time
        approval = self._pending_approvals[session_key]

        # Check for timeout
        ts = approval.get("timestamp", 0)
        if _time.time() - ts > self._APPROVAL_TIMEOUT_SECONDS:
            self._pending_approvals.pop(session_key, None)
            return "⚠️ Approval expired (timed out after 5 minutes). Ask the agent to try again."

        self._pending_approvals.pop(session_key)
        cmd = approval["command"]
        pattern_keys = approval.get("pattern_keys", [])
        if not pattern_keys:
            pk = approval.get("pattern_key", "")
            pattern_keys = [pk] if pk else []

        # Determine approval scope from args
        args = event.get_command_args().strip().lower()
        from tools.approval import approve_session, approve_permanent

        if args in ("always", "permanent", "permanently"):
            for pk in pattern_keys:
                approve_permanent(pk)
            scope_msg = " (pattern approved permanently)"
        elif args in ("session", "ses"):
            for pk in pattern_keys:
                approve_session(session_key, pk)
            scope_msg = " (pattern approved for this session)"
        else:
            # One-time approval — just approve for session so the immediate
            # replay works, but don't advertise it as session-wide
            for pk in pattern_keys:
                approve_session(session_key, pk)
            scope_msg = ""

        logger.info("User approved dangerous command via /approve: %s...%s", cmd[:60], scope_msg)
        from tools.terminal_tool import terminal_tool
        result = terminal_tool(command=cmd, force=True)
        return f"✅ Command approved and executed{scope_msg}.\n\n```\n{result[:3500]}\n```"

    async def _handle_deny_command(self, event: MessageEvent) -> str:
        """Handle /deny command — reject a pending dangerous command."""
        source = event.source
        session_key = self._session_key_for_source(source)

        if session_key not in self._pending_approvals:
            return "No pending command to deny."

        self._pending_approvals.pop(session_key)
        logger.info("User denied dangerous command via /deny")
        return "❌ Command denied."

    async def _handle_update_command(self, event: MessageEvent) -> str:
        """Handle /update command — update Hermes Agent to the latest version.

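The expiry rule used by `/approve` in the hunk above (a stored timestamp compared against a 5-minute window) can be sketched on its own. This is an illustrative helper, assuming a plain dict as the pending store; `take_pending` and its `now` parameter are hypothetical and not part of the diff.

```python
import time
from typing import Optional

APPROVAL_TIMEOUT_SECONDS = 300  # mirrors the 5-minute window in the diff


def take_pending(pending: dict, session_key: str, now: Optional[float] = None):
    """Pop and return a pending approval, or None if absent or expired."""
    approval = pending.pop(session_key, None)
    if approval is None:
        return None
    now = time.time() if now is None else now
    # A missing timestamp defaults to 0, so it is treated as expired.
    if now - approval.get("timestamp", 0) > APPROVAL_TIMEOUT_SECONDS:
        return None
    return approval
```

Passing `now` explicitly keeps the check deterministic in tests; in either outcome the entry is removed from the store, so an expired command can never be approved later.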
@@ -4400,6 +4581,26 @@ class GatewayRunner:
            except Exception as _e:
                logger.debug("agent:step hook error: %s", _e)

        # Bridge sync status_callback → async adapter.send for context pressure
        _status_adapter = self.adapters.get(source.platform)
        _status_chat_id = source.chat_id
        _status_thread_metadata = {"thread_id": source.thread_id} if source.thread_id else None

        def _status_callback_sync(event_type: str, message: str) -> None:
            if not _status_adapter:
                return
            try:
                asyncio.run_coroutine_threadsafe(
                    _status_adapter.send(
                        _status_chat_id,
                        message,
                        metadata=_status_thread_metadata,
                    ),
                    _loop_for_step,
                )
            except Exception as _e:
                logger.debug("status_callback error (%s): %s", event_type, _e)

        def run_sync():
            # Pass session_key to process registry via env var so background
            # processes can be mapped back to this gateway session
@@ -4492,6 +4693,7 @@ class GatewayRunner:
                tool_progress_callback=progress_callback if tool_progress_enabled else None,
                step_callback=_step_callback_sync if _hooks_ref.loaded_hooks else None,
                stream_delta_callback=_stream_delta_cb,
                status_callback=_status_callback_sync,
                platform=platform_key,
                honcho_session_key=session_key,
                honcho_manager=honcho_manager,
@@ -11,5 +11,5 @@ Provides subcommands for:
 - hermes cron - Manage cron jobs
 """

-__version__ = "0.3.0"
-__release_date__ = "2026.3.17"
+__version__ = "0.4.0"
+__release_date__ = "2026.3.18"
+165
-14
@@ -19,6 +19,7 @@ import json
import logging
import os
import shutil
import shlex
import stat
import base64
import hashlib
@@ -66,6 +67,8 @@ DEFAULT_AGENT_KEY_MIN_TTL_SECONDS = 30 * 60 # 30 minutes
ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120  # refresh 2 min before expiry
DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS = 1  # poll at most every 1s
DEFAULT_CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex"
DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
@@ -108,6 +111,20 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        auth_type="oauth_external",
        inference_base_url=DEFAULT_CODEX_BASE_URL,
    ),
    "copilot": ProviderConfig(
        id="copilot",
        name="GitHub Copilot",
        auth_type="api_key",
        inference_base_url=DEFAULT_GITHUB_MODELS_BASE_URL,
        api_key_env_vars=("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"),
    ),
    "copilot-acp": ProviderConfig(
        id="copilot-acp",
        name="GitHub Copilot ACP",
        auth_type="external_process",
        inference_base_url=DEFAULT_COPILOT_ACP_BASE_URL,
        base_url_env_var="COPILOT_ACP_BASE_URL",
    ),
    "zai": ProviderConfig(
        id="zai",
        name="Z.AI / GLM",
@@ -128,7 +145,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
         id="minimax",
         name="MiniMax",
         auth_type="api_key",
-        inference_base_url="https://api.minimax.io/v1",
+        inference_base_url="https://api.minimax.io/anthropic",
         api_key_env_vars=("MINIMAX_API_KEY",),
         base_url_env_var="MINIMAX_BASE_URL",
     ),
@@ -151,7 +168,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
         id="minimax-cn",
         name="MiniMax (China)",
         auth_type="api_key",
-        inference_base_url="https://api.minimaxi.com/v1",
+        inference_base_url="https://api.minimaxi.com/anthropic",
         api_key_env_vars=("MINIMAX_CN_API_KEY",),
         base_url_env_var="MINIMAX_CN_BASE_URL",
     ),
@@ -222,6 +239,70 @@ def _resolve_kimi_base_url(api_key: str, default_url: str, env_override: str) ->
    return default_url


def _gh_cli_candidates() -> list[str]:
    """Return candidate ``gh`` binary paths, including common Homebrew installs."""
    candidates: list[str] = []

    resolved = shutil.which("gh")
    if resolved:
        candidates.append(resolved)

    for candidate in (
        "/opt/homebrew/bin/gh",
        "/usr/local/bin/gh",
        str(Path.home() / ".local" / "bin" / "gh"),
    ):
        if candidate in candidates:
            continue
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            candidates.append(candidate)

    return candidates


def _try_gh_cli_token() -> Optional[str]:
    """Return a token from ``gh auth token`` when the GitHub CLI is available."""
    for gh_path in _gh_cli_candidates():
        try:
            result = subprocess.run(
                [gh_path, "auth", "token"],
                capture_output=True,
                text=True,
                timeout=5,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            logger.debug("gh CLI token lookup failed (%s): %s", gh_path, exc)
            continue
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
    return None


def _resolve_api_key_provider_secret(
    provider_id: str, pconfig: ProviderConfig
) -> tuple[str, str]:
    """Resolve an API-key provider's token and indicate where it came from."""
    if provider_id == "copilot":
        # Use the dedicated copilot auth module for proper token validation
        try:
            from hermes_cli.copilot_auth import resolve_copilot_token
            token, source = resolve_copilot_token()
            if token:
                return token, source
        except ValueError as exc:
            logger.warning("Copilot token validation failed: %s", exc)
        except Exception:
            pass
        return "", ""

    for env_var in pconfig.api_key_env_vars:
        val = os.getenv(env_var, "").strip()
        if val:
            return val, env_var

    return "", ""


# =============================================================================
# Z.AI Endpoint Detection
# =============================================================================
@@ -572,6 +653,9 @@ def resolve_provider(
        "kimi": "kimi-coding", "moonshot": "kimi-coding",
        "minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
        "claude": "anthropic", "claude-code": "anthropic",
        "github": "copilot", "github-copilot": "copilot",
        "github-models": "copilot", "github-model": "copilot",
        "github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
        "aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
        "opencode": "opencode-zen", "zen": "opencode-zen",
        "go": "opencode-go", "opencode-go-sub": "opencode-go",
@@ -611,6 +695,11 @@ def resolve_provider(
    for pid, pconfig in PROVIDER_REGISTRY.items():
        if pconfig.auth_type != "api_key":
            continue
        # GitHub tokens are commonly present for repo/tool access but should not
        # hijack inference auto-selection unless the user explicitly chooses
        # Copilot/GitHub Models as the provider.
        if pid == "copilot":
            continue
        for env_var in pconfig.api_key_env_vars:
            if os.getenv(env_var, "").strip():
                return pid
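The effect of skipping `copilot` during auto-detection can be shown with a reduced model. This is a sketch under stated assumptions: `autodetect_provider` is a hypothetical helper, the registry is simplified to a mapping of provider id to env var names, and the `example` provider and `EXAMPLE_API_KEY` are invented for illustration.

```python
from typing import Mapping, Optional, Sequence


def autodetect_provider(
    registry: Mapping[str, Sequence[str]],
    env: Mapping[str, str],
) -> Optional[str]:
    """Return the first provider whose env var is set, never picking copilot."""
    for pid, env_vars in registry.items():
        # A GitHub token in the environment is usually there for repo access,
        # so it must not select Copilot implicitly.
        if pid == "copilot":
            continue
        if any(env.get(name, "").strip() for name in env_vars):
            return pid
    return None
```

With only `GITHUB_TOKEN` set, the helper returns `None`, so the caller falls through to whatever default it has instead of routing inference to Copilot.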
@@ -1479,12 +1568,7 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:

     api_key = ""
     key_source = ""
-    for env_var in pconfig.api_key_env_vars:
-        val = os.getenv(env_var, "").strip()
-        if val:
-            api_key = val
-            key_source = env_var
-            break
+    api_key, key_source = _resolve_api_key_provider_secret(provider_id, pconfig)

     env_url = ""
     if pconfig.base_url_env_var:
@@ -1507,6 +1591,36 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
    }


def get_external_process_provider_status(provider_id: str) -> Dict[str, Any]:
    """Status snapshot for providers that run a local subprocess."""
    pconfig = PROVIDER_REGISTRY.get(provider_id)
    if not pconfig or pconfig.auth_type != "external_process":
        return {"configured": False}

    command = (
        os.getenv("HERMES_COPILOT_ACP_COMMAND", "").strip()
        or os.getenv("COPILOT_CLI_PATH", "").strip()
        or "copilot"
    )
    raw_args = os.getenv("HERMES_COPILOT_ACP_ARGS", "").strip()
    args = shlex.split(raw_args) if raw_args else ["--acp", "--stdio"]
    base_url = os.getenv(pconfig.base_url_env_var, "").strip() if pconfig.base_url_env_var else ""
    if not base_url:
        base_url = pconfig.inference_base_url

    resolved_command = shutil.which(command) if command else None
    return {
        "configured": bool(resolved_command or base_url.startswith("acp+tcp://")),
        "provider": provider_id,
        "name": pconfig.name,
        "command": command,
        "args": args,
        "resolved_command": resolved_command,
        "base_url": base_url,
        "logged_in": bool(resolved_command or base_url.startswith("acp+tcp://")),
    }


def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
    """Generic auth status dispatcher."""
    target = provider_id or get_active_provider()
@@ -1514,6 +1628,8 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
        return get_nous_auth_status()
    if target == "openai-codex":
        return get_codex_auth_status()
    if target == "copilot-acp":
        return get_external_process_provider_status(target)
    # API-key providers
    pconfig = PROVIDER_REGISTRY.get(target)
    if pconfig and pconfig.auth_type == "api_key":
@@ -1536,12 +1652,7 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:

     api_key = ""
     key_source = ""
-    for env_var in pconfig.api_key_env_vars:
-        val = os.getenv(env_var, "").strip()
-        if val:
-            api_key = val
-            key_source = env_var
-            break
+    api_key, key_source = _resolve_api_key_provider_secret(provider_id, pconfig)

     env_url = ""
     if pconfig.base_url_env_var:
@@ -1562,6 +1673,46 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
    }


def resolve_external_process_provider_credentials(provider_id: str) -> Dict[str, Any]:
    """Resolve runtime details for local subprocess-backed providers."""
    pconfig = PROVIDER_REGISTRY.get(provider_id)
    if not pconfig or pconfig.auth_type != "external_process":
        raise AuthError(
            f"Provider '{provider_id}' is not an external-process provider.",
            provider=provider_id,
            code="invalid_provider",
        )

    base_url = os.getenv(pconfig.base_url_env_var, "").strip() if pconfig.base_url_env_var else ""
    if not base_url:
        base_url = pconfig.inference_base_url

    command = (
        os.getenv("HERMES_COPILOT_ACP_COMMAND", "").strip()
        or os.getenv("COPILOT_CLI_PATH", "").strip()
        or "copilot"
    )
    raw_args = os.getenv("HERMES_COPILOT_ACP_ARGS", "").strip()
    args = shlex.split(raw_args) if raw_args else ["--acp", "--stdio"]
    resolved_command = shutil.which(command) if command else None
    if not resolved_command and not base_url.startswith("acp+tcp://"):
        raise AuthError(
            f"Could not find the Copilot CLI command '{command}'. "
            "Install GitHub Copilot CLI or set HERMES_COPILOT_ACP_COMMAND/COPILOT_CLI_PATH.",
            provider=provider_id,
            code="missing_copilot_cli",
        )

    return {
        "provider": provider_id,
        "api_key": "copilot-acp",
        "base_url": base_url.rstrip("/"),
        "command": resolved_command or command,
        "args": args,
        "source": "process",
    }


# =============================================================================
# External credential detection
# =============================================================================
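The command and argument precedence used for the `copilot-acp` provider above can be exercised without touching the real environment. This is a minimal sketch: `resolve_acp_invocation` is a hypothetical helper that takes an explicit mapping instead of reading `os.environ`, while the variable names and defaults are the ones shown in the diff.

```python
import shlex
from typing import Mapping


def resolve_acp_invocation(env: Mapping[str, str]) -> tuple:
    """Return (command, args) using the precedence shown in the diff."""
    command = (
        env.get("HERMES_COPILOT_ACP_COMMAND", "").strip()
        or env.get("COPILOT_CLI_PATH", "").strip()
        or "copilot"
    )
    raw_args = env.get("HERMES_COPILOT_ACP_ARGS", "").strip()
    # shlex.split honours shell-style quoting, so quoted values stay intact.
    args = shlex.split(raw_args) if raw_args else ["--acp", "--stdio"]
    return command, args
```

Using `shlex.split` rather than `str.split` means an argument such as `'a b'` survives as a single list element.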
@@ -289,6 +289,8 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
    _hero = HERMES_CADUCEUS
    left_lines = ["", _hero, ""]
    model_short = model.split("/")[-1] if "/" in model else model
    if model_short.endswith(".gguf"):
        model_short = model_short[:-5]
    if len(model_short) > 28:
        model_short = model_short[:25] + "..."
    ctx_str = f" [dim {dim}]·[/] [dim {dim}]{_format_context_length(context_length)} context[/]" if context_length else ""
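The banner's model-name shortening in the hunk above reduces to three steps: drop any provider prefix, strip a `.gguf` suffix, then truncate. The following is an illustrative extraction; `short_model_name` and its `limit` parameter are hypothetical, and the example model names are invented.

```python
def short_model_name(model: str, limit: int = 28) -> str:
    """Shorten a model identifier for display, following the banner logic."""
    model_short = model.split("/")[-1] if "/" in model else model
    # Local GGUF files are shown without their file extension.
    if model_short.endswith(".gguf"):
        model_short = model_short[:-5]
    # Long names are cut so the result, including "...", fits the limit.
    if len(model_short) > limit:
        model_short = model_short[: limit - 3] + "..."
    return model_short
```

The suffix is stripped before truncation, so a long GGUF filename is shortened on its meaningful part rather than on the extension.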
@@ -61,8 +61,14 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
               args_hint="[number]"),
    CommandDef("stop", "Kill all running background processes", "Session"),
    CommandDef("approve", "Approve a pending dangerous command", "Session",
               gateway_only=True, args_hint="[session|always]"),
    CommandDef("deny", "Deny a pending dangerous command", "Session",
               gateway_only=True),
    CommandDef("background", "Run a prompt in the background", "Session",
               aliases=("bg",), args_hint="<prompt>"),
    CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session",
               aliases=("q",), args_hint="<prompt>"),
    CommandDef("status", "Show session info", "Session",
               gateway_only=True),
    CommandDef("sethome", "Set this chat as the home channel", "Session",
@@ -670,6 +670,11 @@ OPTIONAL_ENV_VARS = {
        "password": True,
        "category": "tool",
    },
    "HONCHO_BASE_URL": {
        "description": "Base URL for self-hosted Honcho instances (no API key needed)",
        "prompt": "Honcho base URL (e.g. http://localhost:8000)",
        "category": "tool",
    },

    # ── Messaging platforms ──
    "TELEGRAM_BOT_TOKEN": {
@@ -807,6 +812,27 @@ OPTIONAL_ENV_VARS = {
        "category": "messaging",
        "advanced": True,
    },
    "WEBHOOK_ENABLED": {
        "description": "Enable the webhook platform adapter for receiving events from GitHub, GitLab, etc.",
        "prompt": "Enable webhooks (true/false)",
        "url": None,
        "password": False,
        "category": "messaging",
    },
    "WEBHOOK_PORT": {
        "description": "Port for the webhook HTTP server (default: 8644).",
        "prompt": "Webhook port",
        "url": None,
        "password": False,
        "category": "messaging",
    },
    "WEBHOOK_SECRET": {
        "description": "Global HMAC secret for webhook signature validation (overridable per route in config.yaml).",
        "prompt": "Webhook secret",
        "url": None,
        "password": True,
        "category": "messaging",
    },

    # ── Agent settings ──
    "MESSAGING_CWD": {
@@ -0,0 +1,295 @@
"""GitHub Copilot authentication utilities.

Implements the OAuth device code flow used by the Copilot CLI and handles
token validation/exchange for the Copilot API.

Token type support (per GitHub docs):
    gho_          OAuth token           ✓  (default via copilot login)
    github_pat_   Fine-grained PAT      ✓  (needs Copilot Requests permission)
    ghu_          GitHub App token      ✓  (via environment variable)
    ghp_          Classic PAT           ✗  NOT SUPPORTED

Credential search order (matching Copilot CLI behaviour):
    1. COPILOT_GITHUB_TOKEN env var
    2. GH_TOKEN env var
    3. GITHUB_TOKEN env var
    4. gh auth token CLI fallback
"""

from __future__ import annotations

import json
import logging
import os
import re
import shutil
import subprocess
import time
from pathlib import Path
from typing import Any, Optional

logger = logging.getLogger(__name__)

# OAuth device code flow constants (same client ID as opencode/Copilot CLI)
COPILOT_OAUTH_CLIENT_ID = "Ov23li8tweQw6odWQebz"
COPILOT_DEVICE_CODE_URL = "https://github.com/login/device/code"
COPILOT_ACCESS_TOKEN_URL = "https://github.com/login/oauth/access_token"

# Copilot API constants
COPILOT_TOKEN_EXCHANGE_URL = "https://api.github.com/copilot_internal/v2/token"
COPILOT_API_BASE_URL = "https://api.githubcopilot.com"

# Token type prefixes
_CLASSIC_PAT_PREFIX = "ghp_"
_SUPPORTED_PREFIXES = ("gho_", "github_pat_", "ghu_")

# Env var search order (matches Copilot CLI)
COPILOT_ENV_VARS = ("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN")

# Polling constants
_DEVICE_CODE_POLL_INTERVAL = 5  # seconds
_DEVICE_CODE_POLL_SAFETY_MARGIN = 3  # seconds


def is_classic_pat(token: str) -> bool:
    """Check if a token is a classic PAT (ghp_*), which Copilot doesn't support."""
    return token.strip().startswith(_CLASSIC_PAT_PREFIX)


def validate_copilot_token(token: str) -> tuple[bool, str]:
    """Validate that a token is usable with the Copilot API.

    Returns (valid, message).
    """
    token = token.strip()
    if not token:
        return False, "Empty token"

    if token.startswith(_CLASSIC_PAT_PREFIX):
        return False, (
            "Classic Personal Access Tokens (ghp_*) are not supported by the "
            "Copilot API. Use one of:\n"
            " → `copilot login` or `hermes model` to authenticate via OAuth\n"
            " → A fine-grained PAT (github_pat_*) with Copilot Requests permission\n"
            " → `gh auth login` with the default device code flow (produces gho_* tokens)"
        )

    return True, "OK"


def resolve_copilot_token() -> tuple[str, str]:
    """Resolve a GitHub token suitable for Copilot API use.

    Returns (token, source) where source describes where the token came from.
    Raises ValueError if only a classic PAT is available.
    """
    # 1. Check env vars in priority order
    for env_var in COPILOT_ENV_VARS:
        val = os.getenv(env_var, "").strip()
        if val:
            valid, msg = validate_copilot_token(val)
            if not valid:
                logger.warning(
                    "Token from %s is not supported: %s", env_var, msg
                )
                continue
            return val, env_var

    # 2. Fall back to gh auth token
    token = _try_gh_cli_token()
    if token:
        valid, msg = validate_copilot_token(token)
        if not valid:
            raise ValueError(
                f"Token from `gh auth token` is a classic PAT (ghp_*). {msg}"
            )
        return token, "gh auth token"

    return "", ""


def _gh_cli_candidates() -> list[str]:
    """Return candidate ``gh`` binary paths, including common Homebrew installs."""
    candidates: list[str] = []

    resolved = shutil.which("gh")
    if resolved:
        candidates.append(resolved)

    for candidate in (
        "/opt/homebrew/bin/gh",
        "/usr/local/bin/gh",
        str(Path.home() / ".local" / "bin" / "gh"),
    ):
        if candidate in candidates:
            continue
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            candidates.append(candidate)

    return candidates


def _try_gh_cli_token() -> Optional[str]:
    """Return a token from ``gh auth token`` when the GitHub CLI is available."""
    for gh_path in _gh_cli_candidates():
        try:
            result = subprocess.run(
                [gh_path, "auth", "token"],
                capture_output=True,
                text=True,
                timeout=5,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            logger.debug("gh CLI token lookup failed (%s): %s", gh_path, exc)
            continue
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
    return None


# ─── OAuth Device Code Flow ────────────────────────────────────────────────

def copilot_device_code_login(
    *,
    host: str = "github.com",
    timeout_seconds: float = 300,
) -> Optional[str]:
    """Run the GitHub OAuth device code flow for Copilot.

    Prints instructions for the user, polls for completion, and returns
    the OAuth access token on success, or None on failure/cancellation.

    This replicates the flow used by opencode and the Copilot CLI.
    """
    import urllib.request
    import urllib.parse

    domain = host.rstrip("/")
    device_code_url = f"https://{domain}/login/device/code"
    access_token_url = f"https://{domain}/login/oauth/access_token"

    # Step 1: Request device code
    data = urllib.parse.urlencode({
        "client_id": COPILOT_OAUTH_CLIENT_ID,
        "scope": "read:user",
    }).encode()

    req = urllib.request.Request(
        device_code_url,
        data=data,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": "HermesAgent/1.0",
        },
    )

    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            device_data = json.loads(resp.read().decode())
    except Exception as exc:
        logger.error("Failed to initiate device authorization: %s", exc)
        print(f" ✗ Failed to start device authorization: {exc}")
        return None

    verification_uri = device_data.get("verification_uri", "https://github.com/login/device")
    user_code = device_data.get("user_code", "")
    device_code = device_data.get("device_code", "")
    interval = max(device_data.get("interval", _DEVICE_CODE_POLL_INTERVAL), 1)

    if not device_code or not user_code:
        print(" ✗ GitHub did not return a device code.")
        return None

    # Step 2: Show instructions
    print()
    print(f" Open this URL in your browser: {verification_uri}")
    print(f" Enter this code: {user_code}")
    print()
    print(" Waiting for authorization...", end="", flush=True)

    # Step 3: Poll for completion
    deadline = time.time() + timeout_seconds

    while time.time() < deadline:
        time.sleep(interval + _DEVICE_CODE_POLL_SAFETY_MARGIN)

        poll_data = urllib.parse.urlencode({
            "client_id": COPILOT_OAUTH_CLIENT_ID,
            "device_code": device_code,
            "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
        }).encode()

        poll_req = urllib.request.Request(
            access_token_url,
            data=poll_data,
            headers={
                "Accept": "application/json",
                "Content-Type": "application/x-www-form-urlencoded",
                "User-Agent": "HermesAgent/1.0",
            },
        )

        try:
            with urllib.request.urlopen(poll_req, timeout=10) as resp:
                result = json.loads(resp.read().decode())
        except Exception:
            print(".", end="", flush=True)
            continue

        if result.get("access_token"):
            print(" ✓")
            return result["access_token"]

        error = result.get("error", "")
        if error == "authorization_pending":
            print(".", end="", flush=True)
            continue
        elif error == "slow_down":
            # RFC 8628: add 5 seconds to polling interval
            server_interval = result.get("interval")
            if isinstance(server_interval, (int, float)) and server_interval > 0:
                interval = int(server_interval)
            else:
                interval += 5
            print(".", end="", flush=True)
            continue
        elif error == "expired_token":
            print()
            print(" ✗ Device code expired. Please try again.")
            return None
        elif error == "access_denied":
            print()
            print(" ✗ Authorization was denied.")
            return None
        elif error:
            print()
            print(f" ✗ Authorization failed: {error}")
            return None

    print()
    print(" ✗ Timed out waiting for authorization.")
    return None


# ─── Copilot API Headers ───────────────────────────────────────────────────

def copilot_request_headers(
    *,
    is_agent_turn: bool = True,
    is_vision: bool = False,
) -> dict[str, str]:
    """Build the standard headers for Copilot API requests.

    Replicates the header set used by opencode and the Copilot CLI.
    """
    headers: dict[str, str] = {
        "Editor-Version": "vscode/1.104.1",
        "User-Agent": "HermesAgent/1.0",
        "Openai-Intent": "conversation-edits",
        "x-initiator": "agent" if is_agent_turn else "user",
    }
    if is_vision:
        headers["Copilot-Vision-Request"] = "true"

    return headers
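The `slow_down` handling inside the polling loop above can be isolated as a pure function, which makes the back-off rule easy to check. This is a sketch only: `next_poll_interval` is a hypothetical helper, and the response dictionaries mirror the fields the loop reads (`error`, `interval`).

```python
def next_poll_interval(interval: int, response: dict) -> int:
    """Return the polling interval to use after one token-endpoint response."""
    if response.get("error") != "slow_down":
        return interval  # pending or other responses leave the pace unchanged
    server_interval = response.get("interval")
    # Prefer an explicit interval from the server when one is supplied.
    if isinstance(server_interval, (int, float)) and server_interval > 0:
        return int(server_interval)
    # Otherwise back off by 5 seconds, as RFC 8628 prescribes for slow_down.
    return interval + 5
```

Keeping the rule separate from the network call means it can be tested without contacting GitHub.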
+444
-13
@@ -125,6 +125,17 @@ def _has_any_provider_configured() -> bool:
    except Exception:
        pass

    # Check provider-specific auth fallbacks (for example, Copilot via gh auth).
    try:
        for provider_id, pconfig in PROVIDER_REGISTRY.items():
            if pconfig.auth_type != "api_key":
                continue
            status = get_auth_status(provider_id)
            if status.get("logged_in"):
                return True
    except Exception:
        pass

    # Check for Nous Portal OAuth credentials
    auth_file = get_hermes_home() / "auth.json"
    if auth_file.exists():
@@ -775,6 +786,8 @@ def cmd_model(args):
        "openrouter": "OpenRouter",
        "nous": "Nous Portal",
        "openai-codex": "OpenAI Codex",
        "copilot-acp": "GitHub Copilot ACP",
        "copilot": "GitHub Copilot",
        "anthropic": "Anthropic",
        "zai": "Z.AI / GLM",
        "kimi-coding": "Kimi / Moonshot",
@@ -799,6 +812,8 @@ def cmd_model(args):
        ("openrouter", "OpenRouter (100+ models, pay-per-use)"),
        ("nous", "Nous Portal (Nous Research subscription)"),
        ("openai-codex", "OpenAI Codex"),
        ("copilot-acp", "GitHub Copilot ACP (spawns `copilot --acp --stdio`)"),
        ("copilot", "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
        ("anthropic", "Anthropic (Claude models — API key or Claude Code)"),
        ("zai", "Z.AI / GLM (Zhipu AI direct API)"),
        ("kimi-coding", "Kimi / Moonshot (Moonshot AI direct API)"),
@@ -867,6 +882,10 @@ def cmd_model(args):
        _model_flow_nous(config, current_model)
    elif selected_provider == "openai-codex":
        _model_flow_openai_codex(config, current_model)
    elif selected_provider == "copilot-acp":
        _model_flow_copilot_acp(config, current_model)
    elif selected_provider == "copilot":
        _model_flow_copilot(config, current_model)
    elif selected_provider == "custom":
        _model_flow_custom(config)
    elif selected_provider.startswith("custom:") and selected_provider in _custom_provider_map:
@@ -1118,10 +1137,21 @@ def _model_flow_custom(config):
|
||||
base_url = input(f"API base URL [{current_url or 'e.g. https://api.example.com/v1'}]: ").strip()
|
||||
api_key = input(f"API key [{current_key[:8] + '...' if current_key else 'optional'}]: ").strip()
|
||||
model_name = input("Model name (e.g. gpt-4, llama-3-70b): ").strip()
|
||||
context_length_str = input("Context length in tokens [leave blank for auto-detect]: ").strip()
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
print("\nCancelled.")
|
||||
return
|
||||
|
||||
context_length = None
|
||||
if context_length_str:
|
||||
try:
|
||||
context_length = int(context_length_str.replace(",", "").replace("k", "000").replace("K", "000"))
|
||||
if context_length <= 0:
|
||||
context_length = None
|
||||
except ValueError:
|
||||
print(f"Invalid context length: {context_length_str} — will auto-detect.")
|
||||
context_length = None
|
||||
|
||||
if not base_url and not current_url:
|
||||
print("No URL provided. Cancelled.")
|
||||
return
|
||||
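The context-length prompt added in this hunk accepts shorthand such as `128k` or `32,768`. A minimal sketch of that parsing as a standalone helper follows; the name `parse_context_length` is hypothetical and does not appear in the diff, which performs the same steps inline.

```python
from typing import Optional


def parse_context_length(text: str) -> Optional[int]:
    """Parse a user-entered context length; None means "auto-detect".

    Hypothetical helper mirroring the inline logic in _model_flow_custom.
    """
    cleaned = text.strip()
    if not cleaned:
        return None
    # Drop thousands separators and expand a k/K suffix to three zeros.
    cleaned = cleaned.replace(",", "").replace("k", "000").replace("K", "000")
    try:
        value = int(cleaned)
    except ValueError:
        return None
    # Zero and negative values are treated as "not provided".
    return value if value > 0 else None


print(parse_context_length("128k"))
print(parse_context_length("32,768"))
```

Because the suffix is expanded textually, an input like `1.5k` becomes `1.5000`, fails integer parsing, and falls back to auto-detect instead of yielding 1500.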
@@ -1184,14 +1214,14 @@ def _model_flow_custom(config):
         print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")
 
     # Auto-save to custom_providers so it appears in the menu next time
-    _save_custom_provider(effective_url, effective_key, model_name or "")
+    _save_custom_provider(effective_url, effective_key, model_name or "", context_length=context_length)
 
 
-def _save_custom_provider(base_url, api_key="", model=""):
+def _save_custom_provider(base_url, api_key="", model="", context_length=None):
     """Save a custom endpoint to custom_providers in config.yaml.
 
     Deduplicates by base_url — if the URL already exists, updates the
-    model name but doesn't add a duplicate entry.
+    model name and context_length but doesn't add a duplicate entry.
     Auto-generates a display name from the URL hostname.
     """
     from hermes_cli.config import load_config, save_config
@@ -1201,14 +1231,24 @@ def _save_custom_provider(base_url, api_key="", model=""):
|
||||
if not isinstance(providers, list):
|
||||
providers = []
|
||||
|
||||
# Check if this URL is already saved — update model if so
|
||||
# Check if this URL is already saved — update model/context_length if so
|
||||
for entry in providers:
|
||||
if isinstance(entry, dict) and entry.get("base_url", "").rstrip("/") == base_url.rstrip("/"):
|
||||
changed = False
|
||||
if model and entry.get("model") != model:
|
||||
entry["model"] = model
|
||||
changed = True
|
||||
if model and context_length:
|
||||
models_cfg = entry.get("models", {})
|
||||
if not isinstance(models_cfg, dict):
|
||||
models_cfg = {}
|
||||
models_cfg[model] = {"context_length": context_length}
|
||||
entry["models"] = models_cfg
|
||||
changed = True
|
||||
if changed:
|
||||
cfg["custom_providers"] = providers
|
||||
save_config(cfg)
|
||||
return # already saved, updated model if needed
|
||||
return # already saved, updated if needed
|
||||
|
||||
# Auto-generate a name from the URL
|
||||
import re
|
||||
@@ -1230,6 +1270,8 @@ def _save_custom_provider(base_url, api_key="", model=""):
|
||||
entry["api_key"] = api_key
|
||||
if model:
|
||||
entry["model"] = model
|
||||
if model and context_length:
|
||||
entry["models"] = {model: {"context_length": context_length}}
|
||||
|
||||
providers.append(entry)
|
||||
cfg["custom_providers"] = providers
|
||||
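The two hunks above change `_save_custom_provider` so that an existing entry (matched by `base_url`, ignoring a trailing slash) is updated in place, and a per-model `context_length` is stored under `models`. A rough sketch of that update-or-append logic on a plain list, without config I/O, might look like this. The helper name `upsert_custom_provider` and the example URLs are assumptions for illustration; the real function also handles `api_key` and display-name generation.

```python
from typing import Any, Optional


def upsert_custom_provider(
    providers: list[dict[str, Any]],
    base_url: str,
    model: str = "",
    context_length: Optional[int] = None,
) -> bool:
    """Update the entry for base_url in place, or append a new one.

    Returns True when the list changed and should be written back.
    """
    wanted = base_url.rstrip("/")
    for entry in providers:
        if entry.get("base_url", "").rstrip("/") != wanted:
            continue
        changed = False
        if model and entry.get("model") != model:
            entry["model"] = model
            changed = True
        if model and context_length:
            models_cfg = entry.get("models")
            if not isinstance(models_cfg, dict):
                models_cfg = {}
            models_cfg[model] = {"context_length": context_length}
            entry["models"] = models_cfg
            changed = True
        return changed  # existing entry found; never append a duplicate

    new_entry: dict[str, Any] = {"base_url": base_url}
    if model:
        new_entry["model"] = model
    if model and context_length:
        new_entry["models"] = {model: {"context_length": context_length}}
    providers.append(new_entry)
    return True


providers = [{"base_url": "https://api.example.com/v1/", "model": "llama-3-70b"}]
changed = upsert_custom_provider(
    providers, "https://api.example.com/v1", "llama-3-70b", 131072
)
print(changed, providers)
```

Note that, as in the diff, passing a `context_length` always marks the entry as changed, even when the stored value is already the same.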
@@ -1407,6 +1449,25 @@ def _model_flow_named_custom(config, provider_info):
|
||||
|
||||
# Curated model lists for direct API-key providers
|
||||
_PROVIDER_MODELS = {
|
||||
"copilot-acp": [
|
||||
"copilot-acp",
|
||||
],
|
||||
"copilot": [
|
||||
"gpt-5.4",
|
||||
"gpt-5.4-mini",
|
||||
"gpt-5-mini",
|
||||
"gpt-5.3-codex",
|
||||
"gpt-5.2-codex",
|
||||
"gpt-4.1",
|
||||
"gpt-4o",
|
||||
"gpt-4o-mini",
|
||||
"claude-opus-4.6",
|
||||
"claude-sonnet-4.6",
|
||||
"claude-sonnet-4.5",
|
||||
"claude-haiku-4.5",
|
||||
"gemini-2.5-pro",
|
||||
"grok-code-fast-1",
|
||||
],
|
||||
"zai": [
|
||||
"glm-5",
|
||||
"glm-4.7",
|
||||
@@ -1447,6 +1508,376 @@ _PROVIDER_MODELS = {
|
||||
}
|
||||
|
||||
|
||||
+def _current_reasoning_effort(config) -> str:
+    agent_cfg = config.get("agent")
+    if isinstance(agent_cfg, dict):
+        return str(agent_cfg.get("reasoning_effort") or "").strip().lower()
+    return ""
+
+
+def _set_reasoning_effort(config, effort: str) -> None:
+    agent_cfg = config.get("agent")
+    if not isinstance(agent_cfg, dict):
+        agent_cfg = {}
+        config["agent"] = agent_cfg
+    agent_cfg["reasoning_effort"] = effort
+
+
def _prompt_reasoning_effort_selection(efforts, current_effort=""):
|
||||
"""Prompt for a reasoning effort. Returns effort, 'none', or None to keep current."""
|
||||
ordered = list(dict.fromkeys(str(effort).strip().lower() for effort in efforts if str(effort).strip()))
|
||||
if not ordered:
|
||||
return None
|
||||
|
||||
def _label(effort):
|
||||
if effort == current_effort:
|
||||
return f"{effort} ← currently in use"
|
||||
return effort
|
||||
|
||||
disable_label = "Disable reasoning"
|
||||
skip_label = "Skip (keep current)"
|
||||
|
||||
if current_effort == "none":
|
||||
default_idx = len(ordered)
|
||||
elif current_effort in ordered:
|
||||
default_idx = ordered.index(current_effort)
|
||||
elif "medium" in ordered:
|
||||
default_idx = ordered.index("medium")
|
||||
else:
|
||||
default_idx = 0
|
||||
|
||||
try:
|
||||
from simple_term_menu import TerminalMenu
|
||||
|
||||
choices = [f" {_label(effort)}" for effort in ordered]
|
||||
choices.append(f" {disable_label}")
|
||||
choices.append(f" {skip_label}")
|
||||
menu = TerminalMenu(
|
||||
choices,
|
||||
cursor_index=default_idx,
|
||||
menu_cursor="-> ",
|
||||
menu_cursor_style=("fg_green", "bold"),
|
||||
menu_highlight_style=("fg_green",),
|
||||
cycle_cursor=True,
|
||||
clear_screen=False,
|
||||
title="Select reasoning effort:",
|
||||
)
|
||||
idx = menu.show()
|
||||
if idx is None:
|
||||
return None
|
||||
print()
|
||||
if idx < len(ordered):
|
||||
return ordered[idx]
|
||||
if idx == len(ordered):
|
||||
return "none"
|
||||
return None
|
||||
except (ImportError, NotImplementedError):
|
||||
pass
|
||||
|
||||
print("Select reasoning effort:")
|
||||
for i, effort in enumerate(ordered, 1):
|
||||
print(f" {i}. {_label(effort)}")
|
||||
n = len(ordered)
|
||||
print(f" {n + 1}. {disable_label}")
|
||||
print(f" {n + 2}. {skip_label}")
|
||||
print()
|
||||
|
||||
while True:
|
||||
try:
|
||||
choice = input(f"Choice [1-{n + 2}] (default: keep current): ").strip()
|
||||
if not choice:
|
||||
return None
|
||||
idx = int(choice)
|
||||
if 1 <= idx <= n:
|
||||
return ordered[idx - 1]
|
||||
if idx == n + 1:
|
||||
return "none"
|
||||
if idx == n + 2:
|
||||
return None
|
||||
print(f"Please enter 1-{n + 2}")
|
||||
except ValueError:
|
||||
print("Please enter a number")
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
return None
|
||||
|
||||
|
||||
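The menu in `_prompt_reasoning_effort_selection` first de-duplicates the offered efforts and then chooses which row to pre-select. A small sketch of just those two steps is shown below; the helper names `dedupe_efforts` and `default_effort_index` are hypothetical, since the diff does this inline.

```python
def dedupe_efforts(efforts):
    """Lowercase, drop blanks, and keep first-seen order."""
    return list(dict.fromkeys(
        str(effort).strip().lower() for effort in efforts if str(effort).strip()
    ))


def default_effort_index(ordered, current_effort=""):
    """Pick the row to pre-select.

    Index len(ordered) corresponds to the "Disable reasoning" row that
    the menu appends after the effort levels.
    """
    if current_effort == "none":
        return len(ordered)
    if current_effort in ordered:
        return ordered.index(current_effort)
    if "medium" in ordered:
        return ordered.index("medium")
    return 0


ordered = dedupe_efforts(["Low", "medium", " HIGH ", "low", ""])
print(ordered, default_effort_index(ordered, "high"))
```

`dict.fromkeys` is used here, as in the diff, because it removes duplicates while preserving insertion order.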
def _model_flow_copilot(config, current_model=""):
|
||||
"""GitHub Copilot flow using env vars, gh CLI, or OAuth device code."""
|
||||
from hermes_cli.auth import (
|
||||
PROVIDER_REGISTRY,
|
||||
_prompt_model_selection,
|
||||
_save_model_choice,
|
||||
deactivate_provider,
|
||||
resolve_api_key_provider_credentials,
|
||||
)
|
||||
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
|
||||
from hermes_cli.models import (
|
||||
fetch_api_models,
|
||||
fetch_github_model_catalog,
|
||||
github_model_reasoning_efforts,
|
||||
copilot_model_api_mode,
|
||||
normalize_copilot_model_id,
|
||||
)
|
||||
|
||||
provider_id = "copilot"
|
||||
pconfig = PROVIDER_REGISTRY[provider_id]
|
||||
|
||||
creds = resolve_api_key_provider_credentials(provider_id)
|
||||
api_key = creds.get("api_key", "")
|
||||
source = creds.get("source", "")
|
||||
|
||||
if not api_key:
|
||||
print("No GitHub token configured for GitHub Copilot.")
|
||||
print()
|
||||
print(" Supported token types:")
|
||||
print(" → OAuth token (gho_*) via `copilot login` or device code flow")
|
||||
print(" → Fine-grained PAT (github_pat_*) with Copilot Requests permission")
|
||||
print(" → GitHub App token (ghu_*) via environment variable")
|
||||
print(" ✗ Classic PAT (ghp_*) NOT supported by Copilot API")
|
||||
print()
|
||||
print(" Options:")
|
||||
print(" 1. Login with GitHub (OAuth device code flow)")
|
||||
print(" 2. Enter a token manually")
|
||||
print(" 3. Cancel")
|
||||
print()
|
||||
try:
|
||||
choice = input(" Choice [1-3]: ").strip()
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
print()
|
||||
return
|
||||
|
||||
if choice == "1":
|
||||
try:
|
||||
from hermes_cli.copilot_auth import copilot_device_code_login
|
||||
token = copilot_device_code_login()
|
||||
if token:
|
||||
save_env_value("COPILOT_GITHUB_TOKEN", token)
|
||||
print(" Copilot token saved.")
|
||||
print()
|
||||
else:
|
||||
print(" Login cancelled or failed.")
|
||||
return
|
||||
except Exception as exc:
|
||||
print(f" Login failed: {exc}")
|
||||
return
|
||||
elif choice == "2":
|
||||
try:
|
||||
new_key = input(" Token (COPILOT_GITHUB_TOKEN): ").strip()
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
print()
|
||||
return
|
||||
if not new_key:
|
||||
print(" Cancelled.")
|
||||
return
|
||||
# Validate token type
|
||||
try:
|
||||
from hermes_cli.copilot_auth import validate_copilot_token
|
||||
valid, msg = validate_copilot_token(new_key)
|
||||
if not valid:
|
||||
print(f" ✗ {msg}")
|
||||
return
|
||||
except ImportError:
|
||||
pass
|
||||
save_env_value("COPILOT_GITHUB_TOKEN", new_key)
|
||||
print(" Token saved.")
|
||||
print()
|
||||
else:
|
||||
print(" Cancelled.")
|
||||
return
|
||||
|
||||
creds = resolve_api_key_provider_credentials(provider_id)
|
||||
api_key = creds.get("api_key", "")
|
||||
source = creds.get("source", "")
|
||||
else:
|
||||
if source in ("GITHUB_TOKEN", "GH_TOKEN"):
|
||||
print(f" GitHub token: {api_key[:8]}... ✓ ({source})")
|
||||
elif source == "gh auth token":
|
||||
print(" GitHub token: ✓ (from `gh auth token`)")
|
||||
else:
|
||||
print(" GitHub token: ✓")
|
||||
print()
|
||||
|
||||
effective_base = pconfig.inference_base_url
|
||||
|
||||
catalog = fetch_github_model_catalog(api_key)
|
||||
live_models = [item.get("id", "") for item in catalog if item.get("id")] if catalog else fetch_api_models(api_key, effective_base)
|
||||
normalized_current_model = normalize_copilot_model_id(
|
||||
current_model,
|
||||
catalog=catalog,
|
||||
api_key=api_key,
|
||||
) or current_model
|
||||
if live_models:
|
||||
model_list = [model_id for model_id in live_models if model_id]
|
||||
print(f" Found {len(model_list)} model(s) from GitHub Copilot")
|
||||
else:
|
||||
model_list = _PROVIDER_MODELS.get(provider_id, [])
|
||||
if model_list:
|
||||
print(" ⚠ Could not auto-detect models from GitHub Copilot — showing defaults.")
|
||||
print(' Use "Enter custom model name" if you do not see your model.')
|
||||
|
||||
if model_list:
|
||||
selected = _prompt_model_selection(model_list, current_model=normalized_current_model)
|
||||
else:
|
||||
try:
|
||||
selected = input("Model name: ").strip()
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
selected = None
|
||||
|
||||
if selected:
|
||||
selected = normalize_copilot_model_id(
|
||||
selected,
|
||||
catalog=catalog,
|
||||
api_key=api_key,
|
||||
) or selected
|
||||
# Clear stale custom-endpoint overrides so the Copilot provider wins cleanly.
|
||||
if get_env_value("OPENAI_BASE_URL"):
|
||||
save_env_value("OPENAI_BASE_URL", "")
|
||||
save_env_value("OPENAI_API_KEY", "")
|
||||
|
||||
initial_cfg = load_config()
|
||||
current_effort = _current_reasoning_effort(initial_cfg)
|
||||
reasoning_efforts = github_model_reasoning_efforts(
|
||||
selected,
|
||||
catalog=catalog,
|
||||
api_key=api_key,
|
||||
)
|
||||
selected_effort = None
|
||||
if reasoning_efforts:
|
||||
print(f" {selected} supports reasoning controls.")
|
||||
selected_effort = _prompt_reasoning_effort_selection(
|
||||
reasoning_efforts, current_effort=current_effort
|
||||
)
|
||||
|
||||
_save_model_choice(selected)
|
||||
|
||||
cfg = load_config()
|
||||
model = cfg.get("model")
|
||||
if not isinstance(model, dict):
|
||||
model = {"default": model} if model else {}
|
||||
cfg["model"] = model
|
||||
model["provider"] = provider_id
|
||||
model["base_url"] = effective_base
|
||||
model["api_mode"] = copilot_model_api_mode(
|
||||
selected,
|
||||
catalog=catalog,
|
||||
api_key=api_key,
|
||||
)
|
||||
if selected_effort is not None:
|
||||
_set_reasoning_effort(cfg, selected_effort)
|
||||
save_config(cfg)
|
||||
deactivate_provider()
|
||||
|
||||
print(f"Default model set to: {selected} (via {pconfig.name})")
|
||||
if reasoning_efforts:
|
||||
if selected_effort == "none":
|
||||
print("Reasoning disabled for this model.")
|
||||
elif selected_effort:
|
||||
print(f"Reasoning effort set to: {selected_effort}")
|
||||
else:
|
||||
print("No change.")
|
||||
|
||||
|
||||
def _model_flow_copilot_acp(config, current_model=""):
|
||||
"""GitHub Copilot ACP flow using the local Copilot CLI."""
|
||||
from hermes_cli.auth import (
|
||||
PROVIDER_REGISTRY,
|
||||
_prompt_model_selection,
|
||||
_save_model_choice,
|
||||
deactivate_provider,
|
||||
get_external_process_provider_status,
|
||||
resolve_api_key_provider_credentials,
|
||||
resolve_external_process_provider_credentials,
|
||||
)
|
||||
from hermes_cli.models import (
|
||||
fetch_github_model_catalog,
|
||||
normalize_copilot_model_id,
|
||||
)
|
||||
from hermes_cli.config import load_config, save_config
|
||||
|
||||
del config
|
||||
|
||||
provider_id = "copilot-acp"
|
||||
pconfig = PROVIDER_REGISTRY[provider_id]
|
||||
|
||||
status = get_external_process_provider_status(provider_id)
|
||||
resolved_command = status.get("resolved_command") or status.get("command") or "copilot"
|
||||
effective_base = status.get("base_url") or pconfig.inference_base_url
|
||||
|
||||
print(" GitHub Copilot ACP delegates Hermes turns to `copilot --acp`.")
|
||||
print(" Hermes currently starts its own ACP subprocess for each request.")
|
||||
print(" Hermes uses your selected model as a hint for the Copilot ACP session.")
|
||||
print(f" Command: {resolved_command}")
|
||||
print(f" Backend marker: {effective_base}")
|
||||
print()
|
||||
|
||||
try:
|
||||
creds = resolve_external_process_provider_credentials(provider_id)
|
||||
except Exception as exc:
|
||||
print(f" ⚠ {exc}")
|
||||
print(" Set HERMES_COPILOT_ACP_COMMAND or COPILOT_CLI_PATH if Copilot CLI is installed elsewhere.")
|
||||
return
|
||||
|
||||
effective_base = creds.get("base_url") or effective_base
|
||||
|
||||
catalog_api_key = ""
|
||||
try:
|
||||
catalog_creds = resolve_api_key_provider_credentials("copilot")
|
||||
catalog_api_key = catalog_creds.get("api_key", "")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
catalog = fetch_github_model_catalog(catalog_api_key)
|
||||
normalized_current_model = normalize_copilot_model_id(
|
||||
current_model,
|
||||
catalog=catalog,
|
||||
api_key=catalog_api_key,
|
||||
) or current_model
|
||||
|
||||
if catalog:
|
||||
model_list = [item.get("id", "") for item in catalog if item.get("id")]
|
||||
print(f" Found {len(model_list)} model(s) from GitHub Copilot")
|
||||
else:
|
||||
model_list = _PROVIDER_MODELS.get("copilot", [])
|
||||
if model_list:
|
||||
print(" ⚠ Could not auto-detect models from GitHub Copilot — showing defaults.")
|
||||
print(' Use "Enter custom model name" if you do not see your model.')
|
||||
|
||||
if model_list:
|
||||
selected = _prompt_model_selection(
|
||||
model_list,
|
||||
current_model=normalized_current_model,
|
||||
)
|
||||
else:
|
||||
try:
|
||||
selected = input("Model name: ").strip()
|
||||
except (KeyboardInterrupt, EOFError):
|
||||
selected = None
|
||||
|
||||
if not selected:
|
||||
print("No change.")
|
||||
return
|
||||
|
||||
selected = normalize_copilot_model_id(
|
||||
selected,
|
||||
catalog=catalog,
|
||||
api_key=catalog_api_key,
|
||||
) or selected
|
||||
_save_model_choice(selected)
|
||||
|
||||
cfg = load_config()
|
||||
model = cfg.get("model")
|
||||
if not isinstance(model, dict):
|
||||
model = {"default": model} if model else {}
|
||||
cfg["model"] = model
|
||||
model["provider"] = provider_id
|
||||
model["base_url"] = effective_base
|
||||
model["api_mode"] = "chat_completions"
|
||||
save_config(cfg)
|
||||
deactivate_provider()
|
||||
|
||||
print(f"Default model set to: {selected} (via {pconfig.name})")
|
||||
|
||||
|
||||
def _model_flow_kimi(config, current_model=""):
|
||||
"""Kimi / Moonshot model selection with automatic endpoint routing.
|
||||
|
||||
@@ -2642,7 +3073,7 @@ For more help on a command:
     )
     chat_parser.add_argument(
         "--provider",
-        choices=["auto", "openrouter", "nous", "openai-codex", "anthropic", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
+        choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
         default=None,
         help="Inference provider (default: auto)"
     )
@@ -3313,20 +3744,20 @@ For more help on a command:
                 return
             has_titles = any(s.get("title") for s in sessions)
             if has_titles:
-                print(f"{'Title':<22} {'Preview':<40} {'Last Active':<13} {'ID'}")
-                print("─" * 100)
+                print(f"{'Title':<32} {'Preview':<40} {'Last Active':<13} {'ID'}")
+                print("─" * 110)
             else:
                 print(f"{'Preview':<50} {'Last Active':<13} {'Src':<6} {'ID'}")
-                print("─" * 90)
+                print("─" * 95)
             for s in sessions:
                 last_active = _relative_time(s.get("last_active"))
                 preview = s.get("preview", "")[:38] if has_titles else s.get("preview", "")[:48]
                 if has_titles:
-                    title = (s.get("title") or "—")[:20]
-                    sid = s["id"][:20]
-                    print(f"{title:<22} {preview:<40} {last_active:<13} {sid}")
+                    title = (s.get("title") or "—")[:30]
+                    sid = s["id"]
+                    print(f"{title:<32} {preview:<40} {last_active:<13} {sid}")
                 else:
-                    sid = s["id"][:20]
+                    sid = s["id"]
                     print(f"{preview:<50} {last_active:<13} {s['source']:<6} {sid}")
 
         elif action == "export":
+394
-3
@@ -14,6 +14,16 @@ import urllib.error
|
||||
from difflib import get_close_matches
|
||||
from typing import Any, Optional
|
||||
|
||||
COPILOT_BASE_URL = "https://api.githubcopilot.com"
|
||||
COPILOT_MODELS_URL = f"{COPILOT_BASE_URL}/models"
|
||||
COPILOT_EDITOR_VERSION = "vscode/1.104.1"
|
||||
COPILOT_REASONING_EFFORTS_GPT5 = ["minimal", "low", "medium", "high"]
|
||||
COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
|
||||
|
||||
# Backward-compatible aliases for the earlier GitHub Models-backed Copilot work.
|
||||
GITHUB_MODELS_BASE_URL = COPILOT_BASE_URL
|
||||
GITHUB_MODELS_CATALOG_URL = COPILOT_MODELS_URL
|
||||
|
||||
# (model_id, display description shown in menus)
|
||||
OPENROUTER_MODELS: list[tuple[str, str]] = [
|
||||
("anthropic/claude-opus-4.6", "recommended"),
|
||||
@@ -55,6 +65,25 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
|
||||
"gpt-5.1-codex-mini",
|
||||
"gpt-5.1-codex-max",
|
||||
],
|
||||
"copilot-acp": [
|
||||
"copilot-acp",
|
||||
],
|
||||
"copilot": [
|
||||
"gpt-5.4",
|
||||
"gpt-5.4-mini",
|
||||
"gpt-5-mini",
|
||||
"gpt-5.3-codex",
|
||||
"gpt-5.2-codex",
|
||||
"gpt-4.1",
|
||||
"gpt-4o",
|
||||
"gpt-4o-mini",
|
||||
"claude-opus-4.6",
|
||||
"claude-sonnet-4.6",
|
||||
"claude-sonnet-4.5",
|
||||
"claude-haiku-4.5",
|
||||
"gemini-2.5-pro",
|
||||
"grok-code-fast-1",
|
||||
],
|
||||
"zai": [
|
||||
"glm-5",
|
||||
"glm-4.7",
|
||||
@@ -173,7 +202,9 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
|
||||
_PROVIDER_LABELS = {
|
||||
"openrouter": "OpenRouter",
|
||||
"openai-codex": "OpenAI Codex",
|
||||
"copilot-acp": "GitHub Copilot ACP",
|
||||
"nous": "Nous Portal",
|
||||
"copilot": "GitHub Copilot",
|
||||
"zai": "Z.AI / GLM",
|
||||
"kimi-coding": "Kimi / Moonshot",
|
||||
"minimax": "MiniMax",
|
||||
@@ -193,6 +224,12 @@ _PROVIDER_ALIASES = {
|
||||
"z-ai": "zai",
|
||||
"z.ai": "zai",
|
||||
"zhipu": "zai",
|
||||
"github": "copilot",
|
||||
"github-copilot": "copilot",
|
||||
"github-models": "copilot",
|
||||
"github-model": "copilot",
|
||||
"github-copilot-acp": "copilot-acp",
|
||||
"copilot-acp-agent": "copilot-acp",
|
||||
"kimi": "kimi-coding",
|
||||
"moonshot": "kimi-coding",
|
||||
"minimax-china": "minimax-cn",
|
||||
@@ -246,7 +283,7 @@ def list_available_providers() -> list[dict[str, str]]:
     """
     # Canonical providers in display order
     _PROVIDER_ORDER = [
-        "openrouter", "nous", "openai-codex",
+        "openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
         "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
         "opencode-zen", "opencode-go",
         "ai-gateway", "deepseek", "custom",
@@ -352,6 +389,7 @@ def detect_provider_for_model(
|
||||
Returns ``None`` when no confident match is found.
|
||||
|
||||
Priority:
|
||||
0. Bare provider name → switch to that provider's default model
|
||||
1. Direct provider with credentials (highest)
|
||||
2. Direct provider without credentials → remap to OpenRouter slug
|
||||
3. OpenRouter catalog match
|
||||
@@ -362,6 +400,21 @@ def detect_provider_for_model(
|
||||
|
||||
name_lower = name.lower()
|
||||
|
||||
# --- Step 0: bare provider name typed as model ---
|
||||
# If someone types `/model nous` or `/model anthropic`, treat it as a
|
||||
# provider switch and pick the first model from that provider's catalog.
|
||||
# Skip "custom" and "openrouter" — custom has no model catalog, and
|
||||
# openrouter requires an explicit model name to be useful.
|
||||
resolved_provider = _PROVIDER_ALIASES.get(name_lower, name_lower)
|
||||
if resolved_provider not in {"custom", "openrouter"}:
|
||||
default_models = _PROVIDER_MODELS.get(resolved_provider, [])
|
||||
if (
|
||||
resolved_provider in _PROVIDER_LABELS
|
||||
and default_models
|
||||
and resolved_provider != normalize_provider(current_provider)
|
||||
):
|
||||
return (resolved_provider, default_models[0])
|
||||
|
||||
# Aggregators list other providers' models — never auto-switch TO them
|
||||
_AGGREGATORS = {"nous", "openrouter"}
|
||||
|
||||
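Step 0 above treats a bare provider name typed as a model (for example `/model github`) as a provider switch. The following sketch illustrates that rule with tiny stand-in tables. The table contents and the helper name `resolve_bare_provider` are assumptions for illustration, and the sketch compares against `current_provider` directly where the diff first passes it through `normalize_provider`.

```python
from typing import Optional

# Illustrative stand-ins for _PROVIDER_ALIASES, _PROVIDER_LABELS and
# _PROVIDER_MODELS; the real tables are much larger.
PROVIDER_ALIASES = {"github": "copilot", "github-copilot": "copilot"}
PROVIDER_LABELS = {
    "copilot": "GitHub Copilot",
    "zai": "Z.AI / GLM",
    "openrouter": "OpenRouter",
}
PROVIDER_MODELS = {"copilot": ["gpt-5.4", "gpt-4.1"], "zai": ["glm-5"]}


def resolve_bare_provider(
    name: str, current_provider: str
) -> Optional[tuple[str, str]]:
    """Return (provider, default_model) when name is a provider, else None."""
    resolved = PROVIDER_ALIASES.get(name.lower(), name.lower())
    # "custom" has no catalog and "openrouter" needs an explicit model name.
    if resolved in {"custom", "openrouter"}:
        return None
    models = PROVIDER_MODELS.get(resolved, [])
    if resolved in PROVIDER_LABELS and models and resolved != current_provider:
        return (resolved, models[0])
    return None


print(resolve_bare_provider("GitHub", "openrouter"))
```

Typing the name of the provider that is already active returns `None`, so the name then falls through to the ordinary model-matching steps.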
@@ -467,6 +520,17 @@ def provider_label(provider: Optional[str]) -> str:
|
||||
return _PROVIDER_LABELS.get(normalized, original or "OpenRouter")
|
||||
|
||||
|
||||
def _resolve_copilot_catalog_api_key() -> str:
|
||||
"""Best-effort GitHub token for fetching the Copilot model catalog."""
|
||||
try:
|
||||
from hermes_cli.auth import resolve_api_key_provider_credentials
|
||||
|
||||
creds = resolve_api_key_provider_credentials("copilot")
|
||||
return str(creds.get("api_key") or "").strip()
|
||||
except Exception:
|
||||
return ""
|
||||
|
||||
|
||||
def provider_model_ids(provider: Optional[str]) -> list[str]:
|
||||
"""Return the best known model catalog for a provider.
|
||||
|
||||
@@ -480,6 +544,15 @@ def provider_model_ids(provider: Optional[str]) -> list[str]:
|
||||
from hermes_cli.codex_models import get_codex_model_ids
|
||||
|
||||
return get_codex_model_ids()
|
||||
if normalized in {"copilot", "copilot-acp"}:
|
||||
try:
|
||||
live = _fetch_github_models(_resolve_copilot_catalog_api_key())
|
||||
if live:
|
||||
return live
|
||||
except Exception:
|
||||
pass
|
||||
if normalized == "copilot-acp":
|
||||
return list(_PROVIDER_MODELS.get("copilot", []))
|
||||
if normalized == "nous":
|
||||
# Try live Nous Portal /models endpoint
|
||||
try:
|
||||
@@ -558,6 +631,306 @@ def _fetch_anthropic_models(timeout: float = 5.0) -> Optional[list[str]]:
|
||||
return None
|
||||
|
||||
|
||||
+def _payload_items(payload: Any) -> list[dict[str, Any]]:
+    if isinstance(payload, list):
+        return [item for item in payload if isinstance(item, dict)]
+    if isinstance(payload, dict):
+        data = payload.get("data", [])
+        if isinstance(data, list):
+            return [item for item in data if isinstance(item, dict)]
+    return []
+
+
+def _extract_model_ids(payload: Any) -> list[str]:
+    return [item.get("id", "") for item in _payload_items(payload) if item.get("id")]
+
+
def copilot_default_headers() -> dict[str, str]:
|
||||
"""Standard headers for Copilot API requests.
|
||||
|
||||
Includes Openai-Intent and x-initiator headers that opencode and the
|
||||
Copilot CLI send on every request.
|
||||
"""
|
||||
try:
|
||||
from hermes_cli.copilot_auth import copilot_request_headers
|
||||
return copilot_request_headers(is_agent_turn=True)
|
||||
except ImportError:
|
||||
return {
|
||||
"Editor-Version": COPILOT_EDITOR_VERSION,
|
||||
"User-Agent": "HermesAgent/1.0",
|
||||
"Openai-Intent": "conversation-edits",
|
||||
"x-initiator": "agent",
|
||||
}
|
||||
|
||||
|
||||
def _copilot_catalog_item_is_text_model(item: dict[str, Any]) -> bool:
|
||||
model_id = str(item.get("id") or "").strip()
|
||||
if not model_id:
|
||||
return False
|
||||
|
||||
if item.get("model_picker_enabled") is False:
|
||||
return False
|
||||
|
||||
capabilities = item.get("capabilities")
|
||||
if isinstance(capabilities, dict):
|
||||
model_type = str(capabilities.get("type") or "").strip().lower()
|
||||
if model_type and model_type != "chat":
|
||||
return False
|
||||
|
||||
supported_endpoints = item.get("supported_endpoints")
|
||||
if isinstance(supported_endpoints, list):
|
||||
normalized_endpoints = {
|
||||
str(endpoint).strip()
|
||||
for endpoint in supported_endpoints
|
||||
if str(endpoint).strip()
|
||||
}
|
||||
if normalized_endpoints and not normalized_endpoints.intersection(
|
||||
{"/chat/completions", "/responses", "/v1/messages"}
|
||||
):
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
|
||||
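To make the filtering rules in `_copilot_catalog_item_is_text_model` concrete, here is a condensed restatement applied to a few made-up catalog rows. The sample rows and their IDs are hypothetical; real catalog payloads carry many more fields, and only the keys read by the function in the diff are used here.

```python
from typing import Any

CHAT_ENDPOINTS = {"/chat/completions", "/responses", "/v1/messages"}


def is_text_model(item: dict[str, Any]) -> bool:
    """Condensed restatement of the catalog filter from the diff."""
    if not str(item.get("id") or "").strip():
        return False
    # Only an explicit False hides a model; a missing key keeps it.
    if item.get("model_picker_enabled") is False:
        return False
    capabilities = item.get("capabilities")
    if isinstance(capabilities, dict):
        model_type = str(capabilities.get("type") or "").strip().lower()
        if model_type and model_type != "chat":
            return False
    endpoints = item.get("supported_endpoints")
    if isinstance(endpoints, list):
        normalized = {str(e).strip() for e in endpoints if str(e).strip()}
        if normalized and not normalized & CHAT_ENDPOINTS:
            return False
    return True


# Hypothetical catalog rows for illustration.
catalog = [
    {"id": "gpt-4.1", "capabilities": {"type": "chat"}},
    {"id": "embed-small", "capabilities": {"type": "embeddings"}},
    {"id": "hidden-model", "model_picker_enabled": False},
    {"id": "legacy", "supported_endpoints": ["/completions"]},
    {"id": "claude-sonnet-4.5", "supported_endpoints": ["/v1/messages"]},
]
kept = [item["id"] for item in catalog if is_text_model(item)]
print(kept)
```

The filter is permissive by design: a row with no `capabilities` or `supported_endpoints` information is kept, and only rows that positively identify themselves as non-chat are dropped.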
def fetch_github_model_catalog(
|
||||
api_key: Optional[str] = None, timeout: float = 5.0
|
||||
) -> Optional[list[dict[str, Any]]]:
|
||||
"""Fetch the live GitHub Copilot model catalog for this account."""
|
||||
attempts: list[dict[str, str]] = []
|
||||
if api_key:
|
||||
attempts.append({
|
||||
**copilot_default_headers(),
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
})
|
||||
attempts.append(copilot_default_headers())
|
||||
|
||||
for headers in attempts:
|
||||
req = urllib.request.Request(COPILOT_MODELS_URL, headers=headers)
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
||||
data = json.loads(resp.read().decode())
|
||||
items = _payload_items(data)
|
||||
models: list[dict[str, Any]] = []
|
||||
seen_ids: set[str] = set()
|
||||
for item in items:
|
||||
if not _copilot_catalog_item_is_text_model(item):
|
||||
continue
|
||||
model_id = str(item.get("id") or "").strip()
|
||||
if not model_id or model_id in seen_ids:
|
||||
continue
|
||||
seen_ids.add(model_id)
|
||||
models.append(item)
|
||||
if models:
|
||||
return models
|
||||
except Exception:
|
||||
continue
|
||||
return None
|
||||
|
||||
|
||||
+def _is_github_models_base_url(base_url: Optional[str]) -> bool:
+    normalized = (base_url or "").strip().rstrip("/").lower()
+    return (
+        normalized.startswith(COPILOT_BASE_URL)
+        or normalized.startswith("https://models.github.ai/inference")
+    )
+
+
+def _fetch_github_models(api_key: Optional[str] = None, timeout: float = 5.0) -> Optional[list[str]]:
+    catalog = fetch_github_model_catalog(api_key=api_key, timeout=timeout)
+    if not catalog:
+        return None
+    return [item.get("id", "") for item in catalog if item.get("id")]
+
+
_COPILOT_MODEL_ALIASES = {
|
||||
"openai/gpt-5": "gpt-5-mini",
|
||||
"openai/gpt-5-chat": "gpt-5-mini",
|
||||
"openai/gpt-5-mini": "gpt-5-mini",
|
||||
"openai/gpt-5-nano": "gpt-5-mini",
|
||||
"openai/gpt-4.1": "gpt-4.1",
|
||||
"openai/gpt-4.1-mini": "gpt-4.1",
|
||||
"openai/gpt-4.1-nano": "gpt-4.1",
|
||||
"openai/gpt-4o": "gpt-4o",
|
||||
"openai/gpt-4o-mini": "gpt-4o-mini",
|
||||
"openai/o1": "gpt-5.2",
|
||||
"openai/o1-mini": "gpt-5-mini",
|
||||
"openai/o1-preview": "gpt-5.2",
|
||||
"openai/o3": "gpt-5.3-codex",
|
||||
"openai/o3-mini": "gpt-5-mini",
|
||||
"openai/o4-mini": "gpt-5-mini",
|
||||
"anthropic/claude-opus-4.6": "claude-opus-4.6",
|
||||
"anthropic/claude-sonnet-4.6": "claude-sonnet-4.6",
|
||||
"anthropic/claude-sonnet-4.5": "claude-sonnet-4.5",
|
||||
"anthropic/claude-haiku-4.5": "claude-haiku-4.5",
|
||||
}
|
||||
|
||||
|
||||
def _copilot_catalog_ids(
|
||||
catalog: Optional[list[dict[str, Any]]] = None,
|
||||
api_key: Optional[str] = None,
|
||||
) -> set[str]:
|
||||
if catalog is None and api_key:
|
||||
catalog = fetch_github_model_catalog(api_key=api_key)
|
||||
if not catalog:
|
||||
return set()
|
||||
return {
|
||||
str(item.get("id") or "").strip()
|
||||
for item in catalog
|
||||
if str(item.get("id") or "").strip()
|
||||
}
|
||||
|
||||
|
||||
+def normalize_copilot_model_id(
+    model_id: Optional[str],
+    *,
+    catalog: Optional[list[dict[str, Any]]] = None,
+    api_key: Optional[str] = None,
+) -> str:
+    raw = str(model_id or "").strip()
+    if not raw:
+        return ""
+
+    # Check the static alias table first so a direct hit never triggers
+    # a catalog fetch.
+    alias = _COPILOT_MODEL_ALIASES.get(raw)
+    if alias:
+        return alias
+
+    catalog_ids = _copilot_catalog_ids(catalog=catalog, api_key=api_key)
+
+    candidates = [raw]
+    if "/" in raw:
+        candidates.append(raw.split("/", 1)[1].strip())
+
+    if raw.endswith("-mini"):
+        candidates.append(raw[:-5])
+    if raw.endswith("-nano"):
+        candidates.append(raw[:-5])
+    if raw.endswith("-chat"):
+        candidates.append(raw[:-5])
+
+    seen: set[str] = set()
+    for candidate in candidates:
+        if not candidate or candidate in seen:
+            continue
+        seen.add(candidate)
+        if candidate in _COPILOT_MODEL_ALIASES:
+            return _COPILOT_MODEL_ALIASES[candidate]
+        if candidate in catalog_ids:
+            return candidate
+
+    if "/" in raw:
+        return raw.split("/", 1)[1].strip()
+    return raw
+
+
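The lookup order in `normalize_copilot_model_id` (alias table, then vendor-stripped and suffix-stripped candidates, then a plain prefix strip) is easiest to see on examples. The sketch below is a compact restatement that takes a ready-made set of catalog IDs instead of fetching one. The function name `normalize`, the three-entry alias excerpt, and the ID `acme/some-model` are assumptions for illustration.

```python
# Small excerpt of _COPILOT_MODEL_ALIASES from the diff.
ALIASES = {
    "openai/gpt-5": "gpt-5-mini",
    "openai/gpt-4.1-mini": "gpt-4.1",
    "openai/o3": "gpt-5.3-codex",
}


def normalize(model_id, catalog_ids=frozenset()):
    """Map a user-supplied model ID onto a Copilot model ID."""
    raw = str(model_id or "").strip()
    if not raw:
        return ""
    if raw in ALIASES:
        return ALIASES[raw]

    # Build fallbacks: the ID without its vendor prefix, and the ID
    # without a trailing -mini/-nano/-chat suffix.
    candidates = [raw]
    if "/" in raw:
        candidates.append(raw.split("/", 1)[1].strip())
    for suffix in ("-mini", "-nano", "-chat"):
        if raw.endswith(suffix):
            candidates.append(raw[: -len(suffix)])

    for candidate in candidates:
        if candidate in ALIASES:
            return ALIASES[candidate]
        if candidate in catalog_ids:
            return candidate

    # Nothing matched: strip the vendor prefix if there is one.
    return raw.split("/", 1)[1].strip() if "/" in raw else raw


print(normalize("openai/gpt-5"))
print(normalize("gpt-5-nano", {"gpt-5"}))
print(normalize("acme/some-model"))
```

One consequence of the suffix stripping is that an ID such as `gpt-5-nano` resolves to `gpt-5` only when the catalog actually lists `gpt-5`; otherwise the ID is returned unchanged.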
+def _github_reasoning_efforts_for_model_id(model_id: str) -> list[str]:
+    raw = (model_id or "").strip().lower()
+    if raw.startswith(("openai/o1", "openai/o3", "openai/o4", "o1", "o3", "o4")):
+        return list(COPILOT_REASONING_EFFORTS_O_SERIES)
+    normalized = normalize_copilot_model_id(model_id).lower()
+    if normalized.startswith("gpt-5"):
+        return list(COPILOT_REASONING_EFFORTS_GPT5)
+    return []
+
+
def _should_use_copilot_responses_api(model_id: str) -> bool:
    """Decide whether a Copilot model should use the Responses API.

    Replicates opencode's ``shouldUseCopilotResponsesApi`` logic:
    GPT-5+ models use Responses API, except ``gpt-5-mini`` which uses
    Chat Completions. All non-GPT models (Claude, Gemini, etc.) use
    Chat Completions.
    """
    import re

    match = re.match(r"^gpt-(\d+)", model_id)
    if not match:
        return False
    major = int(match.group(1))
    return major >= 5 and not model_id.startswith("gpt-5-mini")
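
To see which IDs the pattern check routes to the Responses API, here is a small standalone copy of the same rule. The function name `uses_responses_api` is made up for this sketch; the regular expression and the `gpt-5-mini` exclusion are taken from the function above.

```python
import re


def uses_responses_api(model_id: str) -> bool:
    """Return True for gpt-<N> IDs with N >= 5, except IDs starting with 'gpt-5-mini'."""
    match = re.match(r"^gpt-(\d+)", model_id)
    if not match:
        return False
    major = int(match.group(1))
    return major >= 5 and not model_id.startswith("gpt-5-mini")


for sample in ("gpt-5.4", "gpt-5-mini", "gpt-5.4-mini", "gpt-4o", "claude-sonnet-4.5"):
    print(sample, uses_responses_api(sample))
```

Note that the exclusion is a plain prefix test, so an ID such as `gpt-5.4-mini` does not start with `gpt-5-mini` and is still routed to the Responses API.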
|
||||
|
||||
|
||||
def copilot_model_api_mode(
    model_id: Optional[str],
    *,
    catalog: Optional[list[dict[str, Any]]] = None,
    api_key: Optional[str] = None,
) -> str:
    """Determine the API mode for a Copilot model.

    Uses the model ID pattern (matching opencode's approach) as the
    primary signal. Falls back to the catalog's ``supported_endpoints``
    only for models not covered by the pattern check.
    """
    normalized = normalize_copilot_model_id(model_id, catalog=catalog, api_key=api_key)
    if not normalized:
        return "chat_completions"

    # Primary: model ID pattern (matches opencode's shouldUseCopilotResponsesApi)
    if _should_use_copilot_responses_api(normalized):
        return "codex_responses"

    # Secondary: check catalog for non-GPT-5 models (Claude via /v1/messages, etc.)
    if catalog is None and api_key:
        catalog = fetch_github_model_catalog(api_key=api_key)

    if catalog:
        catalog_entry = next((item for item in catalog if item.get("id") == normalized), None)
        if isinstance(catalog_entry, dict):
            supported_endpoints = {
                str(endpoint).strip()
                for endpoint in (catalog_entry.get("supported_endpoints") or [])
                if str(endpoint).strip()
            }
            # For non-GPT-5 models, check if they only support messages API
            if "/v1/messages" in supported_endpoints and "/chat/completions" not in supported_endpoints:
                return "anthropic_messages"

    return "chat_completions"
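
The catalog fallback in `copilot_model_api_mode` only switches to `anthropic_messages` when an entry lists `/v1/messages` and does not list `/chat/completions`. A minimal sketch of that branch follows; the catalog entries and model IDs are invented for illustration and do not reflect the real Copilot catalog.

```python
from typing import Any


def api_mode_from_catalog(model_id: str, catalog: list[dict[str, Any]]) -> str:
    """Pick the API mode from a catalog entry's supported_endpoints (secondary check only)."""
    entry = next((item for item in catalog if item.get("id") == model_id), None)
    if isinstance(entry, dict):
        endpoints = {
            str(endpoint).strip()
            for endpoint in (entry.get("supported_endpoints") or [])
            if str(endpoint).strip()
        }
        # Messages-only models go through the Anthropic Messages API.
        if "/v1/messages" in endpoints and "/chat/completions" not in endpoints:
            return "anthropic_messages"
    return "chat_completions"


# Hypothetical catalog entries.
sample_catalog = [
    {"id": "messages-only-model", "supported_endpoints": ["/v1/messages"]},
    {"id": "dual-model", "supported_endpoints": ["/chat/completions", "/v1/messages"]},
    {"id": "no-endpoint-info"},
]
```

A model that advertises both endpoints, or none at all, stays on `chat_completions`.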
|
||||
|
||||
|
||||
def github_model_reasoning_efforts(
    model_id: Optional[str],
    *,
    catalog: Optional[list[dict[str, Any]]] = None,
    api_key: Optional[str] = None,
) -> list[str]:
    """Return supported reasoning-effort levels for a Copilot-visible model."""
    normalized = normalize_copilot_model_id(model_id, catalog=catalog, api_key=api_key)
    if not normalized:
        return []

    catalog_entry = None
    if catalog is not None:
        catalog_entry = next((item for item in catalog if item.get("id") == normalized), None)
    elif api_key:
        fetched_catalog = fetch_github_model_catalog(api_key=api_key)
        if fetched_catalog:
            catalog_entry = next((item for item in fetched_catalog if item.get("id") == normalized), None)

    if catalog_entry is not None:
        capabilities = catalog_entry.get("capabilities")
        if isinstance(capabilities, dict):
            supports = capabilities.get("supports")
            if isinstance(supports, dict):
                efforts = supports.get("reasoning_effort")
                if isinstance(efforts, list):
                    normalized_efforts = [
                        str(effort).strip().lower()
                        for effort in efforts
                        if str(effort).strip()
                    ]
                    return list(dict.fromkeys(normalized_efforts))
            return []
        legacy_capabilities = {
            str(capability).strip().lower()
            for capability in catalog_entry.get("capabilities", [])
            if str(capability).strip()
        }
        if "reasoning" not in legacy_capabilities:
            return []

    return _github_reasoning_efforts_for_model_id(str(model_id or normalized))
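
The dict-shaped branch above reads `capabilities.supports.reasoning_effort`, lowercases the values, and removes duplicates while keeping order via `dict.fromkeys`. The sketch below isolates just that step; the helper name `efforts_from_entry` and the sample entry are assumptions made for the example.

```python
from typing import Any


def efforts_from_entry(entry: dict[str, Any]) -> list[str]:
    """Extract reasoning-effort levels from a dict-shaped 'capabilities' block."""
    capabilities = entry.get("capabilities")
    if not isinstance(capabilities, dict):
        return []
    supports = capabilities.get("supports")
    if not isinstance(supports, dict):
        return []
    efforts = supports.get("reasoning_effort")
    if not isinstance(efforts, list):
        return []
    cleaned = [str(effort).strip().lower() for effort in efforts if str(effort).strip()]
    # dict.fromkeys keeps the first occurrence of each value, in order.
    return list(dict.fromkeys(cleaned))


# Hypothetical catalog entry with mixed case, padding, a duplicate and an empty value.
sample_entry = {"capabilities": {"supports": {"reasoning_effort": ["Low", "low ", "HIGH", ""]}}}
print(efforts_from_entry(sample_entry))
```

Unlike the full function, this sketch does not cover the legacy list-shaped `capabilities` path or the model-ID fallback.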
|
||||
|
||||
|
||||
def probe_api_models(
|
||||
api_key: Optional[str],
|
||||
base_url: Optional[str],
|
||||
@@ -574,6 +947,16 @@ def probe_api_models(
|
||||
"used_fallback": False,
|
||||
}
|
||||
|
||||
if _is_github_models_base_url(normalized):
|
||||
models = _fetch_github_models(api_key=api_key, timeout=timeout)
|
||||
return {
|
||||
"models": models,
|
||||
"probed_url": COPILOT_MODELS_URL,
|
||||
"resolved_base_url": COPILOT_BASE_URL,
|
||||
"suggested_base_url": None,
|
||||
"used_fallback": False,
|
||||
}
|
||||
|
||||
if normalized.endswith("/v1"):
|
||||
alternate_base = normalized[:-3].rstrip("/")
|
||||
else:
|
||||
@@ -587,6 +970,8 @@ def probe_api_models(
|
||||
headers: dict[str, str] = {}
|
||||
if api_key:
|
||||
headers["Authorization"] = f"Bearer {api_key}"
|
||||
if normalized.startswith(COPILOT_BASE_URL):
|
||||
headers.update(copilot_default_headers())
|
||||
|
||||
for candidate_base, is_fallback in candidates:
|
||||
url = candidate_base.rstrip("/") + "/models"
|
||||
@@ -677,6 +1062,12 @@ def validate_requested_model(
|
||||
normalized = normalize_provider(provider)
|
||||
if normalized == "openrouter" and base_url and "openrouter.ai" not in base_url:
|
||||
normalized = "custom"
|
||||
requested_for_lookup = requested
|
||||
if normalized == "copilot":
|
||||
requested_for_lookup = normalize_copilot_model_id(
|
||||
requested,
|
||||
api_key=api_key,
|
||||
) or requested
|
||||
|
||||
if not requested:
|
||||
return {
|
||||
@@ -698,7 +1089,7 @@ def validate_requested_model(
|
||||
probe = probe_api_models(api_key, base_url)
|
||||
api_models = probe.get("models")
|
||||
if api_models is not None:
|
||||
- if requested in set(api_models):
+ if requested_for_lookup in set(api_models):
|
||||
return {
|
||||
"accepted": True,
|
||||
"persist": True,
|
||||
@@ -747,7 +1138,7 @@ def validate_requested_model(
|
||||
api_models = fetch_api_models(api_key, base_url)
|
||||
|
||||
if api_models is not None:
|
||||
- if requested in set(api_models):
+ if requested_for_lookup in set(api_models):
|
||||
# API confirmed the model exists
|
||||
return {
|
||||
"accepted": True,
|
||||
|
||||
+123
-16
@@ -14,6 +14,7 @@ from hermes_cli.auth import (
|
||||
resolve_nous_runtime_credentials,
|
||||
resolve_codex_runtime_credentials,
|
||||
resolve_api_key_provider_credentials,
|
||||
resolve_external_process_provider_credentials,
|
||||
)
|
||||
from hermes_cli.config import load_config
|
||||
from hermes_constants import OPENROUTER_BASE_URL
|
||||
@@ -23,17 +24,76 @@ def _normalize_custom_provider_name(value: str) -> str:
|
||||
return value.strip().lower().replace(" ", "-")
|
||||
|
||||
|
||||
def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
    """Auto-detect api_mode from the resolved base URL.

    Direct api.openai.com endpoints need the Responses API for GPT-5.x
    tool calls with reasoning (chat/completions returns 400).
    """
    normalized = (base_url or "").strip().lower().rstrip("/")
    if "api.openai.com" in normalized and "openrouter" not in normalized:
        return "codex_responses"
    return None
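
Elsewhere in this diff the detected mode is used as the middle link of an `or` chain: an explicitly configured `api_mode` wins, URL detection comes next, and `chat_completions` is the default. The sketch below shows that precedence. The validation inside `resolve_api_mode` is an assumption standing in for `_parse_api_mode`, whose body is not part of this diff.

```python
from typing import Optional

VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}


def detect_api_mode_for_url(base_url: str) -> Optional[str]:
    """Return 'codex_responses' for direct api.openai.com URLs, else None."""
    normalized = (base_url or "").strip().lower().rstrip("/")
    if "api.openai.com" in normalized and "openrouter" not in normalized:
        return "codex_responses"
    return None


def resolve_api_mode(configured: Optional[str], base_url: str) -> str:
    """Configured mode first, then URL detection, then the chat_completions default."""
    # Assumption: unknown values are ignored, roughly what a parser like _parse_api_mode might do.
    explicit = configured if configured in VALID_API_MODES else None
    return explicit or detect_api_mode_for_url(base_url) or "chat_completions"
```

Because the chain relies on truthiness, a `None` from either of the first two steps simply falls through to the next one.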
|
||||
|
||||
|
||||
def _auto_detect_local_model(base_url: str) -> str:
    """Query a local server for its model name when only one model is loaded."""
    if not base_url:
        return ""
    try:
        import requests
        url = base_url.rstrip("/")
        if not url.endswith("/v1"):
            url += "/v1"
        resp = requests.get(url + "/models", timeout=5)
        if resp.ok:
            models = resp.json().get("data", [])
            if len(models) == 1:
                model_id = models[0].get("id", "")
                if model_id:
                    return model_id
    except Exception:
        pass
    return ""
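
The two decisions inside `_auto_detect_local_model`, building the `/v1/models` URL and accepting the result only when exactly one model is listed, can be checked without a running server. The helper names and the payloads below are made up for this sketch; the payload shape (`{"data": [{"id": ...}]}`) is the one the function above reads.

```python
from typing import Any


def models_url(base_url: str) -> str:
    """Append /v1 when it is missing, then /models."""
    url = base_url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url + "/models"


def single_model_id(payload: dict[str, Any]) -> str:
    """Return the model ID only when the server reports exactly one model."""
    models = payload.get("data", [])
    if len(models) == 1:
        return models[0].get("id", "") or ""
    return ""


print(models_url("http://localhost:1234"))
print(single_model_id({"data": [{"id": "local-model"}]}))
```

When several models are loaded the helper returns an empty string, so the configured default is left unchanged.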
|
||||
|
||||
|
||||
def _get_model_config() -> Dict[str, Any]:
|
||||
config = load_config()
|
||||
model_cfg = config.get("model")
|
||||
if isinstance(model_cfg, dict):
|
||||
- return dict(model_cfg)
+ cfg = dict(model_cfg)
+ default = cfg.get("default", "").strip()
+ base_url = cfg.get("base_url", "").strip()
+ is_local = "localhost" in base_url or "127.0.0.1" in base_url
+ is_fallback = not default or default == "anthropic/claude-opus-4.6"
+ if is_local and is_fallback and base_url:
+     detected = _auto_detect_local_model(base_url)
+     if detected:
+         cfg["default"] = detected
+ return cfg
|
||||
if isinstance(model_cfg, str) and model_cfg.strip():
|
||||
return {"default": model_cfg.strip()}
|
||||
return {}
|
||||
|
||||
|
||||
- _VALID_API_MODES = {"chat_completions", "codex_responses"}
|
||||
def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
    configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
    if configured_mode:
        return configured_mode

    model_name = str(model_cfg.get("default") or "").strip()
    if not model_name:
        return "chat_completions"

    try:
        from hermes_cli.models import copilot_model_api_mode

        return copilot_model_api_mode(model_name, api_key=api_key)
    except Exception:
        return "chat_completions"
|
||||
|
||||
|
||||
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}
|
||||
|
||||
|
||||
def _parse_api_mode(raw: Any) -> Optional[str]:
|
||||
@@ -137,7 +197,9 @@ def _resolve_named_custom_runtime(
|
||||
|
||||
  return {
      "provider": "openrouter",
-     "api_mode": custom_provider.get("api_mode", "chat_completions"),
+     "api_mode": custom_provider.get("api_mode")
+     or _detect_api_mode_for_url(base_url)
+     or "chat_completions",
|
||||
"base_url": base_url,
|
||||
"api_key": api_key,
|
||||
"source": f"custom_provider:{custom_provider.get('name', requested_provider)}",
|
||||
@@ -153,6 +215,12 @@ def _resolve_openrouter_runtime(
|
||||
model_cfg = _get_model_config()
|
||||
cfg_base_url = model_cfg.get("base_url") if isinstance(model_cfg.get("base_url"), str) else ""
|
||||
cfg_provider = model_cfg.get("provider") if isinstance(model_cfg.get("provider"), str) else ""
|
||||
cfg_api_key = ""
|
||||
for k in ("api_key", "api"):
|
||||
v = model_cfg.get(k)
|
||||
if isinstance(v, str) and v.strip():
|
||||
cfg_api_key = v.strip()
|
||||
break
|
||||
requested_norm = (requested_provider or "").strip().lower()
|
||||
cfg_provider = cfg_provider.strip().lower()
|
||||
|
||||
@@ -160,26 +228,24 @@ def _resolve_openrouter_runtime(
|
||||
env_openrouter_base_url = os.getenv("OPENROUTER_BASE_URL", "").strip()
|
||||
|
||||
use_config_base_url = False
|
||||
if cfg_base_url.strip() and not explicit_base_url and not env_openai_base_url:
|
||||
if cfg_base_url.strip() and not explicit_base_url:
|
||||
if requested_norm == "auto":
|
||||
if not cfg_provider or cfg_provider == "auto":
|
||||
use_config_base_url = True
|
||||
elif requested_norm == "custom":
|
||||
# Persisted custom endpoints store their base URL in config.yaml.
|
||||
# If OPENAI_BASE_URL is not currently set in the environment, keep
|
||||
# honoring that saved endpoint instead of falling back to OpenRouter.
|
||||
if cfg_provider == "custom":
|
||||
if (not cfg_provider or cfg_provider == "auto") and not env_openai_base_url:
|
||||
use_config_base_url = True
|
||||
elif requested_norm == "custom" and cfg_provider == "custom":
|
||||
# provider: custom — use base_url from config (Fixes #1760).
|
||||
use_config_base_url = True
|
||||
|
||||
# When the user explicitly requested the openrouter provider, skip
|
||||
# OPENAI_BASE_URL — it typically points to a custom / non-OpenRouter
|
||||
# endpoint and would prevent switching back to OpenRouter (#874).
|
||||
skip_openai_base = requested_norm == "openrouter"
|
||||
|
||||
# For custom, prefer config base_url over env so config.yaml is honored (#1760).
|
||||
base_url = (
|
||||
(explicit_base_url or "").strip()
|
||||
or ("" if skip_openai_base else env_openai_base_url)
|
||||
or (cfg_base_url.strip() if use_config_base_url else "")
|
||||
or ("" if skip_openai_base else env_openai_base_url)
|
||||
or env_openrouter_base_url
|
||||
or OPENROUTER_BASE_URL
|
||||
).rstrip("/")
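
Read as the post-change state, the base URL chain above resolves in this order: explicit argument, config value (only when `use_config_base_url` is set), `OPENAI_BASE_URL` (skipped when the user asked for OpenRouter), `OPENROUTER_BASE_URL` from the environment, and finally the built-in default. The sketch below reproduces that order as a pure function. The parameter names are illustrative, and the default URL is an assumed value for `OPENROUTER_BASE_URL`, which is not shown in this diff.

```python
DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"  # assumed value of OPENROUTER_BASE_URL


def resolve_base_url(
    *,
    explicit: str = "",
    config: str = "",
    use_config: bool = False,
    env_openai: str = "",
    env_openrouter: str = "",
    skip_openai_env: bool = False,
) -> str:
    """First non-empty candidate wins; trailing slashes are stripped at the end."""
    return (
        explicit.strip()
        or (config.strip() if use_config else "")
        or ("" if skip_openai_env else env_openai)
        or env_openrouter
        or DEFAULT_BASE_URL
    ).rstrip("/")


print(resolve_base_url(config="http://localhost:8000/v1/", use_config=True,
                       env_openai="https://other.example/v1"))
```

Placing the config value ahead of the environment variable is what lets a saved custom endpoint survive a stale `OPENAI_BASE_URL`.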
|
||||
@@ -198,8 +264,10 @@ def _resolve_openrouter_runtime(
|
||||
or ""
|
||||
)
|
||||
else:
|
||||
# Custom endpoint: use api_key from config when using config base_url (#1760).
|
||||
api_key = (
|
||||
explicit_api_key
|
||||
or (cfg_api_key if use_config_base_url else "")
|
||||
or os.getenv("OPENAI_API_KEY")
|
||||
or os.getenv("OPENROUTER_API_KEY")
|
||||
or ""
|
||||
@@ -209,7 +277,9 @@ def _resolve_openrouter_runtime(
|
||||
|
||||
  return {
      "provider": "openrouter",
-     "api_mode": _parse_api_mode(model_cfg.get("api_mode")) or "chat_completions",
+     "api_mode": _parse_api_mode(model_cfg.get("api_mode"))
+     or _detect_api_mode_for_url(base_url)
+     or "chat_completions",
|
||||
"base_url": base_url,
|
||||
"api_key": api_key,
|
||||
"source": source,
|
||||
@@ -267,6 +337,19 @@ def resolve_runtime_provider(
|
||||
"requested_provider": requested_provider,
|
||||
}
|
||||
|
||||
if provider == "copilot-acp":
|
||||
creds = resolve_external_process_provider_credentials(provider)
|
||||
return {
|
||||
"provider": "copilot-acp",
|
||||
"api_mode": "chat_completions",
|
||||
"base_url": creds.get("base_url", "").rstrip("/"),
|
||||
"api_key": creds.get("api_key", ""),
|
||||
"command": creds.get("command", ""),
|
||||
"args": list(creds.get("args") or []),
|
||||
"source": creds.get("source", "process"),
|
||||
"requested_provider": requested_provider,
|
||||
}
|
||||
|
||||
# Anthropic (native Messages API)
|
||||
if provider == "anthropic":
|
||||
from agent.anthropic_adapter import resolve_anthropic_token
|
||||
@@ -276,10 +359,14 @@ def resolve_runtime_provider(
|
||||
"No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
|
||||
"run 'claude setup-token', or authenticate with 'claude /login'."
|
||||
)
|
||||
# Allow base URL override from config.yaml model.base_url
|
||||
model_cfg = _get_model_config()
|
||||
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
|
||||
base_url = cfg_base_url or "https://api.anthropic.com"
|
||||
  return {
      "provider": "anthropic",
      "api_mode": "anthropic_messages",
-     "base_url": "https://api.anthropic.com",
+     "base_url": base_url,
|
||||
"api_key": token,
|
||||
"source": "env",
|
||||
"requested_provider": requested_provider,
|
||||
@@ -302,10 +389,30 @@ def resolve_runtime_provider(
|
||||
pconfig = PROVIDER_REGISTRY.get(provider)
|
||||
if pconfig and pconfig.auth_type == "api_key":
|
||||
creds = resolve_api_key_provider_credentials(provider)
|
||||
model_cfg = _get_model_config()
|
||||
base_url = creds.get("base_url", "").rstrip("/")
|
||||
api_mode = "chat_completions"
|
||||
if provider == "copilot":
|
||||
api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", ""))
|
||||
else:
|
||||
# Check explicit api_mode from model config first
|
||||
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
|
||||
if configured_mode:
|
||||
api_mode = configured_mode
|
||||
# Auto-detect Anthropic-compatible endpoints by URL convention
|
||||
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
|
||||
elif base_url.rstrip("/").endswith("/anthropic"):
|
||||
api_mode = "anthropic_messages"
|
||||
# MiniMax providers always use Anthropic Messages API.
|
||||
# Auto-correct stale /v1 URLs (from old .env or config) to /anthropic.
|
||||
elif provider in ("minimax", "minimax-cn"):
|
||||
api_mode = "anthropic_messages"
|
||||
if base_url.rstrip("/").endswith("/v1"):
|
||||
base_url = base_url.rstrip("/")[:-3] + "/anthropic"
|
||||
  return {
      "provider": provider,
-     "api_mode": "chat_completions",
-     "base_url": creds.get("base_url", "").rstrip("/"),
+     "api_mode": api_mode,
+     "base_url": base_url,
|
||||
"api_key": creds.get("api_key", ""),
|
||||
"source": creds.get("source", "env"),
|
||||
"requested_provider": requested_provider,
|
||||
|
||||
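
For API-key providers other than Copilot, the hunk above picks the API mode in three steps: an explicit `api_mode` from config, then a base URL ending in `/anthropic`, then the MiniMax special case that also rewrites a stale `/v1` suffix. A self-contained sketch of that chain follows. The function name and the example URLs are illustrative, and `configured_mode` is assumed to be already validated.

```python
from typing import Optional


def pick_api_mode(provider: str, base_url: str, configured_mode: Optional[str] = None) -> tuple[str, str]:
    """Return (api_mode, base_url) following the config > URL suffix > provider order."""
    base_url = base_url.rstrip("/")
    api_mode = "chat_completions"
    if configured_mode:
        api_mode = configured_mode
    elif base_url.endswith("/anthropic"):
        api_mode = "anthropic_messages"
    elif provider in ("minimax", "minimax-cn"):
        api_mode = "anthropic_messages"
        if base_url.endswith("/v1"):
            # Replace the stale "/v1" suffix with "/anthropic".
            base_url = base_url[:-3] + "/anthropic"
    return api_mode, base_url


print(pick_api_mode("minimax", "https://api.minimax.io/v1"))
```

One consequence of the ordering is that an explicit `api_mode` in config bypasses the MiniMax URL rewrite entirely.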
+290
-103
@@ -55,6 +55,25 @@ def _set_default_model(config: Dict[str, Any], model_name: str) -> None:
|
||||
# Default model lists per provider — used as fallback when the live
|
||||
# /models endpoint can't be reached.
|
||||
_DEFAULT_PROVIDER_MODELS = {
    "copilot-acp": [
        "copilot-acp",
    ],
    "copilot": [
        "gpt-5.4",
        "gpt-5.4-mini",
        "gpt-5-mini",
        "gpt-5.3-codex",
        "gpt-5.2-codex",
        "gpt-4.1",
        "gpt-4o",
        "gpt-4o-mini",
        "claude-opus-4.6",
        "claude-sonnet-4.6",
        "claude-sonnet-4.5",
        "claude-haiku-4.5",
        "gemini-2.5-pro",
        "grok-code-fast-1",
    ],
    "zai": ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
    "kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
    "minimax": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
|
||||
@@ -64,6 +83,59 @@ _DEFAULT_PROVIDER_MODELS = {
|
||||
}
|
||||
|
||||
|
||||
def _current_reasoning_effort(config: Dict[str, Any]) -> str:
    agent_cfg = config.get("agent")
    if isinstance(agent_cfg, dict):
        return str(agent_cfg.get("reasoning_effort") or "").strip().lower()
    return ""


def _set_reasoning_effort(config: Dict[str, Any], effort: str) -> None:
    agent_cfg = config.get("agent")
    if not isinstance(agent_cfg, dict):
        agent_cfg = {}
        config["agent"] = agent_cfg
    agent_cfg["reasoning_effort"] = effort
|
||||
|
||||
|
||||
def _setup_copilot_reasoning_selection(
    config: Dict[str, Any],
    model_id: str,
    prompt_choice,
    *,
    catalog: Optional[list[dict[str, Any]]] = None,
    api_key: str = "",
) -> None:
    from hermes_cli.models import github_model_reasoning_efforts, normalize_copilot_model_id

    normalized_model = normalize_copilot_model_id(
        model_id,
        catalog=catalog,
        api_key=api_key,
    ) or model_id
    efforts = github_model_reasoning_efforts(normalized_model, catalog=catalog, api_key=api_key)
    if not efforts:
        return

    current_effort = _current_reasoning_effort(config)
    choices = list(efforts) + ["Disable reasoning", f"Keep current ({current_effort or 'default'})"]

    if current_effort == "none":
        default_idx = len(efforts)
    elif current_effort in efforts:
        default_idx = efforts.index(current_effort)
    elif "medium" in efforts:
        default_idx = efforts.index("medium")
    else:
        default_idx = len(choices) - 1

    effort_idx = prompt_choice("Select reasoning effort:", choices, default_idx)
    if effort_idx < len(efforts):
        _set_reasoning_effort(config, efforts[effort_idx])
    elif effort_idx == len(efforts):
        _set_reasoning_effort(config, "none")
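
The menu built by `_setup_copilot_reasoning_selection` appends two fixed entries after the effort levels, so index `len(efforts)` means "Disable reasoning" and the last index means "Keep current". The sketch below isolates how the default index is chosen. The function name and the sample effort lists are made up for illustration.

```python
def default_choice_index(efforts: list[str], current_effort: str) -> tuple[list[str], int]:
    """Build the menu and pick the pre-selected entry, as the setup step does."""
    choices = list(efforts) + ["Disable reasoning", f"Keep current ({current_effort or 'default'})"]
    if current_effort == "none":
        idx = len(efforts)  # "Disable reasoning"
    elif current_effort in efforts:
        idx = efforts.index(current_effort)
    elif "medium" in efforts:
        idx = efforts.index("medium")
    else:
        idx = len(choices) - 1  # "Keep current (...)"
    return choices, idx


choices, idx = default_choice_index(["low", "medium", "high"], "")
print(choices[idx])
```

Choosing "Keep current" leaves the config untouched, because the original function only writes when the selected index is at or below `len(efforts)`.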
|
||||
|
||||
|
||||
def _setup_provider_model_selection(config, provider_id, current_model, prompt_choice, prompt_fn):
|
||||
"""Model selection for API-key providers with live /models detection.
|
||||
|
||||
@@ -71,29 +143,60 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c
|
||||
hardcoded default list with a warning if the endpoint is unreachable.
|
||||
Always offers a 'Custom model' escape hatch.
|
||||
"""
|
||||
- from hermes_cli.auth import PROVIDER_REGISTRY
+ from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
|
||||
from hermes_cli.config import get_env_value
|
||||
- from hermes_cli.models import fetch_api_models
+ from hermes_cli.models import (
+     copilot_model_api_mode,
+     fetch_api_models,
+     fetch_github_model_catalog,
+     normalize_copilot_model_id,
+ )
|
||||
|
||||
pconfig = PROVIDER_REGISTRY[provider_id]
|
||||
is_copilot_catalog_provider = provider_id in {"copilot", "copilot-acp"}
|
||||
|
||||
# Resolve API key and base URL for the probe
|
||||
api_key = ""
|
||||
for ev in pconfig.api_key_env_vars:
|
||||
api_key = get_env_value(ev) or os.getenv(ev, "")
|
||||
if api_key:
|
||||
break
|
||||
base_url_env = pconfig.base_url_env_var or ""
|
||||
base_url = (get_env_value(base_url_env) if base_url_env else "") or pconfig.inference_base_url
|
||||
if is_copilot_catalog_provider:
|
||||
api_key = ""
|
||||
if provider_id == "copilot":
|
||||
creds = resolve_api_key_provider_credentials(provider_id)
|
||||
api_key = creds.get("api_key", "")
|
||||
base_url = creds.get("base_url", "") or pconfig.inference_base_url
|
||||
else:
|
||||
try:
|
||||
creds = resolve_api_key_provider_credentials("copilot")
|
||||
api_key = creds.get("api_key", "")
|
||||
except Exception:
|
||||
pass
|
||||
base_url = pconfig.inference_base_url
|
||||
catalog = fetch_github_model_catalog(api_key)
|
||||
current_model = normalize_copilot_model_id(
|
||||
current_model,
|
||||
catalog=catalog,
|
||||
api_key=api_key,
|
||||
) or current_model
|
||||
else:
|
||||
api_key = ""
|
||||
for ev in pconfig.api_key_env_vars:
|
||||
api_key = get_env_value(ev) or os.getenv(ev, "")
|
||||
if api_key:
|
||||
break
|
||||
base_url_env = pconfig.base_url_env_var or ""
|
||||
base_url = (get_env_value(base_url_env) if base_url_env else "") or pconfig.inference_base_url
|
||||
catalog = None
|
||||
|
||||
# Try live /models endpoint
|
||||
- live_models = fetch_api_models(api_key, base_url)
+ if is_copilot_catalog_provider and catalog:
+     live_models = [item.get("id", "") for item in catalog if item.get("id")]
+ else:
+     live_models = fetch_api_models(api_key, base_url)
|
||||
|
||||
if live_models:
|
||||
provider_models = live_models
|
||||
print_info(f"Found {len(live_models)} model(s) from {pconfig.name} API")
|
||||
else:
|
||||
- provider_models = _DEFAULT_PROVIDER_MODELS.get(provider_id, [])
+ fallback_provider_id = "copilot" if provider_id == "copilot-acp" else provider_id
+ provider_models = _DEFAULT_PROVIDER_MODELS.get(fallback_provider_id, [])
|
||||
if provider_models:
|
||||
print_warning(
|
||||
f"Could not auto-detect models from {pconfig.name} API — showing defaults.\n"
|
||||
@@ -107,12 +210,29 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c
|
||||
keep_idx = len(model_choices) - 1
|
||||
model_idx = prompt_choice("Select default model:", model_choices, keep_idx)
|
||||
|
||||
selected_model = current_model
|
||||
|
||||
if model_idx < len(provider_models):
|
||||
- _set_default_model(config, provider_models[model_idx])
+ selected_model = provider_models[model_idx]
+ if is_copilot_catalog_provider:
+     selected_model = normalize_copilot_model_id(
+         selected_model,
+         catalog=catalog,
+         api_key=api_key,
+     ) or selected_model
+ _set_default_model(config, selected_model)
|
||||
elif model_idx == len(provider_models):
|
||||
custom = prompt_fn("Enter model name")
|
||||
if custom:
|
||||
- _set_default_model(config, custom)
+ if is_copilot_catalog_provider:
+     selected_model = normalize_copilot_model_id(
+         custom,
+         catalog=catalog,
+         api_key=api_key,
+     ) or custom
+ else:
+     selected_model = custom
+ _set_default_model(config, selected_model)
|
||||
else:
|
||||
# "Keep current" selected — validate it's compatible with the new
|
||||
# provider. OpenRouter-formatted names (containing "/") won't work
|
||||
@@ -123,8 +243,25 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c
|
||||
f"and won't work with {pconfig.name}. "
|
||||
f"Switching to {provider_models[0]}."
|
||||
)
|
||||
selected_model = provider_models[0]
|
||||
_set_default_model(config, provider_models[0])
|
||||
|
||||
if provider_id == "copilot" and selected_model:
|
||||
model_cfg = _model_config_dict(config)
|
||||
model_cfg["api_mode"] = copilot_model_api_mode(
|
||||
selected_model,
|
||||
catalog=catalog,
|
||||
api_key=api_key,
|
||||
)
|
||||
config["model"] = model_cfg
|
||||
_setup_copilot_reasoning_selection(
|
||||
config,
|
||||
selected_model,
|
||||
prompt_choice,
|
||||
catalog=catalog,
|
||||
api_key=api_key,
|
||||
)
|
||||
|
||||
|
||||
def _sync_model_from_disk(config: Dict[str, Any]) -> None:
|
||||
disk_model = load_config().get("model")
|
||||
@@ -673,6 +810,8 @@ def setup_model_provider(config: dict):
|
||||
resolve_codex_runtime_credentials,
|
||||
DEFAULT_CODEX_BASE_URL,
|
||||
detect_external_credentials,
|
||||
get_auth_status,
|
||||
resolve_api_key_provider_credentials,
|
||||
)
|
||||
|
||||
print_header("Inference Provider")
|
||||
@@ -682,6 +821,8 @@ def setup_model_provider(config: dict):
|
||||
existing_or = get_env_value("OPENROUTER_API_KEY")
|
||||
active_oauth = get_active_provider()
|
||||
existing_custom = get_env_value("OPENAI_BASE_URL")
|
||||
copilot_status = get_auth_status("copilot")
|
||||
copilot_acp_status = get_auth_status("copilot-acp")
|
||||
|
||||
model_cfg = config.get("model") if isinstance(config.get("model"), dict) else {}
|
||||
current_config_provider = str(model_cfg.get("provider") or "").strip().lower() or None
|
||||
@@ -702,7 +843,12 @@ def setup_model_provider(config: dict):
|
||||
|
||||
# Detect if any provider is already configured
|
||||
has_any_provider = bool(
|
||||
- current_config_provider or active_oauth or existing_custom or existing_or
+ current_config_provider
+ or active_oauth
+ or existing_custom
+ or existing_or
+ or copilot_status.get("logged_in")
+ or copilot_acp_status.get("logged_in")
|
||||
)
|
||||
|
||||
# Build "keep current" label
|
||||
@@ -741,6 +887,8 @@ def setup_model_provider(config: dict):
|
||||
"Alibaba Cloud / DashScope (Qwen models via Anthropic-compatible API)",
|
||||
"OpenCode Zen (35+ curated models, pay-as-you-go)",
|
||||
"OpenCode Go (open models, $10/month subscription)",
|
||||
"GitHub Copilot (uses GITHUB_TOKEN or gh auth token)",
|
||||
"GitHub Copilot ACP (spawns `copilot --acp --stdio`)",
|
||||
]
|
||||
if keep_label:
|
||||
provider_choices.append(keep_label)
|
||||
@@ -897,93 +1045,17 @@ def setup_model_provider(config: dict):
|
||||
print()
|
||||
print_header("Custom OpenAI-Compatible Endpoint")
|
||||
print_info("Works with any API that follows OpenAI's chat completions spec")
|
||||
print()
|
||||
|
||||
current_url = get_env_value("OPENAI_BASE_URL") or ""
|
||||
current_key = get_env_value("OPENAI_API_KEY")
|
||||
_raw_model = config.get("model", "")
|
||||
current_model = (
|
||||
_raw_model.get("default", "")
|
||||
if isinstance(_raw_model, dict)
|
||||
else (_raw_model or "")
|
||||
)
|
||||
|
||||
if current_url:
|
||||
print_info(f" Current URL: {current_url}")
|
||||
if current_key:
|
||||
print_info(f" Current key: {current_key[:8]}... (configured)")
|
||||
|
||||
base_url = prompt(
|
||||
" API base URL (e.g., https://api.example.com/v1)", current_url
|
||||
).strip()
|
||||
api_key = prompt(" API key", password=True)
|
||||
model_name = prompt(" Model name (e.g., gpt-4, claude-3-opus)", current_model)
|
||||
|
||||
if base_url:
|
||||
from hermes_cli.models import probe_api_models
|
||||
|
||||
probe = probe_api_models(api_key, base_url)
|
||||
if probe.get("used_fallback") and probe.get("resolved_base_url"):
|
||||
print_warning(
|
||||
f"Endpoint verification worked at {probe['resolved_base_url']}/models, "
|
||||
f"not the exact URL you entered. Saving the working base URL instead."
|
||||
)
|
||||
base_url = probe["resolved_base_url"]
|
||||
elif probe.get("models") is not None:
|
||||
print_success(
|
||||
f"Verified endpoint via {probe.get('probed_url')} "
|
||||
f"({len(probe.get('models') or [])} model(s) visible)"
|
||||
)
|
||||
else:
|
||||
print_warning(
|
||||
f"Could not verify this endpoint via {probe.get('probed_url')}. "
|
||||
f"Hermes will still save it."
|
||||
)
|
||||
if probe.get("suggested_base_url"):
|
||||
print_info(
|
||||
f" If this server expects /v1, try base URL: {probe['suggested_base_url']}"
|
||||
)
|
||||
|
||||
save_env_value("OPENAI_BASE_URL", base_url)
|
||||
if api_key:
|
||||
save_env_value("OPENAI_API_KEY", api_key)
|
||||
if model_name:
|
||||
_set_default_model(config, model_name)
|
||||
|
||||
try:
|
||||
from hermes_cli.auth import deactivate_provider
|
||||
|
||||
deactivate_provider()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Save provider and base_url to config.yaml so the gateway and CLI
|
||||
# both resolve the correct provider without relying on env-var heuristics.
|
||||
if base_url:
|
||||
import yaml
|
||||
|
||||
config_path = (
|
||||
Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
|
||||
/ "config.yaml"
|
||||
)
|
||||
try:
|
||||
disk_cfg = {}
|
||||
if config_path.exists():
|
||||
disk_cfg = yaml.safe_load(config_path.read_text()) or {}
|
||||
model_section = disk_cfg.get("model", {})
|
||||
if isinstance(model_section, str):
|
||||
model_section = {"default": model_section}
|
||||
model_section["provider"] = "custom"
|
||||
model_section["base_url"] = base_url.rstrip("/")
|
||||
if model_name:
|
||||
model_section["default"] = model_name
|
||||
disk_cfg["model"] = model_section
|
||||
config_path.write_text(yaml.safe_dump(disk_cfg, sort_keys=False))
|
||||
except Exception as e:
|
||||
logger.debug("Could not save provider to config.yaml: %s", e)
|
||||
|
||||
_set_model_provider(config, "custom", base_url)
|
||||
|
||||
print_success("Custom endpoint configured")
|
||||
# Reuse the shared custom endpoint flow from `hermes model`.
|
||||
# This handles: URL/key/model/context-length prompts, endpoint probing,
|
||||
# env saving, config.yaml updates, and custom_providers persistence.
|
||||
from hermes_cli.main import _model_flow_custom
|
||||
_model_flow_custom(config)
|
||||
# _model_flow_custom handles model selection, config, env vars,
|
||||
# and custom_providers. Keep selected_provider = "custom" so
|
||||
# the model selection step below is skipped (line 1631 check)
|
||||
# but vision and TTS setup still run.
|
||||
|
||||
elif provider_idx == 4: # Z.AI / GLM
|
||||
selected_provider = "zai"
|
||||
@@ -1412,7 +1484,56 @@ def setup_model_provider(config: dict):
|
||||
_set_model_provider(config, "opencode-go", pconfig.inference_base_url)
|
||||
selected_base_url = pconfig.inference_base_url
|
||||
|
||||
# else: provider_idx == 14 (Keep current) — only shown when a provider already exists
|
||||
elif provider_idx == 14: # GitHub Copilot
|
||||
selected_provider = "copilot"
|
||||
print()
|
||||
print_header("GitHub Copilot")
|
||||
pconfig = PROVIDER_REGISTRY["copilot"]
|
||||
print_info("Hermes can use GITHUB_TOKEN, GH_TOKEN, or your gh CLI login.")
|
||||
print_info(f"Base URL: {pconfig.inference_base_url}")
|
||||
print()
|
||||
|
||||
copilot_creds = resolve_api_key_provider_credentials("copilot")
|
||||
source = copilot_creds.get("source", "")
|
||||
token = copilot_creds.get("api_key", "")
|
||||
if token:
|
||||
if source in ("GITHUB_TOKEN", "GH_TOKEN"):
|
||||
print_info(f"Current: {token[:8]}... ({source})")
|
||||
elif source == "gh auth token":
|
||||
print_info("Current: authenticated via `gh auth token`")
|
||||
else:
|
||||
print_info("Current: GitHub token configured")
|
||||
else:
|
||||
api_key = prompt(" GitHub token", password=True)
|
||||
if api_key:
|
||||
save_env_value("GITHUB_TOKEN", api_key)
|
||||
print_success("GitHub token saved")
|
||||
else:
|
||||
print_warning("Skipped - agent won't work without a GitHub token or gh auth login")
|
||||
|
||||
if existing_custom:
|
||||
save_env_value("OPENAI_BASE_URL", "")
|
||||
save_env_value("OPENAI_API_KEY", "")
|
||||
_set_model_provider(config, "copilot", pconfig.inference_base_url)
|
||||
selected_base_url = pconfig.inference_base_url
|
||||
|
||||
elif provider_idx == 15: # GitHub Copilot ACP
|
||||
selected_provider = "copilot-acp"
|
||||
print()
|
||||
print_header("GitHub Copilot ACP")
|
||||
pconfig = PROVIDER_REGISTRY["copilot-acp"]
|
||||
print_info("Hermes will start `copilot --acp --stdio` for each request.")
|
||||
print_info("Use HERMES_COPILOT_ACP_COMMAND or COPILOT_CLI_PATH to override the command.")
|
||||
print_info(f"Base marker: {pconfig.inference_base_url}")
|
||||
print()
|
||||
|
||||
if existing_custom:
|
||||
save_env_value("OPENAI_BASE_URL", "")
|
||||
save_env_value("OPENAI_API_KEY", "")
|
||||
_set_model_provider(config, "copilot-acp", pconfig.inference_base_url)
|
||||
selected_base_url = pconfig.inference_base_url
|
||||
|
||||
# else: provider_idx == 16 (Keep current) — only shown when a provider already exists
|
||||
# Normalize "keep current" to an explicit provider so downstream logic
|
||||
# doesn't fall back to the generic OpenRouter/static-model path.
|
||||
if selected_provider is None:
|
||||
@@ -1444,6 +1565,8 @@ def setup_model_provider(config: dict):
|
||||
if _vision_needs_setup:
|
||||
_prov_names = {
|
||||
"nous-api": "Nous Portal API key",
|
||||
"copilot": "GitHub Copilot",
|
||||
"copilot-acp": "GitHub Copilot ACP",
|
||||
"zai": "Z.AI / GLM",
|
||||
"kimi-coding": "Kimi / Moonshot",
|
||||
"minimax": "MiniMax",
|
||||
@@ -1583,7 +1706,15 @@ def setup_model_provider(config: dict):
|
||||
_set_default_model(config, custom)
|
||||
_update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
|
||||
_set_model_provider(config, "openai-codex", DEFAULT_CODEX_BASE_URL)
|
||||
elif selected_provider in ("zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "ai-gateway"):
|
||||
elif selected_provider == "copilot-acp":
|
||||
_setup_provider_model_selection(
|
||||
config, selected_provider, current_model,
|
||||
prompt_choice, prompt,
|
||||
)
|
||||
model_cfg = _model_config_dict(config)
|
||||
model_cfg["api_mode"] = "chat_completions"
|
||||
config["model"] = model_cfg
|
||||
elif selected_provider in ("copilot", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "ai-gateway"):
|
||||
_setup_provider_model_selection(
|
||||
config, selected_provider, current_model,
|
||||
prompt_choice, prompt,
|
||||
@@ -1644,7 +1775,7 @@ def setup_model_provider(config: dict):
|
||||
# Write provider+base_url to config.yaml only after model selection is complete.
|
||||
# This prevents a race condition where the gateway picks up a new provider
|
||||
# before the model name has been updated to match.
|
||||
if selected_provider in ("zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic") and selected_base_url is not None:
|
||||
if selected_provider in ("copilot-acp", "copilot", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic") and selected_base_url is not None:
|
||||
_update_config_for_provider(selected_provider, selected_base_url)
|
||||
|
||||
save_config(config)
|
||||
@@ -2644,6 +2775,61 @@ def setup_gateway(config: dict):
|
||||
print_info("Run 'hermes whatsapp' to choose your mode (separate bot number")
|
||||
print_info("or personal self-chat) and pair via QR code.")
|
||||
|
||||
# ── Webhooks ──
|
||||
existing_webhook = get_env_value("WEBHOOK_ENABLED")
|
||||
if existing_webhook:
|
||||
print_info("Webhooks: already configured")
|
||||
if prompt_yes_no("Reconfigure webhooks?", False):
|
||||
existing_webhook = None
|
||||
|
||||
if not existing_webhook and prompt_yes_no("Set up webhooks? (GitHub, GitLab, etc.)", False):
|
||||
print()
|
||||
print_warning(
|
||||
"⚠ Webhook and SMS platforms require exposing gateway ports to the"
|
||||
)
|
||||
print_warning(
|
||||
" internet. For security, run the gateway in a sandboxed environment"
|
||||
)
|
||||
print_warning(
|
||||
" (Docker, VM, etc.) to limit blast radius from prompt injection."
|
||||
)
|
||||
print()
|
||||
print_info(
|
||||
" Full guide: https://hermes-agent.nousresearch.com/docs/user-guide/messaging/webhooks/"
|
||||
)
|
||||
print()
|
||||
|
||||
port = prompt("Webhook port (default 8644)")
|
||||
if port:
|
||||
try:
|
||||
save_env_value("WEBHOOK_PORT", str(int(port)))
|
||||
print_success(f"Webhook port set to {port}")
|
||||
except ValueError:
|
||||
print_warning("Invalid port number, using default 8644")
|
||||
|
||||
secret = prompt("Global HMAC secret (shared across all routes)", password=True)
|
||||
if secret:
|
||||
save_env_value("WEBHOOK_SECRET", secret)
|
||||
print_success("Webhook secret saved")
|
||||
else:
|
||||
print_warning("No secret set — you must configure per-route secrets in config.yaml")
|
||||
|
||||
save_env_value("WEBHOOK_ENABLED", "true")
|
||||
print()
|
||||
print_success("Webhooks enabled! Next steps:")
|
||||
print_info(" 1. Define webhook routes in ~/.hermes/config.yaml")
|
||||
print_info(" 2. Point your service (GitHub, GitLab, etc.) at:")
|
||||
print_info(" http://your-server:8644/webhooks/<route-name>")
|
||||
print()
|
||||
print_info(
|
||||
" Route configuration guide:"
|
||||
)
|
||||
print_info(
|
||||
" https://hermes-agent.nousresearch.com/docs/user-guide/messaging/webhooks/#configuring-routes"
|
||||
)
|
||||
print()
|
||||
print_info(" Open config in your editor: hermes config edit")
|
||||
|
||||
# ── Gateway Service Setup ──
|
||||
any_messaging = (
|
||||
get_env_value("TELEGRAM_BOT_TOKEN")
|
||||
@@ -2653,6 +2839,7 @@ def setup_gateway(config: dict):
|
||||
or get_env_value("MATRIX_ACCESS_TOKEN")
|
||||
or get_env_value("MATRIX_PASSWORD")
|
||||
or get_env_value("WHATSAPP_ENABLED")
|
||||
or get_env_value("WEBHOOK_ENABLED")
|
||||
)
|
||||
if any_messaging:
|
||||
print()
|
||||
|
||||
+5
-1
@@ -181,7 +181,11 @@ class SessionDB:
|
||||
]
|
||||
for name, column_type in new_columns:
|
||||
try:
|
||||
cursor.execute(f"ALTER TABLE sessions ADD COLUMN {name} {column_type}")
|
||||
# name and column_type come from the hardcoded tuple above,
|
||||
# not user input. Double-quote identifier escaping is applied
|
||||
# as defense-in-depth; SQLite DDL cannot be parameterized.
|
||||
safe_name = name.replace('"', '""')
|
||||
cursor.execute(f'ALTER TABLE sessions ADD COLUMN "{safe_name}" {column_type}')
|
||||
except sqlite3.OperationalError:
|
||||
pass
|
||||
cursor.execute("UPDATE schema_version SET version = 5")
|
||||
|
||||
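
The identifier-escaping step in the hunk above can be tried in isolation. The following is a minimal sketch, not the project's migration code: `quote_identifier`, `add_column_if_missing`, and the column names are hypothetical, and an in-memory database stands in for the real sessions database.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Wrap a SQLite identifier in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def add_column_if_missing(conn: sqlite3.Connection, table: str, name: str, column_type: str) -> bool:
    """Try to add a column; return False when SQLite rejects it (e.g. it already exists)."""
    # DDL cannot take bound parameters, so the identifiers are quoted by hand.
    # column_type is interpolated as-is and must come from trusted code.
    sql = f"ALTER TABLE {quote_identifier(table)} ADD COLUMN {quote_identifier(name)} {column_type}"
    try:
        conn.execute(sql)
    except sqlite3.OperationalError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")

added = add_column_if_missing(conn, "sessions", "title", "TEXT")        # hypothetical column
added_again = add_column_if_missing(conn, "sessions", "title", "TEXT")  # duplicate is swallowed
odd = add_column_if_missing(conn, "sessions", 'odd"name', "INTEGER")    # quote inside the name
columns = [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]
```

Quoting only protects the identifier. The column type is still interpolated verbatim, which is why the comment in the hunk stresses that both values come from a hardcoded tuple.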
@@ -117,11 +117,13 @@ class HonchoClientConfig:
|
||||
def from_env(cls, workspace_id: str = "hermes") -> HonchoClientConfig:
|
||||
"""Create config from environment variables (fallback)."""
|
||||
api_key = os.environ.get("HONCHO_API_KEY")
|
||||
base_url = os.environ.get("HONCHO_BASE_URL", "").strip() or None
|
||||
return cls(
|
||||
workspace_id=workspace_id,
|
||||
api_key=api_key,
|
||||
environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
|
||||
enabled=bool(api_key),
|
||||
base_url=base_url,
|
||||
enabled=bool(api_key or base_url),
|
||||
)
|
||||
|
||||
@classmethod
|
||||
@@ -171,8 +173,14 @@ class HonchoClientConfig:
|
||||
or raw.get("environment", "production")
|
||||
)
|
||||
|
||||
# Auto-enable when API key is present (unless explicitly disabled)
|
||||
# Host-level enabled wins, then root-level, then auto-enable if key exists.
|
||||
base_url = (
|
||||
raw.get("baseUrl")
|
||||
or os.environ.get("HONCHO_BASE_URL", "").strip()
|
||||
or None
|
||||
)
|
||||
|
||||
# Auto-enable when API key or base_url is present (unless explicitly disabled)
|
||||
# Host-level enabled wins, then root-level, then auto-enable if key/url exists.
|
||||
host_enabled = host_block.get("enabled")
|
||||
root_enabled = raw.get("enabled")
|
||||
if host_enabled is not None:
|
||||
@@ -180,8 +188,8 @@ class HonchoClientConfig:
|
||||
elif root_enabled is not None:
|
||||
enabled = root_enabled
|
||||
else:
|
||||
# Not explicitly set anywhere -> auto-enable if API key exists
|
||||
enabled = bool(api_key)
|
||||
# Not explicitly set anywhere -> auto-enable if API key or base_url exists
|
||||
enabled = bool(api_key or base_url)
|
||||
|
||||
# write_frequency: accept int or string
|
||||
raw_wf = (
|
||||
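
The precedence described in the comments above (host-level `enabled`, then root-level, then auto-enable when an API key or base URL is present) can be written as one small function. This is a sketch under that reading of the hunk; `resolve_enabled` is a hypothetical name and is not part of `HonchoClientConfig`.

```python
from typing import Optional


def resolve_enabled(
    host_enabled: Optional[bool],
    root_enabled: Optional[bool],
    api_key: Optional[str],
    base_url: Optional[str],
) -> bool:
    """Pick the effective 'enabled' flag from the most specific setting available."""
    if host_enabled is not None:
        # An explicit host-level value always wins, even when it is False.
        return bool(host_enabled)
    if root_enabled is not None:
        return bool(root_enabled)
    # Nothing explicit anywhere: enable when either credential or endpoint exists.
    return bool(api_key or base_url)


local_instance = resolve_enabled(None, None, None, "http://localhost:8000")
explicitly_off = resolve_enabled(False, True, "key", None)
nothing_set = resolve_enabled(None, None, None, None)
```

The `is not None` checks matter: a plain truthiness test would treat an explicit `False` the same as "not set" and fall through to auto-enable.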
@@ -214,6 +222,7 @@ class HonchoClientConfig:
|
||||
workspace_id=workspace,
|
||||
api_key=api_key,
|
||||
environment=environment,
|
||||
base_url=base_url,
|
||||
peer_name=host_block.get("peerName") or raw.get("peerName"),
|
||||
ai_peer=ai_peer,
|
||||
linked_hosts=linked_hosts,
|
||||
@@ -348,11 +357,12 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
|
||||
if config is None:
|
||||
config = HonchoClientConfig.from_global_config()
|
||||
|
||||
if not config.api_key:
|
||||
if not config.api_key and not config.base_url:
|
||||
raise ValueError(
|
||||
"Honcho API key not found. "
|
||||
"Get your API key at https://app.honcho.dev, "
|
||||
"then run 'hermes honcho setup' or set HONCHO_API_KEY."
|
||||
"then run 'hermes honcho setup' or set HONCHO_API_KEY. "
|
||||
"For local instances, set HONCHO_BASE_URL instead."
|
||||
)
|
||||
|
||||
try:
|
||||
|
||||
+96
-5
@@ -24,6 +24,7 @@ import json
|
||||
import asyncio
|
||||
import os
|
||||
import logging
|
||||
import threading
|
||||
from typing import Dict, Any, List, Optional, Tuple
|
||||
|
||||
from tools.registry import registry
|
||||
@@ -36,6 +37,48 @@ logger = logging.getLogger(__name__)
|
||||
# Async Bridging (single source of truth -- used by registry.dispatch too)
|
||||
# =============================================================================
|
||||
|
||||
_tool_loop = None          # persistent loop for the main (CLI) thread
_tool_loop_lock = threading.Lock()
_worker_thread_local = threading.local()  # per-worker-thread persistent loops


def _get_tool_loop():
    """Return a long-lived event loop for running async tool handlers.

    Using a persistent loop (instead of asyncio.run() which creates and
    *closes* a fresh loop every time) prevents "Event loop is closed"
    errors that occur when cached httpx/AsyncOpenAI clients attempt to
    close their transport on a dead loop during garbage collection.
    """
    global _tool_loop
    with _tool_loop_lock:
        if _tool_loop is None or _tool_loop.is_closed():
            _tool_loop = asyncio.new_event_loop()
        return _tool_loop


def _get_worker_loop():
    """Return a persistent event loop for the current worker thread.

    Each worker thread (e.g., delegate_task's ThreadPoolExecutor threads)
    gets its own long-lived loop stored in thread-local storage. This
    prevents the "Event loop is closed" errors that occurred when
    asyncio.run() was used per-call: asyncio.run() creates a loop, runs
    the coroutine, then *closes* the loop, but cached httpx/AsyncOpenAI
    clients remain bound to that now-dead loop and raise RuntimeError
    during garbage collection or subsequent use.

    By keeping the loop alive for the thread's lifetime, cached clients
    stay valid and their cleanup runs on a live loop.
    """
    loop = getattr(_worker_thread_local, 'loop', None)
    if loop is None or loop.is_closed():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        _worker_thread_local.loop = loop
    return loop


def _run_async(coro):
|
||||
"""Run an async coroutine from a sync context.
|
||||
|
||||
@@ -44,6 +87,15 @@ def _run_async(coro):
|
||||
disposable thread so asyncio.run() can create its own loop without
|
||||
conflicting.
|
||||
|
||||
For the common CLI path (no running loop), we use a persistent event
|
||||
loop so that cached async clients (httpx / AsyncOpenAI) remain bound
|
||||
to a live loop and don't trigger "Event loop is closed" on GC.
|
||||
|
||||
When called from a worker thread (parallel tool execution), we use a
|
||||
per-thread persistent loop to avoid both contention with the main
|
||||
thread's shared loop AND the "Event loop is closed" errors caused by
|
||||
asyncio.run()'s create-and-destroy lifecycle.
|
||||
|
||||
This is the single source of truth for sync->async bridging in tool
|
||||
handlers. The RL paths (agent_loop.py, tool_context.py) also provide
|
||||
outer thread-pool wrapping as defense-in-depth, but each handler is
|
||||
@@ -55,11 +107,23 @@ def _run_async(coro):
|
||||
loop = None
|
||||
|
||||
if loop and loop.is_running():
|
||||
# Inside an async context (gateway, RL env) — run in a fresh thread.
|
||||
import concurrent.futures
|
||||
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
|
||||
future = pool.submit(asyncio.run, coro)
|
||||
return future.result(timeout=300)
|
||||
return asyncio.run(coro)
|
||||
|
||||
# If we're on a worker thread (e.g., parallel tool execution in
|
||||
# delegate_task), use a per-thread persistent loop. This avoids
|
||||
# contention with the main thread's shared loop while keeping cached
|
||||
# httpx/AsyncOpenAI clients bound to a live loop for the thread's
|
||||
# lifetime — preventing "Event loop is closed" on GC cleanup.
|
||||
if threading.current_thread() is not threading.main_thread():
|
||||
worker_loop = _get_worker_loop()
|
||||
return worker_loop.run_until_complete(coro)
|
||||
|
||||
tool_loop = _get_tool_loop()
|
||||
return tool_loop.run_until_complete(coro)
|
||||
|
||||
|
||||
# =============================================================================
|
||||
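
The difference between a persistent loop and `asyncio.run()` that the docstrings above describe can be observed directly. The sketch below is a simplified stand-in, not the module's code: `get_persistent_loop`, `run_sync`, and `which_loop` are hypothetical names, and only the main-thread path is shown.

```python
import asyncio
import threading

_loop = None
_loop_lock = threading.Lock()


def get_persistent_loop() -> asyncio.AbstractEventLoop:
    """Return one long-lived loop, creating it on first use or after it was closed."""
    global _loop
    with _loop_lock:
        if _loop is None or _loop.is_closed():
            _loop = asyncio.new_event_loop()
        return _loop


def run_sync(coro):
    """Run a coroutine to completion without closing the loop afterwards."""
    return get_persistent_loop().run_until_complete(coro)


async def which_loop() -> asyncio.AbstractEventLoop:
    """Report the loop this coroutine is running on."""
    return asyncio.get_running_loop()


# Two calls through the persistent bridge share one loop, and it stays open.
first = run_sync(which_loop())
second = run_sync(which_loop())
same_loop_reused = first is second
still_open = not first.is_closed()

# asyncio.run() creates a fresh loop and closes it before returning.
throwaway = asyncio.run(which_loop())
throwaway_closed = throwaway.is_closed()
```

Any client object created on `throwaway` would now be bound to a closed loop, which is the situation the persistent-loop helpers are meant to avoid.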
@@ -242,18 +306,45 @@ def get_tool_definitions(
|
||||
# Ask the registry for schemas (only returns tools whose check_fn passes)
|
||||
filtered_tools = registry.get_definitions(tools_to_include, quiet=quiet_mode)
|
||||
|
||||
# The set of tool names that actually passed check_fn filtering.
|
||||
# Use this (not tools_to_include) for any downstream schema that references
|
||||
# other tools by name — otherwise the model sees tools mentioned in
|
||||
# descriptions that don't actually exist, and hallucinates calls to them.
|
||||
available_tool_names = {t["function"]["name"] for t in filtered_tools}
|
||||
|
||||
# Rebuild execute_code schema to only list sandbox tools that are actually
|
||||
# enabled. Without this, the model sees "web_search is available in
|
||||
# execute_code" even when the user disabled the web toolset (#560-discord).
|
||||
if "execute_code" in tools_to_include:
|
||||
# available. Without this, the model sees "web_search is available in
|
||||
# execute_code" even when the API key isn't configured or the toolset is
|
||||
# disabled (#560-discord).
|
||||
if "execute_code" in available_tool_names:
|
||||
from tools.code_execution_tool import SANDBOX_ALLOWED_TOOLS, build_execute_code_schema
|
||||
sandbox_enabled = SANDBOX_ALLOWED_TOOLS & tools_to_include
|
||||
sandbox_enabled = SANDBOX_ALLOWED_TOOLS & available_tool_names
|
||||
dynamic_schema = build_execute_code_schema(sandbox_enabled)
|
||||
for i, td in enumerate(filtered_tools):
|
||||
if td.get("function", {}).get("name") == "execute_code":
|
||||
filtered_tools[i] = {"type": "function", "function": dynamic_schema}
|
||||
break
|
||||
|
||||
# Strip web tool cross-references from browser_navigate description when
|
||||
# web_search / web_extract are not available. The static schema says
|
||||
# "prefer web_search or web_extract" which causes the model to hallucinate
|
||||
# those tools when they're missing.
|
||||
if "browser_navigate" in available_tool_names:
|
||||
web_tools_available = {"web_search", "web_extract"} & available_tool_names
|
||||
if not web_tools_available:
|
||||
for i, td in enumerate(filtered_tools):
|
||||
if td.get("function", {}).get("name") == "browser_navigate":
|
||||
desc = td["function"].get("description", "")
|
||||
desc = desc.replace(
|
||||
" For simple information retrieval, prefer web_search or web_extract (faster, cheaper).",
|
||||
"",
|
||||
)
|
||||
filtered_tools[i] = {
|
||||
"type": "function",
|
||||
"function": {**td["function"], "description": desc},
|
||||
}
|
||||
break
|
||||
|
||||
if not quiet_mode:
|
||||
if filtered_tools:
|
||||
tool_names = [t["function"]["name"] for t in filtered_tools]
|
||||
|
||||
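
The idea behind this hunk, deriving cross-references from the tools that actually survived filtering, fits in a few lines. The following is a simplified sketch: `prune_cross_references`, the `SANDBOX_ALLOWED` contents, and the sample schemas are hypothetical, while the hint string is the one quoted in the hunk.

```python
from typing import Any

# Hypothetical stand-in for SANDBOX_ALLOWED_TOOLS.
SANDBOX_ALLOWED = {"web_search", "web_extract", "read_file", "terminal"}
WEB_HINT = " For simple information retrieval, prefer web_search or web_extract (faster, cheaper)."


def prune_cross_references(tools: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], set[str]]:
    """Drop mentions of tools that are not in the filtered list."""
    # Names come from the filtered list, not from what was requested.
    available = {t["function"]["name"] for t in tools}
    sandbox_enabled = SANDBOX_ALLOWED & available
    web_available = {"web_search", "web_extract"} & available

    pruned = []
    for tool in tools:
        fn = tool["function"]
        if fn["name"] == "browser_navigate" and not web_available:
            # Build a new dict so the shared static schema is not mutated.
            fn = {**fn, "description": fn.get("description", "").replace(WEB_HINT, "")}
        pruned.append({"type": "function", "function": fn})
    return pruned, sandbox_enabled


static_browser = {"name": "browser_navigate", "description": "Open a URL." + WEB_HINT}
tools = [
    {"type": "function", "function": static_browser},
    {"type": "function", "function": {"name": "terminal", "description": "Run a command."}},
]
pruned, sandbox_enabled = prune_cross_references(tools)
```

Copying the function dict before editing mirrors the `{**td["function"], "description": desc}` step in the hunk, so a later call with web tools enabled still sees the original description.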
@@ -0,0 +1,3 @@
|
||||
# MCP
|
||||
|
||||
Skills for building, testing, and deploying MCP (Model Context Protocol) servers.
|
||||
@@ -0,0 +1,299 @@
|
||||
---
name: fastmcp
description: Build, test, inspect, install, and deploy MCP servers with FastMCP in Python. Use when creating a new MCP server, wrapping an API or database as MCP tools, exposing resources or prompts, or preparing a FastMCP server for Claude Code, Cursor, or HTTP deployment.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [MCP, FastMCP, Python, Tools, Resources, Prompts, Deployment]
    homepage: https://gofastmcp.com
    related_skills: [native-mcp, mcporter]
prerequisites:
  commands: [python3]
---

# FastMCP

Build MCP servers in Python with FastMCP, validate them locally, install them into MCP clients, and deploy them as HTTP endpoints.

## When to Use

Use this skill when the task is to:

- create a new MCP server in Python
- wrap an API, database, CLI, or file-processing workflow as MCP tools
- expose resources or prompts in addition to tools
- smoke-test a server with the FastMCP CLI before wiring it into Hermes or another client
- install a server into Claude Code, Claude Desktop, Cursor, or a similar MCP client
- prepare a FastMCP server repo for HTTP deployment

Use `native-mcp` when the server already exists and only needs to be connected to Hermes. Use `mcporter` when the goal is ad-hoc CLI access to an existing MCP server instead of building one.

## Prerequisites

Install FastMCP in the working environment first:

```bash
pip install fastmcp
fastmcp version
```

For the API template, install `httpx` if it is not already present:

```bash
pip install httpx
```

## Included Files

### Templates

- `templates/api_wrapper.py` - REST API wrapper with auth header support
- `templates/database_server.py` - read-only SQLite query server
- `templates/file_processor.py` - text-file inspection and search server

### Scripts

- `scripts/scaffold_fastmcp.py` - copy a starter template and replace the server name placeholder

### References

- `references/fastmcp-cli.md` - FastMCP CLI workflow, installation targets, and deployment checks

## Workflow

### 1. Pick the Smallest Viable Server Shape

Choose the narrowest useful surface area first:

- API wrapper: start with 1-3 high-value endpoints, not the whole API
- database server: expose read-only introspection and a constrained query path
- file processor: expose deterministic operations with explicit path arguments
- prompts/resources: add only when the client needs reusable prompt templates or discoverable documents

Prefer a thin server with good names, docstrings, and schemas over a large server with vague tools.

### 2. Scaffold from a Template

Copy a template directly or use the scaffold helper:

```bash
python ~/.hermes/skills/mcp/fastmcp/scripts/scaffold_fastmcp.py \
  --template api_wrapper \
  --name "Acme API" \
  --output ./acme_server.py
```

Available templates:

```bash
python ~/.hermes/skills/mcp/fastmcp/scripts/scaffold_fastmcp.py --list
```

If copying manually, replace `__SERVER_NAME__` with a real server name.

### 3. Implement Tools First

Start with `@mcp.tool` functions before adding resources or prompts.

Rules for tool design:

- Give every tool a concrete verb-based name
- Write docstrings as user-facing tool descriptions
- Keep parameters explicit and typed
- Return structured JSON-safe data where possible
- Validate unsafe inputs early
- Prefer read-only behavior by default for first versions

Good tool examples:

- `get_customer`
- `search_tickets`
- `describe_table`
- `summarize_text_file`

Weak tool examples:

- `run`
- `process`
- `do_thing`
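
The design rules above can be illustrated with a plain Python function, shown here without the `@mcp.tool` decorator so it runs without FastMCP installed. It is a sketch only: the ticket data, `MAX_LIMIT`, and the exact return shape are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any

# Hypothetical in-memory data standing in for a real ticket system.
_TICKETS = [
    {"id": "T-1", "title": "Router drops connection"},
    {"id": "T-2", "title": "Printer offline"},
    {"id": "T-3", "title": "Router firmware update"},
]
MAX_LIMIT = 50  # assumed upper bound for this example


def search_tickets(query: str, limit: int = 10) -> dict[str, Any]:
    """Search tickets by title substring and return at most `limit` matches."""
    # Validate inputs before doing any work.
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")

    # Read-only lookup; the result contains only JSON-safe types.
    matches = [t for t in _TICKETS if query.lower() in t["title"].lower()][:limit]
    return {"query": query, "count": len(matches), "results": matches}


result = search_tickets("router", limit=5)
```

The name states the action, the docstring reads as a description a client could show, and bad input fails early with a short message rather than producing an empty or misleading result.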

### 4. Add Resources and Prompts Only When They Help

Add `@mcp.resource` when the client benefits from fetching stable read-only content such as schemas, policy docs, or generated reports.

Add `@mcp.prompt` when the server should provide a reusable prompt template for a known workflow.

Do not turn every document into a prompt. Prefer:

- tools for actions
- resources for data/document retrieval
- prompts for reusable LLM instructions

### 5. Test the Server Before Integrating It Anywhere

Use the FastMCP CLI for local validation:

```bash
fastmcp inspect acme_server.py:mcp
fastmcp list acme_server.py --json
fastmcp call acme_server.py search_resources query=router limit=5 --json
```

For fast iterative debugging, run the server locally:

```bash
fastmcp run acme_server.py:mcp
```

To test HTTP transport locally:

```bash
fastmcp run acme_server.py:mcp --transport http --host 127.0.0.1 --port 8000
fastmcp list http://127.0.0.1:8000/mcp --json
fastmcp call http://127.0.0.1:8000/mcp search_resources query=router --json
```

Always run at least one real `fastmcp call` against each new tool before claiming the server works.

### 6. Install into a Client When Local Validation Passes

FastMCP can register the server with supported MCP clients:

```bash
fastmcp install claude-code acme_server.py
fastmcp install claude-desktop acme_server.py
fastmcp install cursor acme_server.py -e .
```

Use `fastmcp discover` to inspect named MCP servers already configured on the machine.

When the goal is Hermes integration, either:

- configure the server in `~/.hermes/config.yaml` using the `native-mcp` skill, or
- keep using FastMCP CLI commands during development until the interface stabilizes

### 7. Deploy After the Local Contract Is Stable

For managed hosting, Prefect Horizon is the path FastMCP documents most directly. Before deployment:

```bash
fastmcp inspect acme_server.py:mcp
```

Make sure the repo contains:

- a Python file with the FastMCP server object
- `requirements.txt` or `pyproject.toml`
- any environment-variable documentation needed for deployment

For generic HTTP hosting, validate the HTTP transport locally first, then deploy on any Python-compatible platform that can expose the server port.

## Common Patterns

### API Wrapper Pattern

Use when exposing a REST or HTTP API as MCP tools.

Recommended first slice:

- one read path
- one list/search path
- optional health check

Implementation notes:

- keep auth in environment variables, not hardcoded
- centralize request logic in one helper
- surface API errors with concise context
- normalize inconsistent upstream payloads before returning them

Start from `templates/api_wrapper.py`.
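
Two of the implementation notes, keeping auth in the environment and normalizing upstream payloads, can be sketched without any network access. The helper names and the payload shapes handled by `normalize_results` (`items`, `results`, or a bare list) are assumptions for illustration; a real upstream API may use other envelopes.

```python
from __future__ import annotations

import os
from typing import Any, Mapping


def build_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build request headers, adding a bearer token only when API_TOKEN is set."""
    headers = {"Accept": "application/json"}
    token = env.get("API_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def build_url(base_url: str, path: str) -> str:
    """Join base URL and path with exactly one slash between them."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def normalize_results(payload: Any) -> list[dict[str, Any]]:
    """Reduce a few assumed upstream shapes to a plain list of dicts."""
    if isinstance(payload, dict):
        for key in ("items", "results"):  # assumed envelope keys
            if isinstance(payload.get(key), list):
                payload = payload[key]
                break
        else:
            payload = [payload]  # a single object becomes a one-item list
    if not isinstance(payload, list):
        raise ValueError(f"Unexpected upstream payload type: {type(payload).__name__}")
    return [item for item in payload if isinstance(item, dict)]


headers = build_headers({"API_TOKEN": "example-token"})
url = build_url("https://api.example.com/", "/resources")
wrapped = normalize_results({"items": [{"id": 1}, {"id": 2}]})
bare = normalize_results([{"id": 3}])
```

Passing the environment as a mapping keeps the helpers easy to exercise in tests, while the default still reads the real process environment.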

### Database Pattern

Use when exposing safe query and inspection capabilities.

Recommended first slice:

- `list_tables`
- `describe_table`
- one constrained read query tool

Implementation notes:

- default to read-only DB access
- reject non-`SELECT` SQL in early versions
- limit row counts
- return rows plus column names

Start from `templates/database_server.py`.
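
The query-side notes (reject non-`SELECT` SQL, cap row counts, return rows with column names) can be sketched with the standard-library `sqlite3` module. `run_select`, the tiny `MAX_ROWS` value, and the sample table are hypothetical. An in-memory database is used so the example is self-contained, whereas a real server would open the file read-only.

```python
from __future__ import annotations

import sqlite3
from typing import Any

MAX_ROWS = 3  # deliberately small so the cap is visible in the demo


def run_select(conn: sqlite3.Connection, sql: str, max_rows: int = MAX_ROWS) -> dict[str, Any]:
    """Run one SELECT statement and return column names plus a capped row list."""
    if not sql.strip().lower().startswith("select"):
        raise ValueError("Only SELECT queries are allowed")
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    # Fetch one extra row to learn whether the result was cut off.
    rows = cursor.fetchmany(max_rows + 1)
    return {
        "columns": columns,
        "rows": [list(row) for row in rows[:max_rows]],
        "truncated": len(rows) > max_rows,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [(f"customer-{i}",) for i in range(5)])

result = run_select(conn, "SELECT id, name FROM customers ORDER BY id")
```

The prefix check is a coarse first filter, not a SQL parser. Opening the database in read-only mode remains the stronger guarantee against writes.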

### File Processor Pattern

Use when the server needs to inspect or transform files on demand.

Recommended first slice:

- summarize file contents
- search within files
- extract deterministic metadata

Implementation notes:

- accept explicit file paths
- check for missing files and encoding failures
- cap previews and result counts
- avoid shelling out unless a specific external tool is required

Start from `templates/file_processor.py`.
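
The notes above translate into a small set of standard-library helpers. This is a sketch rather than the template's contents: the function names, the caps, and the return shapes are assumptions chosen for illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any

MAX_PREVIEW_CHARS = 200  # assumed cap
MAX_MATCHES = 20         # assumed cap


def read_text(path: str) -> str:
    """Read a UTF-8 text file, turning common failures into clear errors."""
    file_path = Path(path).expanduser()
    if not file_path.is_file():
        raise FileNotFoundError(f"File not found: {file_path}")
    try:
        return file_path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"File is not valid UTF-8: {file_path}") from exc


def summarize_text_file(path: str) -> dict[str, Any]:
    """Return line count, character count, and a capped preview."""
    text = read_text(path)
    return {
        "line_count": len(text.splitlines()),
        "char_count": len(text),
        "preview": text[:MAX_PREVIEW_CHARS],
    }


def search_text_file(path: str, needle: str, max_matches: int = MAX_MATCHES) -> list[dict[str, Any]]:
    """Return up to max_matches lines containing needle, with 1-based line numbers."""
    matches = []
    for number, line in enumerate(read_text(path).splitlines(), start=1):
        if needle in line:
            matches.append({"line": number, "text": line})
            if len(matches) >= max_matches:
                break
    return matches


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("alpha\nbeta router\ngamma router\n", encoding="utf-8")
    summary = summarize_text_file(str(sample))
    hits = search_text_file(str(sample), "router")
    first_hit_only = search_text_file(str(sample), "router", max_matches=1)
```

Everything is done with `pathlib` and plain string operations, so no subprocess is involved and the output is deterministic for a given file.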

## Quality Bar

Before handing off a FastMCP server, verify all of the following:

- server imports cleanly
- `fastmcp inspect <file.py:mcp>` succeeds
- `fastmcp list <server spec> --json` succeeds
- every new tool has at least one real `fastmcp call`
- environment variables are documented
- the tool surface is small enough to understand without guesswork

## Troubleshooting

### FastMCP command missing

Install the package in the active environment:

```bash
pip install fastmcp
fastmcp version
```

### `fastmcp inspect` fails

Check that:

- the file imports without side effects that crash
- the FastMCP instance is named correctly in `<file.py:object>`
- optional dependencies from the template are installed

### Tool works in Python but not through CLI

Run:

```bash
fastmcp list server.py --json
fastmcp call server.py your_tool_name --json
```

This usually exposes naming mismatches, missing required arguments, or non-serializable return values.
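
Non-serializable return values can also be located before going through the CLI. The helper below is a hypothetical debugging aid, not part of FastMCP: it walks a tool's return value and reports which entries the standard `json` module cannot encode. It assumes the value has no circular references.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def find_unserializable(value: Any, path: str = "result") -> list[str]:
    """List the locations inside value that json.dumps cannot encode."""
    if isinstance(value, dict):
        return [p for key, item in value.items() for p in find_unserializable(item, f"{path}.{key}")]
    if isinstance(value, (list, tuple)):
        return [p for index, item in enumerate(value) for p in find_unserializable(item, f"{path}[{index}]")]
    try:
        json.dumps(value)
    except TypeError:
        return [f"{path} ({type(value).__name__})"]
    return []


# A return value with three common offenders: Path, datetime, and set.
raw = {
    "file": Path("notes.txt"),
    "modified": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "tags": ["ok", {"not-ok"}],
}
problems = find_unserializable(raw)

# The same data converted to JSON-safe types.
fixed = {
    "file": str(raw["file"]),
    "modified": raw["modified"].isoformat(),
    "tags": ["ok", sorted({"not-ok"})],
}
remaining = find_unserializable(fixed)
```

Converting at the tool boundary (strings for paths, ISO 8601 for timestamps, lists for sets) keeps the tool's internals free to use richer types.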

### Hermes cannot see the deployed server

The server-building part may be correct while the Hermes config is not. Load the `native-mcp` skill and configure the server in `~/.hermes/config.yaml`, then restart Hermes.

## References

For CLI details, install targets, and deployment checks, read `references/fastmcp-cli.md`.
@@ -0,0 +1,110 @@
|
||||
# FastMCP CLI Reference

Use this file when the task needs exact FastMCP CLI workflows rather than the higher-level guidance in `SKILL.md`.

## Install and Verify

```bash
pip install fastmcp
fastmcp version
```

FastMCP documents `pip install fastmcp` and `fastmcp version` as the baseline installation and verification path.

## Run a Server

Run a server object from a Python file:

```bash
fastmcp run server.py:mcp
```

Run the same server over HTTP:

```bash
fastmcp run server.py:mcp --transport http --host 127.0.0.1 --port 8000
```

## Inspect a Server

Inspect what FastMCP will expose:

```bash
fastmcp inspect server.py:mcp
```

This is also the check FastMCP recommends before deploying to Prefect Horizon.

## List and Call Tools

List tools from a Python file:

```bash
fastmcp list server.py --json
```

List tools from an HTTP endpoint:

```bash
fastmcp list http://127.0.0.1:8000/mcp --json
```

Call a tool with key-value arguments:

```bash
fastmcp call server.py search_resources query=router limit=5 --json
```

Call a tool with a full JSON input payload:

```bash
fastmcp call server.py create_item '{"name": "Widget", "tags": ["sale"]}' --json
```

## Discover Named MCP Servers

Find named servers already configured in local MCP-aware tools:

```bash
fastmcp discover
```

FastMCP documents name-based resolution for Claude Desktop, Claude Code, Cursor, Gemini, Goose, and `./mcp.json`.

## Install into MCP Clients

Register a server with common clients:

```bash
fastmcp install claude-code server.py
fastmcp install claude-desktop server.py
fastmcp install cursor server.py -e .
```

FastMCP notes that client installs run in isolated environments, so declare dependencies explicitly when needed with flags such as `--with`, `--env-file`, or editable installs.

## Deployment Checks

### Prefect Horizon

Before pushing to Horizon:

```bash
fastmcp inspect server.py:mcp
```

FastMCP's Horizon docs expect:

- a GitHub repo
- a Python file containing the FastMCP server object
- dependencies declared in `requirements.txt` or `pyproject.toml`
- an entrypoint like `main.py:mcp`

### Generic HTTP Hosting

Before shipping to any other host:

1. Start the server locally with HTTP transport.
2. Verify `fastmcp list` against the local `/mcp` URL.
3. Verify at least one `fastmcp call`.
4. Document required environment variables.
@@ -0,0 +1,56 @@
|
||||
#!/usr/bin/env python3
"""Copy a FastMCP starter template into a working file."""

from __future__ import annotations

import argparse
from pathlib import Path


SCRIPT_DIR = Path(__file__).resolve().parent
SKILL_DIR = SCRIPT_DIR.parent
TEMPLATE_DIR = SKILL_DIR / "templates"
PLACEHOLDER = "__SERVER_NAME__"


def list_templates() -> list[str]:
    return sorted(path.stem for path in TEMPLATE_DIR.glob("*.py"))


def render_template(template_name: str, server_name: str) -> str:
    template_path = TEMPLATE_DIR / f"{template_name}.py"
    if not template_path.exists():
        available = ", ".join(list_templates())
        raise SystemExit(f"Unknown template '{template_name}'. Available: {available}")
    return template_path.read_text(encoding="utf-8").replace(PLACEHOLDER, server_name)


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--template", help="Template name without .py suffix")
    parser.add_argument("--name", help="FastMCP server display name")
    parser.add_argument("--output", help="Destination Python file path")
    parser.add_argument("--force", action="store_true", help="Overwrite an existing output file")
    parser.add_argument("--list", action="store_true", help="List available templates and exit")
    args = parser.parse_args()

    if args.list:
        for name in list_templates():
            print(name)
        return 0

    if not args.template or not args.name or not args.output:
        parser.error("--template, --name, and --output are required unless --list is used")

    output_path = Path(args.output).expanduser()
    if output_path.exists() and not args.force:
        raise SystemExit(f"Refusing to overwrite existing file: {output_path}")

    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(render_template(args.template, args.name), encoding="utf-8")
    print(f"Wrote {output_path}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
@@ -0,0 +1,54 @@
|
||||
from __future__ import annotations

import os
from typing import Any

import httpx
from fastmcp import FastMCP


mcp = FastMCP("__SERVER_NAME__")

API_BASE_URL = os.getenv("API_BASE_URL", "https://api.example.com")
API_TOKEN = os.getenv("API_TOKEN")
REQUEST_TIMEOUT = float(os.getenv("API_TIMEOUT_SECONDS", "20"))


def _headers() -> dict[str, str]:
    headers = {"Accept": "application/json"}
    if API_TOKEN:
        headers["Authorization"] = f"Bearer {API_TOKEN}"
    return headers


def _request(method: str, path: str, *, params: dict[str, Any] | None = None) -> Any:
    url = f"{API_BASE_URL.rstrip('/')}/{path.lstrip('/')}"
    with httpx.Client(timeout=REQUEST_TIMEOUT, headers=_headers()) as client:
        response = client.request(method, url, params=params)
        response.raise_for_status()
        return response.json()


@mcp.tool
def health_check() -> dict[str, Any]:
    """Check whether the upstream API is reachable."""
    payload = _request("GET", "/health")
    return {"base_url": API_BASE_URL, "result": payload}


@mcp.tool
def get_resource(resource_id: str) -> dict[str, Any]:
    """Fetch one resource by ID from the upstream API."""
    payload = _request("GET", f"/resources/{resource_id}")
    return {"resource_id": resource_id, "data": payload}


@mcp.tool
def search_resources(query: str, limit: int = 10) -> dict[str, Any]:
    """Search upstream resources by query string."""
    payload = _request("GET", "/resources", params={"q": query, "limit": limit})
    return {"query": query, "limit": limit, "results": payload}


if __name__ == "__main__":
    mcp.run()
@@ -0,0 +1,77 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import re
|
||||
import sqlite3
|
||||
from typing import Any
|
||||
|
||||
from fastmcp import FastMCP
|
||||
|
||||
|
||||
mcp = FastMCP("__SERVER_NAME__")
|
||||
|
||||
DATABASE_PATH = os.getenv("SQLITE_PATH", "./app.db")
|
||||
MAX_ROWS = int(os.getenv("SQLITE_MAX_ROWS", "200"))
|
||||
TABLE_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
|
||||
|
||||
|
||||
def _connect() -> sqlite3.Connection:
|
||||
return sqlite3.connect(f"file:{DATABASE_PATH}?mode=ro", uri=True)
|
||||
|
||||
|
||||
def _reject_mutation(sql: str) -> None:
|
||||
normalized = sql.strip().lower()
|
||||
if not normalized.startswith("select"):
|
||||
raise ValueError("Only SELECT queries are allowed")
|
||||
|
||||
|
||||
def _validate_table_name(table_name: str) -> str:
|
||||
if not TABLE_NAME_RE.fullmatch(table_name):
|
||||
raise ValueError("Invalid table name")
|
||||
return table_name
|
||||
|
||||
|
||||
@mcp.tool
|
||||
def list_tables() -> list[str]:
|
||||
"""List user-defined SQLite tables."""
|
||||
with _connect() as conn:
|
||||
rows = conn.execute(
|
||||
"SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
|
||||
).fetchall()
|
||||
return [row[0] for row in rows]
|
||||
|
||||
|
||||
@mcp.tool
|
||||
def describe_table(table_name: str) -> list[dict[str, Any]]:
|
||||
"""Describe columns for a SQLite table."""
|
||||
safe_table_name = _validate_table_name(table_name)
|
||||
with _connect() as conn:
|
||||
rows = conn.execute(f"PRAGMA table_info({safe_table_name})").fetchall()
|
||||
return [
|
||||
{
|
||||
"cid": row[0],
|
||||
"name": row[1],
|
||||
"type": row[2],
|
||||
"notnull": bool(row[3]),
|
||||
"default": row[4],
|
||||
"pk": bool(row[5]),
|
||||
}
|
||||
for row in rows
|
||||
]
|
||||
|
||||
|
||||
@mcp.tool
|
||||
def query(sql: str, limit: int = 50) -> dict[str, Any]:
|
||||
"""Run a read-only SELECT query and return rows plus column names."""
|
||||
_reject_mutation(sql)
|
||||
safe_limit = max(0, min(limit, MAX_ROWS))
|
||||
wrapped_sql = f"SELECT * FROM ({sql.strip().rstrip(';')}) LIMIT {safe_limit}"
|
||||
with _connect() as conn:
|
||||
cursor = conn.execute(wrapped_sql)
|
||||
columns = [column[0] for column in cursor.description or []]
|
||||
rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
|
||||
return {"limit": safe_limit, "columns": columns, "rows": rows}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
mcp.run()
|
||||
@@ -0,0 +1,55 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from fastmcp import FastMCP
|
||||
|
||||
|
||||
mcp = FastMCP("__SERVER_NAME__")
|
||||
|
||||
|
||||
def _read_text(path: str) -> str:
|
||||
file_path = Path(path).expanduser()
|
||||
try:
|
||||
return file_path.read_text(encoding="utf-8")
|
||||
except FileNotFoundError as exc:
|
||||
raise ValueError(f"File not found: {file_path}") from exc
|
||||
except UnicodeDecodeError as exc:
|
||||
raise ValueError(f"File is not valid UTF-8 text: {file_path}") from exc
|
||||
|
||||
|
||||
@mcp.tool
|
||||
def summarize_text_file(path: str, preview_chars: int = 1200) -> dict[str, int | str]:
|
||||
"""Return basic metadata and a preview for a UTF-8 text file."""
|
||||
file_path = Path(path).expanduser()
|
||||
text = _read_text(path)
|
||||
return {
|
||||
"path": str(file_path),
|
||||
"characters": len(text),
|
||||
"lines": len(text.splitlines()),
|
||||
"preview": text[:preview_chars],
|
||||
}
|
||||
|
||||
|
||||
@mcp.tool
|
||||
def search_text_file(path: str, needle: str, max_matches: int = 20) -> dict[str, Any]:
|
||||
"""Find matching lines in a UTF-8 text file."""
|
||||
file_path = Path(path).expanduser()
|
||||
matches: list[dict[str, Any]] = []
|
||||
for line_number, line in enumerate(_read_text(path).splitlines(), start=1):
|
||||
if needle.lower() in line.lower():
|
||||
matches.append({"line_number": line_number, "line": line})
|
||||
if len(matches) >= max_matches:
|
||||
break
|
||||
return {"path": str(file_path), "needle": needle, "matches": matches}
|
||||
|
||||
|
||||
@mcp.resource("file://{path}")
|
||||
def read_file_resource(path: str) -> str:
|
||||
"""Expose a text file as a resource."""
|
||||
return _read_text(path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
mcp.run()
|
||||
+2
-2
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
||||
|
||||
[project]
|
||||
name = "hermes-agent"
|
||||
version = "0.3.0"
|
||||
version = "0.4.0"
|
||||
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.11"
|
||||
@@ -92,7 +92,7 @@ hermes-agent = "run_agent:main"
|
||||
hermes-acp = "acp_adapter.entry:main"
|
||||
|
||||
[tool.setuptools]
|
||||
py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli", "hermes_constants", "hermes_state", "hermes_time", "mini_swe_runner", "rl_cli", "utils"]
|
||||
py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli", "hermes_constants", "hermes_state", "hermes_time", "mini_swe_runner", "minisweagent_path", "rl_cli", "utils"]
|
||||
|
||||
[tool.setuptools.packages.find]
|
||||
include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "cron", "honcho_integration", "acp_adapter"]
|
||||
|
||||
+390
-67
@@ -85,7 +85,7 @@ from agent.model_metadata import (
|
||||
)
|
||||
from agent.context_compressor import ContextCompressor
|
||||
from agent.prompt_caching import apply_anthropic_cache_control
|
||||
from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt
|
||||
from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, load_soul_md
|
||||
from agent.usage_pricing import estimate_usage_cost, normalize_usage
|
||||
from agent.display import (
|
||||
KawaiiSpinner, build_tool_preview as _build_tool_preview,
|
||||
@@ -372,6 +372,10 @@ class AIAgent:
|
||||
api_key: str = None,
|
||||
provider: str = None,
|
||||
api_mode: str = None,
|
||||
acp_command: str = None,
|
||||
acp_args: list[str] | None = None,
|
||||
command: str = None,
|
||||
args: list[str] | None = None,
|
||||
model: str = "anthropic/claude-opus-4.6", # OpenRouter format
|
||||
max_iterations: int = 90, # Default tool-calling iterations (shared with subagents)
|
||||
tool_delay: float = 1.0,
|
||||
@@ -396,6 +400,7 @@ class AIAgent:
|
||||
clarify_callback: callable = None,
|
||||
step_callback: callable = None,
|
||||
stream_delta_callback: callable = None,
|
||||
status_callback: callable = None,
|
||||
max_tokens: int = None,
|
||||
reasoning_config: Dict[str, Any] = None,
|
||||
prefill_messages: List[Dict[str, Any]] = None,
|
||||
@@ -477,6 +482,8 @@ class AIAgent:
|
||||
self.base_url = base_url or OPENROUTER_BASE_URL
|
||||
provider_name = provider.strip().lower() if isinstance(provider, str) and provider.strip() else None
|
||||
self.provider = provider_name or "openrouter"
|
||||
self.acp_command = acp_command or command
|
||||
self.acp_args = list(acp_args or args or [])
|
||||
if api_mode in {"chat_completions", "codex_responses", "anthropic_messages"}:
|
||||
self.api_mode = api_mode
|
||||
elif self.provider == "openai-codex":
|
||||
@@ -487,9 +494,20 @@ class AIAgent:
|
||||
elif self.provider == "anthropic" or (provider_name is None and "api.anthropic.com" in self._base_url_lower):
|
||||
self.api_mode = "anthropic_messages"
|
||||
self.provider = "anthropic"
|
||||
elif self._base_url_lower.rstrip("/").endswith("/anthropic"):
|
||||
# Third-party Anthropic-compatible endpoints (e.g. MiniMax, DashScope)
|
||||
# use a URL convention ending in /anthropic. Auto-detect these so the
|
||||
# Anthropic Messages API adapter is used instead of chat completions.
|
||||
self.api_mode = "anthropic_messages"
|
||||
else:
|
||||
self.api_mode = "chat_completions"
|
||||
|
||||
# Direct OpenAI sessions use the Responses API path. GPT-5.x tool
|
||||
# calls with reasoning are rejected on /v1/chat/completions, and
|
||||
# Hermes is a tool-using client by default.
|
||||
if self.api_mode == "chat_completions" and self._is_direct_openai_url():
|
||||
self.api_mode = "codex_responses"
|
||||
|
||||
# Pre-warm OpenRouter model metadata cache in a background thread.
|
||||
# fetch_model_metadata() is cached for 1 hour; this avoids a blocking
|
||||
# HTTP request on the first API response when pricing is estimated.
|
||||
@@ -505,8 +523,13 @@ class AIAgent:
|
||||
self.clarify_callback = clarify_callback
|
||||
self.step_callback = step_callback
|
||||
self.stream_delta_callback = stream_delta_callback
|
||||
self.status_callback = status_callback
|
||||
self._last_reported_tool = None # Track for "new tool" mode
|
||||
|
||||
# Tool execution state — allows _vprint during tool execution
|
||||
# even when stream consumers are registered (no tokens streaming then)
|
||||
self._executing_tools = False
|
||||
|
||||
# Interrupt mechanism for breaking out of tool loops
|
||||
self._interrupt_requested = False
|
||||
self._interrupt_message = None # Optional message that triggered interrupt
|
||||
@@ -550,6 +573,12 @@ class AIAgent:
|
||||
self._budget_warning_threshold = 0.9 # 90% — urgent, respond now
|
||||
self._budget_pressure_enabled = True
|
||||
|
||||
# Context pressure warnings: notify the USER (not the LLM) as context
|
||||
# fills up. Purely informational — displayed in CLI output and sent via
|
||||
# status_callback for gateway platforms. Does NOT inject into messages.
|
||||
self._context_50_warned = False
|
||||
self._context_70_warned = False
|
||||
|
||||
# Persistent error log -- always writes WARNING+ to ~/.hermes/logs/errors.log
|
||||
# so tool failures, API errors, etc. are inspectable after the fact.
|
||||
# In gateway mode, each incoming message creates a new AIAgent instance,
|
||||
@@ -671,6 +700,9 @@ class AIAgent:
|
||||
# Explicit credentials from CLI/gateway — construct directly.
|
||||
# The runtime provider resolver already handled auth for us.
|
||||
client_kwargs = {"api_key": api_key, "base_url": base_url}
|
||||
if self.provider == "copilot-acp":
|
||||
client_kwargs["command"] = self.acp_command
|
||||
client_kwargs["args"] = self.acp_args
|
||||
effective_base = base_url
|
||||
if "openrouter" in effective_base.lower():
|
||||
client_kwargs["default_headers"] = {
|
||||
@@ -678,6 +710,10 @@ class AIAgent:
|
||||
"X-OpenRouter-Title": "Hermes Agent",
|
||||
"X-OpenRouter-Categories": "productivity,cli-agent",
|
||||
}
|
||||
elif "api.githubcopilot.com" in effective_base.lower():
|
||||
from hermes_cli.models import copilot_default_headers
|
||||
|
||||
client_kwargs["default_headers"] = copilot_default_headers()
|
||||
elif "api.kimi.com" in effective_base.lower():
|
||||
client_kwargs["default_headers"] = {
|
||||
"User-Agent": "KimiCLI/1.3",
|
||||
@@ -951,6 +987,39 @@ class AIAgent:
|
||||
compression_threshold = float(_compression_cfg.get("threshold", 0.50))
|
||||
compression_enabled = str(_compression_cfg.get("enabled", True)).lower() in ("true", "1", "yes")
|
||||
compression_summary_model = _compression_cfg.get("summary_model") or None
|
||||
|
||||
# Read explicit context_length override from model config
|
||||
_model_cfg = _agent_cfg.get("model", {})
|
||||
if isinstance(_model_cfg, dict):
|
||||
_config_context_length = _model_cfg.get("context_length")
|
||||
else:
|
||||
_config_context_length = None
|
||||
if _config_context_length is not None:
|
||||
try:
|
||||
_config_context_length = int(_config_context_length)
|
||||
except (TypeError, ValueError):
|
||||
_config_context_length = None
|
||||
|
||||
# Check custom_providers per-model context_length
|
||||
if _config_context_length is None:
|
||||
_custom_providers = _agent_cfg.get("custom_providers")
|
||||
if isinstance(_custom_providers, list):
|
||||
for _cp_entry in _custom_providers:
|
||||
if not isinstance(_cp_entry, dict):
|
||||
continue
|
||||
_cp_url = (_cp_entry.get("base_url") or "").rstrip("/")
|
||||
if _cp_url and _cp_url == self.base_url.rstrip("/"):
|
||||
_cp_models = _cp_entry.get("models", {})
|
||||
if isinstance(_cp_models, dict):
|
||||
_cp_model_cfg = _cp_models.get(self.model, {})
|
||||
if isinstance(_cp_model_cfg, dict):
|
||||
_cp_ctx = _cp_model_cfg.get("context_length")
|
||||
if _cp_ctx is not None:
|
||||
try:
|
||||
_config_context_length = int(_cp_ctx)
|
||||
except (TypeError, ValueError):
|
||||
pass
|
||||
break
|
||||
|
||||
self.context_compressor = ContextCompressor(
|
||||
model=self.model,
|
||||
@@ -962,6 +1031,8 @@ class AIAgent:
|
||||
quiet_mode=self.quiet_mode,
|
||||
base_url=self.base_url,
|
||||
api_key=getattr(self, "api_key", ""),
|
||||
config_context_length=_config_context_length,
|
||||
provider=self.provider,
|
||||
)
|
||||
self.compression_enabled = compression_enabled
|
||||
self._user_turn_count = 0
|
||||
@@ -985,6 +1056,46 @@ class AIAgent:
|
||||
print(f"📊 Context limit: {self.context_compressor.context_length:,} tokens (compress at {int(compression_threshold*100)}% = {self.context_compressor.threshold_tokens:,})")
|
||||
else:
|
||||
print(f"📊 Context limit: {self.context_compressor.context_length:,} tokens (auto-compression disabled)")
|
||||
|
||||
def reset_session_state(self):
|
||||
"""Reset all session-scoped token counters to 0 for a fresh session.
|
||||
|
||||
This method encapsulates the reset logic for all session-level metrics
|
||||
including:
|
||||
- Token usage counters (input, output, total, prompt, completion)
|
||||
- Cache read/write tokens
|
||||
- API call count
|
||||
- Reasoning tokens
|
||||
- Estimated cost tracking
|
||||
- Context compressor internal counters
|
||||
|
||||
The method safely handles optional attributes (e.g., context compressor)
|
||||
using ``hasattr`` checks.
|
||||
|
||||
This keeps the counter reset logic DRY and maintainable in one place
|
||||
rather than scattering it across multiple methods.
|
||||
"""
|
||||
# Token usage counters
|
||||
self.session_total_tokens = 0
|
||||
self.session_input_tokens = 0
|
||||
self.session_output_tokens = 0
|
||||
self.session_prompt_tokens = 0
|
||||
self.session_completion_tokens = 0
|
||||
self.session_cache_read_tokens = 0
|
||||
self.session_cache_write_tokens = 0
|
||||
self.session_reasoning_tokens = 0
|
||||
self.session_api_calls = 0
|
||||
self.session_estimated_cost_usd = 0.0
|
||||
self.session_cost_status = "unknown"
|
||||
self.session_cost_source = "none"
|
||||
|
||||
# Context compressor internal counters (if present)
|
||||
if hasattr(self, "context_compressor") and self.context_compressor:
|
||||
self.context_compressor.last_prompt_tokens = 0
|
||||
self.context_compressor.last_completion_tokens = 0
|
||||
self.context_compressor.last_total_tokens = 0
|
||||
self.context_compressor.compression_count = 0
|
||||
self.context_compressor._context_probed = False
|
||||
|
||||
@staticmethod
|
||||
def _safe_print(*args, **kwargs):
|
||||
@@ -1000,15 +1111,24 @@ class AIAgent:
|
||||
pass
|
||||
|
||||
def _vprint(self, *args, force: bool = False, **kwargs):
|
||||
"""Verbose print — suppressed when streaming TTS is active.
|
||||
"""Verbose print — suppressed when actively streaming tokens.
|
||||
|
||||
Pass ``force=True`` for error/warning messages that should always be
|
||||
shown even during streaming playback (TTS or display).
|
||||
|
||||
During tool execution (``_executing_tools`` is True), printing is
|
||||
allowed even with stream consumers registered because no tokens
|
||||
are being streamed at that point.
|
||||
"""
|
||||
if not force and self._has_stream_consumers():
|
||||
if not force and self._has_stream_consumers() and not self._executing_tools:
|
||||
return
|
||||
self._safe_print(*args, **kwargs)
|
||||
|
||||
def _is_direct_openai_url(self, base_url: str = None) -> bool:
|
||||
"""Return True when a base URL targets OpenAI's native API."""
|
||||
url = (base_url or self._base_url_lower).lower()
|
||||
return "api.openai.com" in url and "openrouter" not in url
|
||||
|
||||
def _max_tokens_param(self, value: int) -> dict:
|
||||
"""Return the correct max tokens kwarg for the current provider.
|
||||
|
||||
@@ -1016,41 +1136,44 @@ class AIAgent:
|
||||
'max_completion_tokens'. OpenRouter, local models, and older
|
||||
OpenAI models use 'max_tokens'.
|
||||
"""
|
||||
_is_direct_openai = (
|
||||
"api.openai.com" in self._base_url_lower
|
||||
and "openrouter" not in self._base_url_lower
|
||||
)
|
||||
if _is_direct_openai:
|
||||
if self._is_direct_openai_url():
|
||||
return {"max_completion_tokens": value}
|
||||
return {"max_tokens": value}
|
||||
|
||||
def _has_content_after_think_block(self, content: str) -> bool:
|
||||
"""
|
||||
Check if content has actual text after any <think></think> blocks.
|
||||
|
||||
Check if content has actual text after any reasoning/thinking blocks.
|
||||
|
||||
This detects cases where the model only outputs reasoning but no actual
|
||||
response, which indicates an incomplete generation that should be retried.
|
||||
|
||||
Must stay in sync with _strip_think_blocks() tag variants.
|
||||
|
||||
Args:
|
||||
content: The assistant message content to check
|
||||
|
||||
|
||||
Returns:
|
||||
True if there's meaningful content after think blocks, False otherwise
|
||||
"""
|
||||
if not content:
|
||||
return False
|
||||
|
||||
# Remove all <think>...</think> blocks (including nested ones, non-greedy)
|
||||
cleaned = re.sub(r'<think>.*?</think>', '', content, flags=re.DOTALL)
|
||||
|
||||
|
||||
# Remove all reasoning tag variants (must match _strip_think_blocks)
|
||||
cleaned = self._strip_think_blocks(content)
|
||||
|
||||
# Check if there's any non-whitespace content remaining
|
||||
return bool(cleaned.strip())
|
||||
|
||||
def _strip_think_blocks(self, content: str) -> str:
|
||||
"""Remove <think>...</think> blocks from content, returning only visible text."""
|
||||
"""Remove reasoning/thinking blocks from content, returning only visible text."""
|
||||
if not content:
|
||||
return ""
|
||||
return re.sub(r'<think>.*?</think>', '', content, flags=re.DOTALL)
|
||||
# Strip all reasoning tag variants: <think>, <thinking>, <THINKING>,
|
||||
# <reasoning>, <REASONING_SCRATCHPAD>
|
||||
content = re.sub(r'<think>.*?</think>', '', content, flags=re.DOTALL)
|
||||
content = re.sub(r'<thinking>.*?</thinking>', '', content, flags=re.DOTALL | re.IGNORECASE)
|
||||
content = re.sub(r'<reasoning>.*?</reasoning>', '', content, flags=re.DOTALL)
|
||||
content = re.sub(r'<REASONING_SCRATCHPAD>.*?</REASONING_SCRATCHPAD>', '', content, flags=re.DOTALL)
|
||||
return content
|
||||
|
||||
def _looks_like_codex_intermediate_ack(
|
||||
self,
|
||||
@@ -1935,28 +2058,38 @@ class AIAgent:
|
||||
is stable across all turns in a session, maximizing prefix cache hits.
|
||||
"""
|
||||
# Layers (in order):
|
||||
# 1. Default agent identity (always present)
|
||||
# 1. Agent identity — SOUL.md when available, else DEFAULT_AGENT_IDENTITY
|
||||
# 2. User / gateway system prompt (if provided)
|
||||
# 3. Persistent memory (frozen snapshot)
|
||||
# 4. Skills guidance (if skills tools are loaded)
|
||||
# 5. Context files (SOUL.md, AGENTS.md, .cursorrules)
|
||||
# 5. Context files (AGENTS.md, .cursorrules — SOUL.md excluded here when used as identity)
|
||||
# 6. Current date & time (frozen at build time)
|
||||
# 7. Platform-specific formatting hint
|
||||
# If an AI peer name is configured in Honcho, personalise the identity line.
|
||||
_ai_peer_name = (
|
||||
self._honcho_config.ai_peer
|
||||
if self._honcho_config and self._honcho_config.ai_peer != "hermes"
|
||||
else None
|
||||
)
|
||||
if _ai_peer_name:
|
||||
_identity = DEFAULT_AGENT_IDENTITY.replace(
|
||||
"You are Hermes Agent",
|
||||
f"You are {_ai_peer_name}",
|
||||
1,
|
||||
|
||||
# Try SOUL.md as primary identity (unless context files are skipped)
|
||||
_soul_loaded = False
|
||||
if not self.skip_context_files:
|
||||
_soul_content = load_soul_md()
|
||||
if _soul_content:
|
||||
prompt_parts = [_soul_content]
|
||||
_soul_loaded = True
|
||||
|
||||
if not _soul_loaded:
|
||||
# Fallback to hardcoded identity
|
||||
_ai_peer_name = (
|
||||
self._honcho_config.ai_peer
|
||||
if self._honcho_config and self._honcho_config.ai_peer != "hermes"
|
||||
else None
|
||||
)
|
||||
else:
|
||||
_identity = DEFAULT_AGENT_IDENTITY
|
||||
prompt_parts = [_identity]
|
||||
if _ai_peer_name:
|
||||
_identity = DEFAULT_AGENT_IDENTITY.replace(
|
||||
"You are Hermes Agent",
|
||||
f"You are {_ai_peer_name}",
|
||||
1,
|
||||
)
|
||||
else:
|
||||
_identity = DEFAULT_AGENT_IDENTITY
|
||||
prompt_parts = [_identity]
|
||||
|
||||
# Tool-aware behavioral guidance: only inject when the tools are loaded
|
||||
tool_guidance = []
|
||||
@@ -2052,7 +2185,7 @@ class AIAgent:
|
||||
prompt_parts.append(skills_prompt)
|
||||
|
||||
if not self.skip_context_files:
|
||||
context_files_prompt = build_context_files_prompt()
|
||||
context_files_prompt = build_context_files_prompt(skip_soul=_soul_loaded)
|
||||
if context_files_prompt:
|
||||
prompt_parts.append(context_files_prompt)
|
||||
|
||||
@@ -2061,6 +2194,10 @@ class AIAgent:
|
||||
timestamp_line = f"Conversation started: {now.strftime('%A, %B %d, %Y %I:%M %p')}"
|
||||
if self.pass_session_id and self.session_id:
|
||||
timestamp_line += f"\nSession ID: {self.session_id}"
|
||||
if self.model:
|
||||
timestamp_line += f"\nModel: {self.model}"
|
||||
if self.provider:
|
||||
timestamp_line += f"\nProvider: {self.provider}"
|
||||
prompt_parts.append(timestamp_line)
|
||||
|
||||
platform_key = (self.platform or "").lower().strip()
|
||||
@@ -2311,13 +2448,22 @@ class AIAgent:
|
||||
# Replay encrypted reasoning items from previous turns
|
||||
# so the API can maintain coherent reasoning chains.
|
||||
codex_reasoning = msg.get("codex_reasoning_items")
|
||||
has_codex_reasoning = False
|
||||
if isinstance(codex_reasoning, list):
|
||||
for ri in codex_reasoning:
|
||||
if isinstance(ri, dict) and ri.get("encrypted_content"):
|
||||
items.append(ri)
|
||||
has_codex_reasoning = True
|
||||
|
||||
if content_text.strip():
|
||||
items.append({"role": "assistant", "content": content_text})
|
||||
elif has_codex_reasoning:
|
||||
# The Responses API requires a following item after each
|
||||
# reasoning item (otherwise: missing_following_item error).
|
||||
# When the assistant produced only reasoning with no visible
|
||||
# content, emit an empty assistant message as the required
|
||||
# following item.
|
||||
items.append({"role": "assistant", "content": ""})
|
||||
|
||||
tool_calls = msg.get("tool_calls")
|
||||
if isinstance(tool_calls, list):
|
||||
@@ -2759,6 +2905,14 @@ class AIAgent:
|
||||
finish_reason = "tool_calls"
|
||||
elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
|
||||
finish_reason = "incomplete"
|
||||
elif reasoning_items_raw and not final_text:
|
||||
# Response contains only reasoning (encrypted thinking state) with
|
||||
# no visible content or tool calls. The model is still thinking and
|
||||
# needs another turn to produce the actual answer. Marking this as
|
||||
# "stop" would send it into the empty-content retry loop which burns
|
||||
# 3 retries then fails — treat it as incomplete instead so the Codex
|
||||
# continuation path handles it correctly.
|
||||
finish_reason = "incomplete"
|
||||
else:
|
||||
finish_reason = "stop"
|
||||
return assistant_message, finish_reason
|
||||
@@ -2789,10 +2943,23 @@ class AIAgent:
|
||||
|
||||
if isinstance(client, Mock):
|
||||
return False
|
||||
if bool(getattr(client, "is_closed", False)):
|
||||
return True
|
||||
http_client = getattr(client, "_client", None)
|
||||
return bool(getattr(http_client, "is_closed", False))
|
||||
|
||||
def _create_openai_client(self, client_kwargs: dict, *, reason: str, shared: bool) -> Any:
|
||||
if self.provider == "copilot-acp" or str(client_kwargs.get("base_url", "")).startswith("acp://copilot"):
|
||||
from agent.copilot_acp_client import CopilotACPClient
|
||||
|
||||
client = CopilotACPClient(**client_kwargs)
|
||||
logger.info(
|
||||
"Copilot ACP client created (%s, shared=%s) %s",
|
||||
reason,
|
||||
shared,
|
||||
self._client_log_context(),
|
||||
)
|
||||
return client
|
||||
client = OpenAI(**client_kwargs)
|
||||
logger.info(
|
||||
"OpenAI client created (%s, shared=%s) %s",
|
||||
@@ -3432,13 +3599,15 @@ class AIAgent:
|
||||
fb_provider)
|
||||
return False
|
||||
|
||||
# Determine api_mode from provider
|
||||
# Determine api_mode from provider / base URL
|
||||
fb_api_mode = "chat_completions"
|
||||
fb_base_url = str(fb_client.base_url)
|
||||
if fb_provider == "openai-codex":
|
||||
fb_api_mode = "codex_responses"
|
||||
elif fb_provider == "anthropic":
|
||||
elif fb_provider == "anthropic" or fb_base_url.rstrip("/").lower().endswith("/anthropic"):
|
||||
fb_api_mode = "anthropic_messages"
|
||||
fb_base_url = str(fb_client.base_url)
|
||||
elif self._is_direct_openai_url(fb_base_url):
|
||||
fb_api_mode = "codex_responses"
|
||||
|
||||
old_model = self.model
|
||||
self.model = fb_model
|
||||
@@ -3652,6 +3821,11 @@ class AIAgent:
|
||||
if not instructions:
|
||||
instructions = DEFAULT_AGENT_IDENTITY
|
||||
|
||||
is_github_responses = (
|
||||
"models.github.ai" in self.base_url.lower()
|
||||
or "api.githubcopilot.com" in self.base_url.lower()
|
||||
)
|
||||
|
||||
# Resolve reasoning effort: config > default (medium)
|
||||
reasoning_effort = "medium"
|
||||
reasoning_enabled = True
|
||||
@@ -3669,13 +3843,23 @@ class AIAgent:
|
||||
"tool_choice": "auto",
|
||||
"parallel_tool_calls": True,
|
||||
"store": False,
|
||||
"prompt_cache_key": self.session_id,
|
||||
}
|
||||
|
||||
if not is_github_responses:
|
||||
kwargs["prompt_cache_key"] = self.session_id
|
||||
|
||||
if reasoning_enabled:
|
||||
kwargs["reasoning"] = {"effort": reasoning_effort, "summary": "auto"}
|
||||
kwargs["include"] = ["reasoning.encrypted_content"]
|
||||
else:
|
||||
if is_github_responses:
|
||||
# Copilot's Responses route advertises reasoning-effort support,
|
||||
# but not OpenAI-specific prompt cache or encrypted reasoning
|
||||
# fields. Keep the payload to the documented subset.
|
||||
github_reasoning = self._github_models_reasoning_extra_body()
|
||||
if github_reasoning is not None:
|
||||
kwargs["reasoning"] = github_reasoning
|
||||
else:
|
||||
kwargs["reasoning"] = {"effort": reasoning_effort, "summary": "auto"}
|
||||
kwargs["include"] = ["reasoning.encrypted_content"]
|
||||
elif not is_github_responses:
|
||||
kwargs["include"] = []
|
||||
|
||||
if self.max_tokens is not None:
|
||||
@@ -3746,6 +3930,10 @@ class AIAgent:
|
||||
extra_body = {}
|
||||
|
||||
_is_openrouter = "openrouter" in self._base_url_lower
|
||||
_is_github_models = (
|
||||
"models.github.ai" in self._base_url_lower
|
||||
or "api.githubcopilot.com" in self._base_url_lower
|
||||
)
|
||||
|
||||
# Provider preferences (only, ignore, order, sort) are OpenRouter-
|
||||
# specific. Only send to OpenRouter-compatible endpoints.
|
||||
@@ -3756,19 +3944,24 @@ class AIAgent:
|
||||
_is_nous = "nousresearch" in self._base_url_lower
|
||||
|
||||
if self._supports_reasoning_extra_body():
|
||||
if self.reasoning_config is not None:
|
||||
rc = dict(self.reasoning_config)
|
||||
# Nous Portal requires reasoning enabled — don't send
|
||||
# enabled=false to it (would cause 400).
|
||||
if _is_nous and rc.get("enabled") is False:
|
||||
pass # omit reasoning entirely for Nous when disabled
|
||||
else:
|
||||
extra_body["reasoning"] = rc
|
||||
if _is_github_models:
|
||||
github_reasoning = self._github_models_reasoning_extra_body()
|
||||
if github_reasoning is not None:
|
||||
extra_body["reasoning"] = github_reasoning
|
||||
else:
|
||||
extra_body["reasoning"] = {
|
||||
"enabled": True,
|
||||
"effort": "medium"
|
||||
}
|
||||
if self.reasoning_config is not None:
|
||||
rc = dict(self.reasoning_config)
|
||||
# Nous Portal requires reasoning enabled — don't send
|
||||
# enabled=false to it (would cause 400).
|
||||
if _is_nous and rc.get("enabled") is False:
|
||||
pass # omit reasoning entirely for Nous when disabled
|
||||
else:
|
||||
extra_body["reasoning"] = rc
|
||||
else:
|
||||
extra_body["reasoning"] = {
|
||||
"enabled": True,
|
||||
"effort": "medium"
|
||||
}
|
||||
|
||||
# Nous Portal product attribution
|
||||
if _is_nous:
|
||||
@@ -3790,6 +3983,13 @@ class AIAgent:
|
||||
return True
|
||||
if "ai-gateway.vercel.sh" in self._base_url_lower:
|
||||
return True
|
||||
if "models.github.ai" in self._base_url_lower or "api.githubcopilot.com" in self._base_url_lower:
|
||||
try:
|
||||
from hermes_cli.models import github_model_reasoning_efforts
|
||||
|
||||
return bool(github_model_reasoning_efforts(self.model))
|
||||
except Exception:
|
||||
return False
|
||||
if "openrouter" not in self._base_url_lower:
|
||||
return False
|
||||
if "api.mistral.ai" in self._base_url_lower:
|
||||
@@ -3806,6 +4006,38 @@ class AIAgent:
|
||||
)
|
||||
return any(model.startswith(prefix) for prefix in reasoning_model_prefixes)
|
||||
|
||||
def _github_models_reasoning_extra_body(self) -> dict | None:
|
||||
"""Format reasoning payload for GitHub Models/OpenAI-compatible routes."""
|
||||
try:
|
||||
from hermes_cli.models import github_model_reasoning_efforts
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
supported_efforts = github_model_reasoning_efforts(self.model)
|
||||
if not supported_efforts:
|
||||
return None
|
||||
|
||||
if self.reasoning_config and isinstance(self.reasoning_config, dict):
|
||||
if self.reasoning_config.get("enabled") is False:
|
||||
return None
|
||||
requested_effort = str(
|
||||
self.reasoning_config.get("effort", "medium")
|
||||
).strip().lower()
|
||||
else:
|
||||
requested_effort = "medium"
|
||||
|
||||
if requested_effort == "xhigh" and "high" in supported_efforts:
|
||||
requested_effort = "high"
|
||||
elif requested_effort not in supported_efforts:
|
||||
if requested_effort == "minimal" and "low" in supported_efforts:
|
||||
requested_effort = "low"
|
||||
elif "medium" in supported_efforts:
|
||||
requested_effort = "medium"
|
||||
else:
|
||||
requested_effort = supported_efforts[0]
|
||||
|
||||
return {"effort": requested_effort}
|
||||
|
||||
def _build_assistant_message(self, assistant_message, finish_reason: str) -> dict:
|
||||
"""Build a normalized assistant message dict from an API response message.
|
||||
|
||||
@@ -4162,6 +4394,10 @@ class AIAgent:
|
||||
except Exception as e:
|
||||
logger.debug("Session DB compression split failed: %s", e)
|
||||
|
||||
# Reset context pressure warnings — usage drops after compaction
|
||||
self._context_50_warned = False
|
||||
self._context_70_warned = False
|
||||
|
||||
return compressed, new_system_prompt
|
||||
|
||||
def _execute_tool_calls(self, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None:
|
||||
@@ -4173,14 +4409,19 @@ class AIAgent:
|
||||
         """
         tool_calls = assistant_message.tool_calls

-        if not _should_parallelize_tool_batch(tool_calls):
-            return self._execute_tool_calls_sequential(
+        # Allow _vprint during tool execution even with stream consumers
+        self._executing_tools = True
+        try:
+            if not _should_parallelize_tool_batch(tool_calls):
+                return self._execute_tool_calls_sequential(
                 assistant_message, messages, effective_task_id, api_call_count
             )
-
-        return self._execute_tool_calls_concurrent(
-            assistant_message, messages, effective_task_id, api_call_count
-        )
+
+            return self._execute_tool_calls_concurrent(
+                assistant_message, messages, effective_task_id, api_call_count
+            )
+        finally:
+            self._executing_tools = False

     def _invoke_tool(self, function_name: str, function_args: dict, effective_task_id: str) -> str:
         """Invoke a single tool and return the result string. No display logic.
|
||||
@@ -4737,6 +4978,45 @@ class AIAgent:
|
||||
)
|
||||
return None
|
||||
|
||||
def _emit_context_pressure(self, compaction_progress: float, compressor) -> None:
|
||||
"""Notify the user that context is approaching the compaction threshold.
|
||||
|
||||
Args:
|
||||
compaction_progress: How close to compaction (0.0–1.0, where 1.0 = fires).
|
||||
compressor: The ContextCompressor instance (for threshold/context info).
|
||||
|
||||
Purely user-facing — does NOT modify the message stream.
|
||||
For CLI: prints a formatted line with a progress bar.
|
||||
For gateway: fires status_callback so the platform can send a chat message.
|
||||
"""
|
||||
from agent.display import format_context_pressure, format_context_pressure_gateway
|
||||
|
||||
threshold_pct = compressor.threshold_tokens / compressor.context_length if compressor.context_length else 0.5
|
||||
|
||||
# CLI output — always shown (these are user-facing status notifications,
|
||||
# not verbose debug output, so they bypass quiet_mode).
|
||||
# Gateway users also get the callback below.
|
||||
if self.platform in (None, "cli"):
|
||||
line = format_context_pressure(
|
||||
compaction_progress=compaction_progress,
|
||||
threshold_tokens=compressor.threshold_tokens,
|
||||
threshold_percent=threshold_pct,
|
||||
compression_enabled=self.compression_enabled,
|
||||
)
|
||||
self._safe_print(line)
|
||||
|
||||
# Gateway / external consumers
|
||||
if self.status_callback:
|
||||
try:
|
||||
msg = format_context_pressure_gateway(
|
||||
compaction_progress=compaction_progress,
|
||||
threshold_percent=threshold_pct,
|
||||
compression_enabled=self.compression_enabled,
|
||||
)
|
||||
self.status_callback("context_pressure", msg)
|
||||
except Exception:
|
||||
logger.debug("status_callback error in context pressure", exc_info=True)
|
||||
|
||||
def _handle_max_iterations(self, messages: list, api_call_count: int) -> str:
|
||||
"""Request a summary when max iterations are reached. Returns the final response text."""
|
||||
print(f"⚠️ Reached maximum iterations ({self.max_iterations}). Requesting summary...")
|
||||
@@ -5237,14 +5517,17 @@ class AIAgent:
|
||||
                 self._vprint(f"\n{self.log_prefix}🔄 Making API call #{api_call_count}/{self.max_iterations}...")
                 self._vprint(f"{self.log_prefix} 📊 Request size: {len(api_messages)} messages, ~{approx_tokens:,} tokens (~{total_chars:,} chars)")
                 self._vprint(f"{self.log_prefix} 🔧 Available tools: {len(self.tools) if self.tools else 0}")
-            elif not self._has_stream_consumers():
-                # Animated thinking spinner in quiet mode (skip during streaming)
+            else:
+                # Animated thinking spinner in quiet mode
                 face = random.choice(KawaiiSpinner.KAWAII_THINKING)
                 verb = random.choice(KawaiiSpinner.THINKING_VERBS)
                 if self.thinking_callback:
                     # CLI TUI mode: use prompt_toolkit widget instead of raw spinner
+                    # (works in both streaming and non-streaming modes)
                     self.thinking_callback(f"{face} {verb}...")
-                else:
+                elif not self._has_stream_consumers():
+                    # Raw KawaiiSpinner only when no streaming consumers
+                    # (would conflict with streamed token output)
                     spinner_type = random.choice(['brain', 'sparkle', 'pulse', 'moon', 'star'])
                     thinking_spinner = KawaiiSpinner(f"{face} {verb}...", spinner_type=spinner_type)
                     thinking_spinner.start()
|
||||
@@ -6093,15 +6376,24 @@ class AIAgent:
|
||||
                     interim_msg = self._build_assistant_message(assistant_message, finish_reason)
                     interim_has_content = bool((interim_msg.get("content") or "").strip())
                     interim_has_reasoning = bool(interim_msg.get("reasoning", "").strip()) if isinstance(interim_msg.get("reasoning"), str) else False
+                    interim_has_codex_reasoning = bool(interim_msg.get("codex_reasoning_items"))

-                    if interim_has_content or interim_has_reasoning:
+                    if interim_has_content or interim_has_reasoning or interim_has_codex_reasoning:
                         last_msg = messages[-1] if messages else None
+                        # Duplicate detection: two consecutive incomplete assistant
+                        # messages with identical content AND reasoning are collapsed.
+                        # For reasoning-only messages (codex_reasoning_items differ but
+                        # visible content/reasoning are both empty), we also compare
+                        # the encrypted items to avoid silently dropping new state.
+                        last_codex_items = last_msg.get("codex_reasoning_items") if isinstance(last_msg, dict) else None
+                        interim_codex_items = interim_msg.get("codex_reasoning_items")
                         duplicate_interim = (
                             isinstance(last_msg, dict)
                             and last_msg.get("role") == "assistant"
                             and last_msg.get("finish_reason") == "incomplete"
                             and (last_msg.get("content") or "") == (interim_msg.get("content") or "")
                             and (last_msg.get("reasoning") or "") == (interim_msg.get("reasoning") or "")
+                            and last_codex_items == interim_codex_items
                         )
                         if not duplicate_interim:
                             messages.append(interim_msg)
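The duplicate check in this hunk can be exercised in isolation. The following is a minimal sketch under stated assumptions: messages are plain dicts with the keys used in the hunk, and `is_duplicate_interim` and `append_interim` are hypothetical helper names, not functions from the codebase.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


def is_duplicate_interim(last_msg: Optional[Message], interim_msg: Message) -> bool:
    """True when interim_msg repeats the previous incomplete assistant message."""
    if not isinstance(last_msg, dict):
        return False
    return (
        last_msg.get("role") == "assistant"
        and last_msg.get("finish_reason") == "incomplete"
        and (last_msg.get("content") or "") == (interim_msg.get("content") or "")
        and (last_msg.get("reasoning") or "") == (interim_msg.get("reasoning") or "")
        # Reasoning-only messages have empty content and reasoning, so the
        # encrypted items must match as well before the message is dropped.
        and last_msg.get("codex_reasoning_items") == interim_msg.get("codex_reasoning_items")
    )


def append_interim(messages: List[Message], interim_msg: Message) -> bool:
    """Append interim_msg unless it duplicates the last message; report whether it was added."""
    last_msg = messages[-1] if messages else None
    if is_duplicate_interim(last_msg, interim_msg):
        return False
    messages.append(interim_msg)
    return True
```

Two reasoning-only messages whose `codex_reasoning_items` differ are both kept, which is the case the added comparison is meant to protect.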
|
||||
@@ -6300,6 +6592,23 @@ class AIAgent:
|
||||
+ _compressor.last_completion_tokens
|
||||
+ _new_chars // 3 # conservative: JSON-heavy tool results ≈ 3 chars/token
|
||||
)
|
||||
|
||||
# ── Context pressure warnings (user-facing only) ──────────
|
||||
# Notify the user (NOT the LLM) as context approaches the
|
||||
# compaction threshold. Thresholds are relative to where
|
||||
# compaction fires, not the raw context window.
|
||||
# Does not inject into messages — just prints to CLI output
|
||||
# and fires status_callback for gateway platforms.
|
||||
if _compressor.threshold_tokens > 0:
|
||||
_compaction_progress = _estimated_next_prompt / _compressor.threshold_tokens
|
||||
if _compaction_progress >= 0.85 and not self._context_70_warned:
|
||||
self._context_70_warned = True
|
||||
self._context_50_warned = True # skip first tier if we jumped past it
|
||||
self._emit_context_pressure(_compaction_progress, _compressor)
|
||||
elif _compaction_progress >= 0.60 and not self._context_50_warned:
|
||||
self._context_50_warned = True
|
||||
self._emit_context_pressure(_compaction_progress, _compressor)
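The two-tier warning logic above, together with the reset after compaction shown in the earlier hunk, can be sketched as a small state holder. This is an illustration only: `PressureTracker` and its field names are hypothetical, while the 0.60 and 0.85 tiers are the values used in the hunk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PressureTracker:
    """Tracks which context-pressure tiers have already fired (hypothetical helper)."""

    first_tier: float = 0.60
    second_tier: float = 0.85
    first_warned: bool = False
    second_warned: bool = False

    def check(self, estimated_tokens: int, threshold_tokens: int) -> Optional[float]:
        """Return the progress ratio if a new warning should fire, else None."""
        if threshold_tokens <= 0:
            return None
        # Progress is measured against the compaction threshold,
        # not against the raw context window.
        progress = estimated_tokens / threshold_tokens
        if progress >= self.second_tier and not self.second_warned:
            # Jumping straight past the first tier also marks it as warned.
            self.first_warned = True
            self.second_warned = True
            return progress
        if progress >= self.first_tier and not self.first_warned:
            self.first_warned = True
            return progress
        return None

    def reset(self) -> None:
        """Re-arm both tiers, for example after compaction shrinks the prompt."""
        self.first_warned = False
        self.second_warned = False
```

Each tier fires at most once until `reset()` is called, so a prompt that hovers around a tier boundary does not produce repeated notifications.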
|
||||
|
||||
if self.compression_enabled and _compressor.should_compress(_estimated_next_prompt):
|
||||
messages, active_system_prompt = self._compress_context(
|
||||
messages, system_message,
|
||||
@@ -6385,7 +6694,21 @@ class AIAgent:
|
||||
                         self._response_was_previewed = True
                         break

-                    # No fallback -- append the empty message as-is
+                    # No fallback -- if reasoning_text exists, the model put its
+                    # entire response inside <think> tags; use that as the content.
+                    if reasoning_text:
+                        self._vprint(f"{self.log_prefix}Using reasoning as response content (model wrapped entire response in think tags).", force=True)
+                        final_response = reasoning_text
+                        empty_msg = {
+                            "role": "assistant",
+                            "content": final_response,
+                            "reasoning": reasoning_text,
+                            "finish_reason": finish_reason,
+                        }
+                        messages.append(empty_msg)
+                        break
+
+                    # Truly empty -- no reasoning and no content
                     empty_msg = {
                         "role": "assistant",
                         "content": final_response,
|
||||
@@ -6393,10 +6716,10 @@ class AIAgent:
|
||||
"finish_reason": finish_reason,
|
||||
}
|
||||
messages.append(empty_msg)
|
||||
|
||||
|
||||
self._cleanup_task_resources(effective_task_id)
|
||||
self._persist_session(messages, conversation_history)
|
||||
|
||||
|
||||
return {
|
||||
"final_response": final_response or None,
|
||||
"messages": messages,
|
||||
|
||||
@@ -18,12 +18,13 @@
|
||||
  *   node bridge.js --port 3000 --session ~/.hermes/whatsapp/session
  */

-import { makeWASocket, useMultiFileAuthState, DisconnectReason, fetchLatestBaileysVersion } from '@whiskeysockets/baileys';
+import { makeWASocket, useMultiFileAuthState, DisconnectReason, fetchLatestBaileysVersion, downloadMediaMessage } from '@whiskeysockets/baileys';
 import express from 'express';
 import { Boom } from '@hapi/boom';
 import pino from 'pino';
 import path from 'path';
-import { mkdirSync, readFileSync, existsSync } from 'fs';
+import { mkdirSync, readFileSync, writeFileSync, existsSync, readdirSync } from 'fs';
+import { randomBytes } from 'crypto';
 import qrcode from 'qrcode-terminal';

 // Parse CLI args
|
||||
@@ -41,6 +42,7 @@ const WHATSAPP_DEBUG =
|
||||
|
||||
const PORT = parseInt(getArg('port', '3000'), 10);
|
||||
const SESSION_DIR = getArg('session', path.join(process.env.HOME || '~', '.hermes', 'whatsapp', 'session'));
|
||||
const IMAGE_CACHE_DIR = path.join(process.env.HOME || '~', '.hermes', 'image_cache');
|
||||
const PAIR_ONLY = args.includes('--pair-only');
|
||||
const WHATSAPP_MODE = getArg('mode', process.env.WHATSAPP_MODE || 'self-chat'); // "bot" or "self-chat"
|
||||
const ALLOWED_USERS = (process.env.WHATSAPP_ALLOWED_USERS || '').split(',').map(s => s.trim()).filter(Boolean);
|
||||
@@ -55,6 +57,22 @@ function formatOutgoingMessage(message) {
|
||||
|
||||
mkdirSync(SESSION_DIR, { recursive: true });
|
||||
|
||||
// Build LID → phone reverse map from session files (lid-mapping-{phone}.json)
|
||||
function buildLidMap() {
|
||||
const map = {};
|
||||
try {
|
||||
for (const f of readdirSync(SESSION_DIR)) {
|
||||
const m = f.match(/^lid-mapping-(\d+)\.json$/);
|
||||
if (!m) continue;
|
||||
const phone = m[1];
|
||||
const lid = JSON.parse(readFileSync(path.join(SESSION_DIR, f), 'utf8'));
|
||||
if (lid) map[String(lid)] = phone;
|
||||
}
|
||||
} catch {}
|
||||
return map;
|
||||
}
|
||||
let lidToPhone = buildLidMap();
|
||||
|
||||
const logger = pino({ level: 'warn' });
|
||||
|
||||
// Message queue for polling
|
||||
@@ -80,9 +98,16 @@ async function startSocket() {
|
||||
     browser: ['Hermes Agent', 'Chrome', '120.0'],
     syncFullHistory: false,
     markOnlineOnConnect: false,
+    // Required for Baileys 7.x: without this, incoming messages that need
+    // E2EE session re-establishment are silently dropped (msg.message === null)
+    getMessage: async (key) => {
+      // We don't maintain a message store, so return a placeholder.
+      // This is enough for Baileys to complete the retry handshake.
+      return { conversation: '' };
+    },
   });

-  sock.ev.on('creds.update', saveCreds);
+  sock.ev.on('creds.update', () => { saveCreds(); lidToPhone = buildLidMap(); });

   sock.ev.on('connection.update', (update) => {
     const { connection, lastDisconnect, qr } = update;
|
||||
@@ -120,7 +145,7 @@ async function startSocket() {
|
||||
     }
   });

-  sock.ev.on('messages.upsert', ({ messages, type }) => {
+  sock.ev.on('messages.upsert', async ({ messages, type }) => {
     // In self-chat mode, your own messages commonly arrive as 'append' rather
     // than 'notify'. Accept both and filter agent echo-backs below.
     if (type !== 'notify' && type !== 'append') return;
|
||||
@@ -163,9 +188,10 @@ async function startSocket() {
|
||||
         if (!isSelfChat) continue;
       }

-      // Check allowlist for messages from others
-      if (!msg.key.fromMe && ALLOWED_USERS.length > 0 && !ALLOWED_USERS.includes(senderNumber)) {
-        continue;
+      // Check allowlist for messages from others (resolve LID → phone if needed)
+      if (!msg.key.fromMe && ALLOWED_USERS.length > 0) {
+        const resolvedNumber = lidToPhone[senderNumber] || senderNumber;
+        if (!ALLOWED_USERS.includes(resolvedNumber)) continue;
       }

       // Extract message body
|
||||
@@ -182,6 +208,18 @@ async function startSocket() {
|
||||
body = msg.message.imageMessage.caption || '';
|
||||
hasMedia = true;
|
||||
mediaType = 'image';
|
||||
try {
|
||||
const buf = await downloadMediaMessage(msg, 'buffer', {}, { logger, reuploadRequest: sock.updateMediaMessage });
|
||||
const mime = msg.message.imageMessage.mimetype || 'image/jpeg';
|
||||
const extMap = { 'image/jpeg': '.jpg', 'image/png': '.png', 'image/webp': '.webp', 'image/gif': '.gif' };
|
||||
const ext = extMap[mime] || '.jpg';
|
||||
mkdirSync(IMAGE_CACHE_DIR, { recursive: true });
|
||||
const filePath = path.join(IMAGE_CACHE_DIR, `img_${randomBytes(6).toString('hex')}${ext}`);
|
||||
writeFileSync(filePath, buf);
|
||||
mediaUrls.push(filePath);
|
||||
} catch (err) {
|
||||
console.error('[bridge] Failed to download image:', err.message);
|
||||
}
|
||||
} else if (msg.message.videoMessage) {
|
||||
body = msg.message.videoMessage.caption || '';
|
||||
hasMedia = true;
|
||||
@@ -195,6 +233,11 @@ async function startSocket() {
|
||||
mediaType = 'document';
|
||||
}
|
||||
|
||||
// For media without caption, use a placeholder so the API message is never empty
|
||||
if (hasMedia && !body) {
|
||||
body = `[${mediaType} received]`;
|
||||
}
|
||||
|
||||
// Ignore Hermes' own reply messages in self-chat mode to avoid loops.
|
||||
if (msg.key.fromMe && ((REPLY_PREFIX && body.startsWith(REPLY_PREFIX)) || recentlySentIds.has(msg.key.id))) {
|
||||
if (WHATSAPP_DEBUG) {
|
||||
@@ -433,7 +476,7 @@ if (PAIR_ONLY) {
|
||||
   console.log();
   startSocket();
 } else {
-  app.listen(PORT, () => {
+  app.listen(PORT, '127.0.0.1', () => {
     console.log(`🌉 WhatsApp bridge listening on port ${PORT} (mode: ${WHATSAPP_MODE})`);
     console.log(`📁 Session stored in: ${SESSION_DIR}`);
     if (ALLOWED_USERS.length > 0) {
|
||||
|
||||
@@ -16,7 +16,7 @@ Use this skill when a user asks about configuring Hermes, enabling features, set
|
||||
 - API keys: `~/.hermes/.env`
 - Skills: `~/.hermes/skills/`
 - Hermes install: `~/.hermes/hermes-agent/`
-- Venv: `~/.hermes/hermes-agent/.venv/` (or `venv/`)
+- Venv: `~/.hermes/hermes-agent/venv/`

 ## CLI Overview
|
||||
|
||||
@@ -98,7 +98,7 @@ The interactive setup wizard walks through:
|
||||
 Run it from terminal:
 ```bash
 cd ~/.hermes/hermes-agent
-source .venv/bin/activate
+source venv/bin/activate
 python -m hermes_cli.main setup
 ```
|
||||
|
||||
@@ -140,7 +140,7 @@ Voice messages from Telegram/Discord/WhatsApp/Slack/Signal are auto-transcribed
|
||||
|
||||
 ```bash
 cd ~/.hermes/hermes-agent
-source .venv/bin/activate # or: source venv/bin/activate
+source venv/bin/activate
 pip install faster-whisper
 ```
|
||||
|
||||
@@ -189,7 +189,7 @@ Hermes can reply with voice when users send voice messages.
|
||||
|
||||
 ```bash
 cd ~/.hermes/hermes-agent
-source .venv/bin/activate
+source venv/bin/activate
 python -m hermes_cli.main tools
 ```
|
||||
|
||||
@@ -217,7 +217,7 @@ Use `/reset` in the chat to start a fresh session with the new toolset. Tool cha
|
||||
 Some tools need extra packages:

 ```bash
-cd ~/.hermes/hermes-agent && source .venv/bin/activate
+cd ~/.hermes/hermes-agent && source venv/bin/activate

 pip install faster-whisper # Local STT (voice transcription)
 pip install browserbase # Browser automation
|
||||
|
||||
@@ -0,0 +1,80 @@
|
||||
---
|
||||
name: huggingface-hub
|
||||
description: Hugging Face Hub CLI (hf) — search, download, and upload models and datasets, manage repos, query datasets with SQL, deploy inference endpoints, manage Spaces and buckets.
|
||||
version: 1.0.0
|
||||
author: Hugging Face
|
||||
license: MIT
|
||||
tags: [huggingface, hf, models, datasets, hub, mlops]
|
||||
---
|
||||
|
||||
# Hugging Face CLI (`hf`) Reference Guide
|
||||
|
||||
The `hf` command is the modern command-line interface for interacting with the Hugging Face Hub, providing tools to manage repositories, models, datasets, and Spaces.
|
||||
|
||||
> **IMPORTANT:** The `hf` command replaces the now deprecated `huggingface-cli` command.
|
||||
|
||||
## Quick Start
|
||||
* **Installation:** `curl -LsSf https://hf.co/cli/install.sh | bash -s`
|
||||
* **Help:** Use `hf --help` to view all available functions and real-world examples.
|
||||
* **Authentication:** Recommended via `HF_TOKEN` environment variable or the `--token` flag.
|
||||
|
||||
---
|
||||
|
||||
## Core Commands
|
||||
|
||||
### General Operations
|
||||
* `hf download REPO_ID`: Download files from the Hub.
|
||||
* `hf upload REPO_ID`: Upload files/folders (recommended for single-commit).
|
||||
* `hf upload-large-folder REPO_ID LOCAL_PATH`: Recommended for resumable uploads of large directories.
|
||||
* `hf sync`: Sync files between a local directory and a bucket.
|
||||
* `hf env` / `hf version`: View environment and version details.
|
||||
|
||||
### Authentication (`hf auth`)
|
||||
* `login` / `logout`: Manage sessions using tokens from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
|
||||
* `list` / `switch`: Manage and toggle between multiple stored access tokens.
|
||||
* `whoami`: Identify the currently logged-in account.
|
||||
|
||||
### Repository Management (`hf repos`)
|
||||
* `create` / `delete`: Create or permanently remove repositories.
|
||||
* `duplicate`: Clone a model, dataset, or Space to a new ID.
|
||||
* `move`: Transfer a repository between namespaces.
|
||||
* `branch` / `tag`: Manage Git-like references.
|
||||
* `delete-files`: Remove specific files using patterns.
|
||||
|
||||
---
|
||||
|
||||
## Specialized Hub Interactions
|
||||
|
||||
### Datasets & Models
|
||||
* **Datasets:** `hf datasets list`, `info`, and `parquet` (list parquet URLs).
|
||||
* **SQL Queries:** `hf datasets sql SQL` — Execute raw SQL via DuckDB against dataset parquet URLs.
|
||||
* **Models:** `hf models list` and `info`.
|
||||
* **Papers:** `hf papers list` — View daily papers.
|
||||
|
||||
### Discussions & Pull Requests (`hf discussions`)
|
||||
* Manage the lifecycle of Hub contributions: `list`, `create`, `info`, `comment`, `close`, `reopen`, and `rename`.
|
||||
* `diff`: View changes in a PR.
|
||||
* `merge`: Finalize pull requests.
|
||||
|
||||
### Infrastructure & Compute
|
||||
* **Endpoints:** Deploy and manage Inference Endpoints (`deploy`, `pause`, `resume`, `scale-to-zero`, `catalog`).
|
||||
* **Jobs:** Run compute tasks on HF infrastructure. Includes `hf jobs uv` for running Python scripts with inline dependencies and `stats` for resource monitoring.
|
||||
* **Spaces:** Manage interactive apps. Includes `dev-mode` and `hot-reload` for Python files without full restarts.
|
||||
|
||||
### Storage & Automation
|
||||
* **Buckets:** Full S3-like bucket management (`create`, `cp`, `mv`, `rm`, `sync`).
|
||||
* **Cache:** Manage local storage with `list`, `prune` (remove detached revisions), and `verify` (checksum checks).
|
||||
* **Webhooks:** Automate workflows by managing Hub webhooks (`create`, `watch`, `enable`/`disable`).
|
||||
* **Collections:** Organize Hub items into collections (`add-item`, `update`, `list`).
|
||||
|
||||
---
|
||||
|
||||
## Advanced Usage & Tips
|
||||
|
||||
### Global Flags
|
||||
* `--format json`: Produces machine-readable output for automation.
|
||||
* `-q` / `--quiet`: Limits output to IDs only.
|
||||
|
||||
### Extensions & Skills
|
||||
* **Extensions:** Extend CLI functionality via GitHub repositories using `hf extensions install REPO_ID`.
|
||||
* **Skills:** Manage AI assistant skills with `hf skills add`.
|
||||
@@ -12,7 +12,7 @@ training server.
|
||||
|
||||
 ```bash
 cd ~/.hermes/hermes-agent
-source .venv/bin/activate
+source venv/bin/activate

 python environments/your_env.py process \
     --env.total_steps 1 \
|
||||
|
||||
+172
-1
@@ -1,15 +1,21 @@
|
||||
 """Tests for acp_adapter.session — SessionManager and SessionState."""

+import json
 import pytest
 from unittest.mock import MagicMock

 from acp_adapter.session import SessionManager, SessionState
+from hermes_state import SessionDB
+
+
+def _mock_agent():
+    return MagicMock(name="MockAIAgent")


 @pytest.fixture()
 def manager():
     """SessionManager with a mock agent factory (avoids needing API keys)."""
-    return SessionManager(agent_factory=lambda: MagicMock(name="MockAIAgent"))
+    return SessionManager(agent_factory=_mock_agent)


 # ---------------------------------------------------------------------------
|
||||
@@ -110,3 +116,168 @@ class TestListAndCleanup:
|
||||
assert manager.get_session(state.session_id) is None
|
||||
# Removing again returns False
|
||||
assert manager.remove_session(state.session_id) is False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# persistence — sessions survive process restarts (via SessionDB)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestPersistence:
|
||||
"""Verify that sessions are persisted to SessionDB and can be restored."""
|
||||
|
||||
def test_create_session_writes_to_db(self, manager):
|
||||
state = manager.create_session(cwd="/project")
|
||||
db = manager._get_db()
|
||||
assert db is not None
|
||||
row = db.get_session(state.session_id)
|
||||
assert row is not None
|
||||
assert row["source"] == "acp"
|
||||
# cwd stored in model_config JSON
|
||||
mc = json.loads(row["model_config"])
|
||||
assert mc["cwd"] == "/project"
|
||||
|
||||
def test_get_session_restores_from_db(self, manager):
|
||||
"""Simulate process restart: create session, drop from memory, get again."""
|
||||
state = manager.create_session(cwd="/work")
|
||||
state.history.append({"role": "user", "content": "hello"})
|
||||
state.history.append({"role": "assistant", "content": "hi there"})
|
||||
manager.save_session(state.session_id)
|
||||
|
||||
sid = state.session_id
|
||||
|
||||
# Drop from in-memory store (simulates process restart).
|
||||
with manager._lock:
|
||||
del manager._sessions[sid]
|
||||
|
||||
# get_session should transparently restore from DB.
|
||||
restored = manager.get_session(sid)
|
||||
assert restored is not None
|
||||
assert restored.session_id == sid
|
||||
assert restored.cwd == "/work"
|
||||
assert len(restored.history) == 2
|
||||
assert restored.history[0]["content"] == "hello"
|
||||
assert restored.history[1]["content"] == "hi there"
|
||||
# Agent should have been recreated.
|
||||
assert restored.agent is not None
|
||||
|
||||
def test_save_session_updates_db(self, manager):
|
||||
state = manager.create_session()
|
||||
state.history.append({"role": "user", "content": "test"})
|
||||
manager.save_session(state.session_id)
|
||||
|
||||
db = manager._get_db()
|
||||
messages = db.get_messages_as_conversation(state.session_id)
|
||||
assert len(messages) == 1
|
||||
assert messages[0]["content"] == "test"
|
||||
|
||||
def test_remove_session_deletes_from_db(self, manager):
|
||||
state = manager.create_session()
|
||||
db = manager._get_db()
|
||||
assert db.get_session(state.session_id) is not None
|
||||
manager.remove_session(state.session_id)
|
||||
assert db.get_session(state.session_id) is None
|
||||
|
||||
def test_cleanup_removes_all_from_db(self, manager):
|
||||
s1 = manager.create_session()
|
||||
s2 = manager.create_session()
|
||||
db = manager._get_db()
|
||||
assert db.get_session(s1.session_id) is not None
|
||||
assert db.get_session(s2.session_id) is not None
|
||||
manager.cleanup()
|
||||
assert db.get_session(s1.session_id) is None
|
||||
assert db.get_session(s2.session_id) is None
|
||||
|
||||
def test_list_sessions_includes_db_only(self, manager):
|
||||
"""Sessions only in DB (not in memory) appear in list_sessions."""
|
||||
state = manager.create_session(cwd="/db-only")
|
||||
sid = state.session_id
|
||||
|
||||
# Drop from memory.
|
||||
with manager._lock:
|
||||
del manager._sessions[sid]
|
||||
|
||||
listing = manager.list_sessions()
|
||||
ids = {s["session_id"] for s in listing}
|
||||
assert sid in ids
|
||||
|
||||
def test_fork_restores_source_from_db(self, manager):
|
||||
"""Forking a session that is only in DB should work."""
|
||||
original = manager.create_session()
|
||||
original.history.append({"role": "user", "content": "context"})
|
||||
manager.save_session(original.session_id)
|
||||
|
||||
# Drop original from memory.
|
||||
with manager._lock:
|
||||
del manager._sessions[original.session_id]
|
||||
|
||||
forked = manager.fork_session(original.session_id, cwd="/fork")
|
||||
assert forked is not None
|
||||
assert len(forked.history) == 1
|
||||
assert forked.history[0]["content"] == "context"
|
||||
assert forked.session_id != original.session_id
|
||||
|
||||
def test_update_cwd_restores_from_db(self, manager):
|
||||
state = manager.create_session(cwd="/old")
|
||||
sid = state.session_id
|
||||
|
||||
with manager._lock:
|
||||
del manager._sessions[sid]
|
||||
|
||||
updated = manager.update_cwd(sid, "/new")
|
||||
assert updated is not None
|
||||
assert updated.cwd == "/new"
|
||||
|
||||
# Should also be persisted in DB.
|
||||
db = manager._get_db()
|
||||
row = db.get_session(sid)
|
||||
mc = json.loads(row["model_config"])
|
||||
assert mc["cwd"] == "/new"
|
||||
|
||||
def test_only_restores_acp_sessions(self, manager):
|
||||
"""get_session should not restore non-ACP sessions from DB."""
|
||||
db = manager._get_db()
|
||||
# Manually create a CLI session in the DB.
|
||||
db.create_session(session_id="cli-session-123", source="cli", model="test")
|
||||
# Should not be found via ACP SessionManager.
|
||||
assert manager.get_session("cli-session-123") is None
|
||||
|
||||
def test_sessions_searchable_via_fts(self, manager):
|
||||
"""ACP sessions stored in SessionDB are searchable via FTS5."""
|
||||
state = manager.create_session()
|
||||
state.history.append({"role": "user", "content": "how do I configure nginx"})
|
||||
state.history.append({"role": "assistant", "content": "Here is the nginx config..."})
|
||||
manager.save_session(state.session_id)
|
||||
|
||||
db = manager._get_db()
|
||||
results = db.search_messages("nginx")
|
||||
assert len(results) > 0
|
||||
session_ids = {r["session_id"] for r in results}
|
||||
assert state.session_id in session_ids
|
||||
|
||||
def test_tool_calls_persisted(self, manager):
|
||||
"""Messages with tool_calls should round-trip through the DB."""
|
||||
state = manager.create_session()
|
||||
state.history.append({
|
||||
"role": "assistant",
|
||||
"content": None,
|
||||
"tool_calls": [{"id": "tc_1", "type": "function",
|
||||
"function": {"name": "terminal", "arguments": "{}"}}],
|
||||
})
|
||||
state.history.append({
|
||||
"role": "tool",
|
||||
"content": "output here",
|
||||
"tool_call_id": "tc_1",
|
||||
"name": "terminal",
|
||||
})
|
||||
manager.save_session(state.session_id)
|
||||
|
||||
# Drop from memory, restore from DB.
|
||||
with manager._lock:
|
||||
del manager._sessions[state.session_id]
|
||||
|
||||
restored = manager.get_session(state.session_id)
|
||||
assert restored is not None
|
||||
assert len(restored.history) == 2
|
||||
assert restored.history[0].get("tool_calls") is not None
|
||||
assert restored.history[1].get("tool_call_id") == "tc_1"
|
||||
|
||||
@@ -248,6 +248,31 @@ class TestVisionClientFallback:
|
||||
assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
|
||||
assert model == "claude-haiku-4-5-20251001"
|
||||
|
||||
def test_resolve_provider_client_copilot_uses_runtime_credentials(self, monkeypatch):
|
||||
monkeypatch.delenv("GITHUB_TOKEN", raising=False)
|
||||
monkeypatch.delenv("GH_TOKEN", raising=False)
|
||||
|
||||
with (
|
||||
patch(
|
||||
"hermes_cli.auth.resolve_api_key_provider_credentials",
|
||||
return_value={
|
||||
"provider": "copilot",
|
||||
"api_key": "gh-cli-token",
|
||||
"base_url": "https://api.githubcopilot.com",
|
||||
"source": "gh auth token",
|
||||
},
|
||||
),
|
||||
patch("agent.auxiliary_client.OpenAI") as mock_openai,
|
||||
):
|
||||
client, model = resolve_provider_client("copilot", model="gpt-5.4")
|
||||
|
||||
assert client is not None
|
||||
assert model == "gpt-5.4"
|
||||
call_kwargs = mock_openai.call_args.kwargs
|
||||
assert call_kwargs["api_key"] == "gh-cli-token"
|
||||
assert call_kwargs["base_url"] == "https://api.githubcopilot.com"
|
||||
assert call_kwargs["default_headers"]["Editor-Version"]
|
||||
|
||||
def test_vision_auto_uses_anthropic_when_no_higher_priority_backend(self, monkeypatch):
|
||||
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
|
||||
with (
|
||||
|
||||
@@ -22,6 +22,7 @@ from unittest.mock import patch, MagicMock
|
||||
from agent.model_metadata import (
|
||||
CONTEXT_PROBE_TIERS,
|
||||
DEFAULT_CONTEXT_LENGTHS,
|
||||
_strip_provider_prefix,
|
||||
estimate_tokens_rough,
|
||||
estimate_messages_tokens_rough,
|
||||
get_model_context_length,
|
||||
@@ -105,9 +106,14 @@ class TestEstimateMessagesTokensRough:
|
||||
 # =========================================================================

 class TestDefaultContextLengths:
-    def test_claude_models_200k(self):
+    def test_claude_models_context_lengths(self):
         for key, value in DEFAULT_CONTEXT_LENGTHS.items():
-            if "claude" in key:
+            if "claude" not in key:
+                continue
+            # Claude 4.6 models have 1M context
+            if "4.6" in key or "4-6" in key:
+                assert value == 1000000, f"{key} should be 1000000"
+            else:
                 assert value == 200000, f"{key} should be 200000"

     def test_gpt4_models_128k_or_1m(self):
|
||||
@@ -218,6 +224,122 @@ class TestGetModelContextLength:
|
||||
|
||||
assert result == CONTEXT_PROBE_TIERS[0]
|
||||
|
||||
@patch("agent.model_metadata.fetch_model_metadata")
|
||||
@patch("agent.model_metadata.fetch_endpoint_model_metadata")
|
||||
def test_custom_endpoint_single_model_fallback(self, mock_endpoint_fetch, mock_fetch):
|
||||
"""Single-model servers: use the only model even if name doesn't match."""
|
||||
mock_fetch.return_value = {}
|
||||
mock_endpoint_fetch.return_value = {
|
||||
"Qwen3.5-9B-Q4_K_M.gguf": {"context_length": 131072}
|
||||
}
|
||||
|
||||
result = get_model_context_length(
|
||||
"qwen3.5:9b",
|
||||
base_url="http://myserver.example.com:8080/v1",
|
||||
api_key="test-key",
|
||||
)
|
||||
|
||||
assert result == 131072
|
||||
|
||||
@patch("agent.model_metadata.fetch_model_metadata")
|
||||
@patch("agent.model_metadata.fetch_endpoint_model_metadata")
|
||||
def test_custom_endpoint_fuzzy_substring_match(self, mock_endpoint_fetch, mock_fetch):
|
||||
"""Fuzzy match: configured model name is substring of endpoint model."""
|
||||
mock_fetch.return_value = {}
|
||||
mock_endpoint_fetch.return_value = {
|
||||
"org/llama-3.3-70b-instruct-fp8": {"context_length": 131072},
|
||||
"org/qwen-2.5-72b": {"context_length": 32768},
|
||||
}
|
||||
|
||||
result = get_model_context_length(
|
||||
"llama-3.3-70b-instruct",
|
||||
base_url="http://myserver.example.com:8080/v1",
|
||||
api_key="test-key",
|
||||
)
|
||||
|
||||
assert result == 131072
|
||||
|
||||
@patch("agent.model_metadata.fetch_model_metadata")
|
||||
def test_config_context_length_overrides_all(self, mock_fetch):
|
||||
"""Explicit config_context_length takes priority over everything."""
|
||||
mock_fetch.return_value = {
|
||||
"test/model": {"context_length": 200000}
|
||||
}
|
||||
|
||||
result = get_model_context_length(
|
||||
"test/model",
|
||||
config_context_length=65536,
|
||||
)
|
||||
|
||||
assert result == 65536
|
||||
|
||||
@patch("agent.model_metadata.fetch_model_metadata")
|
||||
def test_config_context_length_zero_is_ignored(self, mock_fetch):
|
||||
"""config_context_length=0 should be treated as unset."""
|
||||
mock_fetch.return_value = {}
|
||||
|
||||
result = get_model_context_length(
|
||||
"anthropic/claude-sonnet-4",
|
||||
config_context_length=0,
|
||||
)
|
||||
|
||||
assert result == 200000
|
||||
|
||||
@patch("agent.model_metadata.fetch_model_metadata")
|
||||
def test_config_context_length_none_is_ignored(self, mock_fetch):
|
||||
"""config_context_length=None should be treated as unset."""
|
||||
mock_fetch.return_value = {}
|
||||
|
||||
result = get_model_context_length(
|
||||
"anthropic/claude-sonnet-4",
|
||||
config_context_length=None,
|
||||
)
|
||||
|
||||
assert result == 200000
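
The lookup order these tests imply can be sketched as follows. This is a minimal illustration, not the project's `get_model_context_length`: the function name `pick_endpoint_context` and its argument layout are assumptions made for the example, and the real function also consults other sources (for instance the default table) that are left out here.

```python
from typing import Dict, Optional


def pick_endpoint_context(
    model: str,
    endpoint_models: Dict[str, dict],
    config_context_length: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: resolve a context length for a custom endpoint."""
    # An explicit config value wins; None and 0 are both treated as unset.
    if config_context_length:
        return config_context_length
    # Exact name match against what the endpoint reports.
    if model in endpoint_models:
        return endpoint_models[model].get("context_length")
    # Single-model servers: use the only model even if the name differs.
    if len(endpoint_models) == 1:
        only = next(iter(endpoint_models.values()))
        return only.get("context_length")
    # Fuzzy match: configured name is a substring of an endpoint model name.
    wanted = model.lower()
    for name, meta in endpoint_models.items():
        if wanted in name.lower():
            return meta.get("context_length")
    return None
```

Returning `None` when nothing matches leaves the caller free to fall through to its own defaults, which is what the zero and `None` config tests rely on.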
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# _strip_provider_prefix — Ollama model:tag vs provider:model
|
||||
# =========================================================================
|
||||
|
||||
class TestStripProviderPrefix:
    def test_known_provider_prefix_is_stripped(self):
        assert _strip_provider_prefix("local:my-model") == "my-model"
        assert _strip_provider_prefix("openrouter:anthropic/claude-sonnet-4") == "anthropic/claude-sonnet-4"
        assert _strip_provider_prefix("anthropic:claude-sonnet-4") == "claude-sonnet-4"

    def test_ollama_model_tag_preserved(self):
        """Ollama model:tag format must NOT be stripped."""
        assert _strip_provider_prefix("qwen3.5:27b") == "qwen3.5:27b"
        assert _strip_provider_prefix("llama3.3:70b") == "llama3.3:70b"
        assert _strip_provider_prefix("gemma2:9b") == "gemma2:9b"
        assert _strip_provider_prefix("codellama:13b-instruct-q4_0") == "codellama:13b-instruct-q4_0"

    def test_http_urls_preserved(self):
        assert _strip_provider_prefix("http://example.com") == "http://example.com"
        assert _strip_provider_prefix("https://example.com") == "https://example.com"

    def test_no_colon_returns_unchanged(self):
        assert _strip_provider_prefix("gpt-4o") == "gpt-4o"
        assert _strip_provider_prefix("anthropic/claude-sonnet-4") == "anthropic/claude-sonnet-4"
|
||||
|
||||
@patch("agent.model_metadata.fetch_model_metadata")
|
||||
def test_ollama_model_tag_not_mangled_in_context_lookup(self, mock_fetch):
|
||||
"""Ensure 'qwen3.5:27b' is NOT reduced to '27b' during context length lookup.
|
||||
|
||||
We mock a custom endpoint that knows 'qwen3.5:27b' — the full name
|
||||
must reach the endpoint metadata lookup intact.
|
||||
"""
|
||||
mock_fetch.return_value = {}
|
||||
with patch("agent.model_metadata.fetch_endpoint_model_metadata") as mock_ep, \
|
||||
patch("agent.model_metadata._is_custom_endpoint", return_value=True):
|
||||
mock_ep.return_value = {"qwen3.5:27b": {"context_length": 32768}}
|
||||
result = get_model_context_length(
|
||||
"qwen3.5:27b",
|
||||
base_url="http://localhost:11434/v1",
|
||||
)
|
||||
assert result == 32768
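
One way to satisfy the prefix-stripping tests above is an allow-list of provider names, so that an Ollama tag such as `27b` is never mistaken for a model name. The sketch below is an assumption about the approach, not the project's `_strip_provider_prefix`: the name `strip_provider_prefix` and the contents of `KNOWN_PROVIDER_PREFIXES` are hypothetical and only cover the providers the tests mention.

```python
# Hypothetical allow-list; only the prefixes exercised by the tests above.
KNOWN_PROVIDER_PREFIXES = frozenset({"local", "openrouter", "anthropic"})


def strip_provider_prefix(model: str) -> str:
    """Drop a leading 'provider:' only when the provider is recognised."""
    # URLs and names without a colon pass through untouched.
    if "://" in model or ":" not in model:
        return model
    prefix, rest = model.split(":", 1)
    # 'qwen3.5:27b' keeps its tag because 'qwen3.5' is not a provider.
    if prefix.lower() in KNOWN_PROVIDER_PREFIXES and rest:
        return rest
    return model
```

Checking the part before the colon, rather than the part after it, avoids having to guess what a valid tag looks like.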
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# fetch_model_metadata — caching, TTL, slugs, failures
|
||||
@@ -350,35 +472,35 @@ class TestContextProbeTiers:
|
||||
for i in range(len(CONTEXT_PROBE_TIERS) - 1):
|
||||
assert CONTEXT_PROBE_TIERS[i] > CONTEXT_PROBE_TIERS[i + 1]
|
||||
|
||||
def test_first_tier_is_2m(self):
|
||||
assert CONTEXT_PROBE_TIERS[0] == 2_000_000
|
||||
def test_first_tier_is_128k(self):
|
||||
assert CONTEXT_PROBE_TIERS[0] == 128_000
|
||||
|
||||
def test_last_tier_is_32k(self):
|
||||
assert CONTEXT_PROBE_TIERS[-1] == 32_000
|
||||
def test_last_tier_is_8k(self):
|
||||
assert CONTEXT_PROBE_TIERS[-1] == 8_000
|
||||
|
||||
|
||||
class TestGetNextProbeTier:
|
||||
def test_from_2m(self):
|
||||
assert get_next_probe_tier(2_000_000) == 1_000_000
|
||||
|
||||
def test_from_1m(self):
|
||||
assert get_next_probe_tier(1_000_000) == 512_000
|
||||
|
||||
def test_from_128k(self):
|
||||
assert get_next_probe_tier(128_000) == 64_000
|
||||
|
||||
def test_from_32k_returns_none(self):
|
||||
assert get_next_probe_tier(32_000) is None
|
||||
def test_from_64k(self):
|
||||
assert get_next_probe_tier(64_000) == 32_000
|
||||
|
||||
def test_from_32k(self):
|
||||
assert get_next_probe_tier(32_000) == 16_000
|
||||
|
||||
def test_from_8k_returns_none(self):
|
||||
assert get_next_probe_tier(8_000) is None
|
||||
|
||||
def test_from_below_min_returns_none(self):
|
||||
assert get_next_probe_tier(16_000) is None
|
||||
assert get_next_probe_tier(4_000) is None
|
||||
|
||||
def test_from_arbitrary_value(self):
|
||||
assert get_next_probe_tier(300_000) == 200_000
|
||||
assert get_next_probe_tier(100_000) == 64_000
|
||||
|
||||
def test_above_max_tier(self):
|
||||
"""Value above 2M should return 2M."""
|
||||
assert get_next_probe_tier(5_000_000) == 2_000_000
|
||||
"""Value above 128K should return 128K."""
|
||||
assert get_next_probe_tier(500_000) == 128_000
|
||||
|
||||
def test_zero_returns_none(self):
|
||||
assert get_next_probe_tier(0) is None
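
Both the 2M-based and the 128K-based assertions in this hunk are consistent with a single rule: return the largest tier strictly below the current value, or `None` when there is none. The sketch below illustrates that rule under stated assumptions; `next_probe_tier` and the two tier tuples are hypothetical stand-ins inferred from the assertions, not the project's `get_next_probe_tier` or `CONTEXT_PROBE_TIERS`.

```python
from typing import Optional, Sequence

# Hypothetical tier lists, inferred from the two sets of assertions in the diff.
TIERS_128K = (128_000, 64_000, 32_000, 16_000, 8_000)
TIERS_2M = (2_000_000, 1_000_000, 512_000, 200_000, 128_000, 64_000, 32_000)


def next_probe_tier(current: int, tiers: Sequence[int] = TIERS_128K) -> Optional[int]:
    """Return the largest tier strictly below `current`, or None."""
    # Tiers are listed in descending order, so the first hit is the largest.
    for tier in tiers:
        if tier < current:
            return tier
    return None
```

Because the comparison is strict, probing from the smallest tier (or anything below it, including zero) ends the search with `None`.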
|
||||
|
||||
@@ -0,0 +1,197 @@
|
||||
"""Tests for agent.models_dev — models.dev registry integration."""
|
||||
import json
|
||||
from unittest.mock import patch, MagicMock
|
||||
|
||||
import pytest
|
||||
from agent.models_dev import (
|
||||
PROVIDER_TO_MODELS_DEV,
|
||||
_extract_context,
|
||||
fetch_models_dev,
|
||||
lookup_models_dev_context,
|
||||
)
|
||||
|
||||
|
||||
SAMPLE_REGISTRY = {
|
||||
"anthropic": {
|
||||
"id": "anthropic",
|
||||
"name": "Anthropic",
|
||||
"models": {
|
||||
"claude-opus-4-6": {
|
||||
"id": "claude-opus-4-6",
|
||||
"limit": {"context": 1000000, "output": 128000},
|
||||
},
|
||||
"claude-sonnet-4-6": {
|
||||
"id": "claude-sonnet-4-6",
|
||||
"limit": {"context": 1000000, "output": 64000},
|
||||
},
|
||||
"claude-sonnet-4-0": {
|
||||
"id": "claude-sonnet-4-0",
|
||||
"limit": {"context": 200000, "output": 64000},
|
||||
},
|
||||
},
|
||||
},
|
||||
"github-copilot": {
|
||||
"id": "github-copilot",
|
||||
"name": "GitHub Copilot",
|
||||
"models": {
|
||||
"claude-opus-4.6": {
|
||||
"id": "claude-opus-4.6",
|
||||
"limit": {"context": 128000, "output": 32000},
|
||||
},
|
||||
},
|
||||
},
|
||||
"kilo": {
|
||||
"id": "kilo",
|
||||
"name": "Kilo Gateway",
|
||||
"models": {
|
||||
"anthropic/claude-sonnet-4.6": {
|
||||
"id": "anthropic/claude-sonnet-4.6",
|
||||
"limit": {"context": 1000000, "output": 128000},
|
||||
},
|
||||
},
|
||||
},
|
||||
"deepseek": {
|
||||
"id": "deepseek",
|
||||
"name": "DeepSeek",
|
||||
"models": {
|
||||
"deepseek-chat": {
|
||||
"id": "deepseek-chat",
|
||||
"limit": {"context": 128000, "output": 8192},
|
||||
},
|
||||
},
|
||||
},
|
||||
"audio-only": {
|
||||
"id": "audio-only",
|
||||
"models": {
|
||||
"tts-model": {
|
||||
"id": "tts-model",
|
||||
"limit": {"context": 0, "output": 0},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
class TestProviderMapping:
|
||||
def test_all_mapped_providers_are_strings(self):
|
||||
for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
|
||||
assert isinstance(hermes_id, str)
|
||||
assert isinstance(mdev_id, str)
|
||||
|
||||
def test_known_providers_mapped(self):
|
||||
assert PROVIDER_TO_MODELS_DEV["anthropic"] == "anthropic"
|
||||
assert PROVIDER_TO_MODELS_DEV["copilot"] == "github-copilot"
|
||||
assert PROVIDER_TO_MODELS_DEV["kilocode"] == "kilo"
|
||||
assert PROVIDER_TO_MODELS_DEV["ai-gateway"] == "vercel"
|
||||
|
||||
def test_unmapped_provider_not_in_dict(self):
|
||||
assert "nous" not in PROVIDER_TO_MODELS_DEV
|
||||
assert "openai-codex" not in PROVIDER_TO_MODELS_DEV
|
||||
|
||||
|
||||
class TestExtractContext:
    def test_valid_entry(self):
        assert _extract_context({"limit": {"context": 128000}}) == 128000

    def test_zero_context_returns_none(self):
        assert _extract_context({"limit": {"context": 0}}) is None

    def test_missing_limit_returns_none(self):
        assert _extract_context({"id": "test"}) is None

    def test_missing_context_returns_none(self):
        assert _extract_context({"limit": {"output": 8192}}) is None

    def test_non_dict_returns_none(self):
        assert _extract_context("not a dict") is None

    def test_float_context_coerced_to_int(self):
        assert _extract_context({"limit": {"context": 131072.0}}) == 131072
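
The extraction behaviour exercised above amounts to a defensive read of `entry["limit"]["context"]`. A minimal sketch, assuming nothing about the real `_extract_context` beyond what the tests assert (the name `extract_context` is hypothetical):

```python
from typing import Any, Optional


def extract_context(entry: Any) -> Optional[int]:
    """Return a positive integer context length, or None if unusable."""
    if not isinstance(entry, dict):
        return None
    limit = entry.get("limit")
    if not isinstance(limit, dict):
        return None
    context = limit.get("context")
    # Reject missing values, non-numbers, and booleans (bool is an int subclass).
    if isinstance(context, bool) or not isinstance(context, (int, float)):
        return None
    # Zero marks models without a text context (e.g. audio-only entries).
    if context <= 0:
        return None
    return int(context)
```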
|
||||
|
||||
|
||||
class TestLookupModelsDevContext:
    @patch("agent.models_dev.fetch_models_dev")
    def test_exact_match(self, mock_fetch):
        mock_fetch.return_value = SAMPLE_REGISTRY
        assert lookup_models_dev_context("anthropic", "claude-opus-4-6") == 1000000

    @patch("agent.models_dev.fetch_models_dev")
    def test_case_insensitive_match(self, mock_fetch):
        mock_fetch.return_value = SAMPLE_REGISTRY
        assert lookup_models_dev_context("anthropic", "Claude-Opus-4-6") == 1000000

    @patch("agent.models_dev.fetch_models_dev")
    def test_provider_not_mapped(self, mock_fetch):
        mock_fetch.return_value = SAMPLE_REGISTRY
        assert lookup_models_dev_context("nous", "some-model") is None

    @patch("agent.models_dev.fetch_models_dev")
    def test_model_not_found(self, mock_fetch):
        mock_fetch.return_value = SAMPLE_REGISTRY
        assert lookup_models_dev_context("anthropic", "nonexistent-model") is None

    @patch("agent.models_dev.fetch_models_dev")
    def test_provider_aware_context(self, mock_fetch):
        """Same model, different context per provider."""
        mock_fetch.return_value = SAMPLE_REGISTRY
        # Anthropic direct: 1M
        assert lookup_models_dev_context("anthropic", "claude-opus-4-6") == 1000000
        # GitHub Copilot: only 128K for same model
        assert lookup_models_dev_context("copilot", "claude-opus-4.6") == 128000

    @patch("agent.models_dev.fetch_models_dev")
    def test_zero_context_filtered(self, mock_fetch):
        mock_fetch.return_value = SAMPLE_REGISTRY
        # audio-only is not a mapped provider, but test the filtering directly
        data = SAMPLE_REGISTRY["audio-only"]["models"]["tts-model"]
        assert _extract_context(data) is None

    @patch("agent.models_dev.fetch_models_dev")
    def test_empty_registry(self, mock_fetch):
        mock_fetch.return_value = {}
        assert lookup_models_dev_context("anthropic", "claude-opus-4-6") is None
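
The provider-aware lookup tested above can be pictured as two steps: translate the local provider id to its registry id, then match the model id case-insensitively within that provider only. The sketch below is an illustration under assumptions: `lookup_context` takes the registry as an argument instead of fetching it, and `PROVIDER_MAP` and `REGISTRY` are trimmed, hypothetical copies of the mapping and sample data shown in the tests.

```python
from typing import Optional

# Hypothetical subset of the provider mapping asserted in the tests.
PROVIDER_MAP = {"anthropic": "anthropic", "copilot": "github-copilot"}

# Trimmed copy of the sample registry used by the tests.
REGISTRY = {
    "anthropic": {"models": {"claude-opus-4-6": {"limit": {"context": 1000000}}}},
    "github-copilot": {"models": {"claude-opus-4.6": {"limit": {"context": 128000}}}},
}


def lookup_context(registry: dict, provider: str, model: str) -> Optional[int]:
    """Look up a context length scoped to one provider's model list."""
    registry_id = PROVIDER_MAP.get(provider)
    if registry_id is None:
        return None  # unmapped providers are never guessed at
    models = registry.get(registry_id, {}).get("models", {})
    wanted = model.lower()
    for model_id, entry in models.items():
        if model_id.lower() == wanted:
            context = entry.get("limit", {}).get("context")
            return int(context) if context else None
    return None
```

Scoping the search to one provider is what lets the same model report 1M tokens through one provider and 128K through another.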
|
||||
|
||||
|
||||
class TestFetchModelsDev:
|
||||
@patch("agent.models_dev.requests.get")
|
||||
def test_fetch_success(self, mock_get):
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.status_code = 200
|
||||
mock_resp.json.return_value = SAMPLE_REGISTRY
|
||||
mock_resp.raise_for_status = MagicMock()
|
||||
mock_get.return_value = mock_resp
|
||||
|
||||
# Clear caches
|
||||
import agent.models_dev as md
|
||||
md._models_dev_cache = {}
|
||||
md._models_dev_cache_time = 0
|
||||
|
||||
with patch.object(md, "_save_disk_cache"):
|
||||
result = fetch_models_dev(force_refresh=True)
|
||||
|
||||
assert "anthropic" in result
|
||||
assert len(result) == len(SAMPLE_REGISTRY)
|
||||
|
||||
@patch("agent.models_dev.requests.get")
|
||||
def test_fetch_failure_returns_stale_cache(self, mock_get):
|
||||
mock_get.side_effect = Exception("network error")
|
||||
|
||||
import agent.models_dev as md
|
||||
md._models_dev_cache = SAMPLE_REGISTRY
|
||||
md._models_dev_cache_time = 0 # expired
|
||||
|
||||
with patch.object(md, "_load_disk_cache", return_value=SAMPLE_REGISTRY):
|
||||
result = fetch_models_dev(force_refresh=True)
|
||||
|
||||
assert "anthropic" in result
|
||||
|
||||
@patch("agent.models_dev.requests.get")
|
||||
def test_in_memory_cache_used(self, mock_get):
|
||||
import agent.models_dev as md
|
||||
import time
|
||||
md._models_dev_cache = SAMPLE_REGISTRY
|
||||
md._models_dev_cache_time = time.time() # fresh
|
||||
|
||||
result = fetch_models_dev()
|
||||
mock_get.assert_not_called()
|
||||
assert result == SAMPLE_REGISTRY
|
||||
@@ -2,7 +2,7 @@
|
||||
|
||||
import json
|
||||
import pytest
|
||||
from datetime import datetime, timedelta
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
@@ -122,11 +122,29 @@ class TestComputeNextRun:
|
||||
schedule = {"kind": "once", "run_at": future}
|
||||
assert compute_next_run(schedule) == future
|
||||
|
||||
def test_once_recent_past_within_grace_returns_time(self, monkeypatch):
|
||||
now = datetime(2026, 3, 18, 4, 22, 3, tzinfo=timezone.utc)
|
||||
run_at = "2026-03-18T04:22:00+00:00"
|
||||
monkeypatch.setattr("cron.jobs._hermes_now", lambda: now)
|
||||
|
||||
schedule = {"kind": "once", "run_at": run_at}
|
||||
|
||||
assert compute_next_run(schedule) == run_at
|
||||
|
||||
def test_once_past_returns_none(self):
|
||||
past = (datetime.now() - timedelta(hours=1)).isoformat()
|
||||
schedule = {"kind": "once", "run_at": past}
|
||||
assert compute_next_run(schedule) is None
|
||||
|
||||
def test_once_with_last_run_returns_none_even_within_grace(self, monkeypatch):
|
||||
now = datetime(2026, 3, 18, 4, 22, 3, tzinfo=timezone.utc)
|
||||
run_at = "2026-03-18T04:22:00+00:00"
|
||||
monkeypatch.setattr("cron.jobs._hermes_now", lambda: now)
|
||||
|
||||
schedule = {"kind": "once", "run_at": run_at}
|
||||
|
||||
assert compute_next_run(schedule, last_run_at=now.isoformat()) is None
|
||||
|
||||
def test_interval_first_run(self):
|
||||
schedule = {"kind": "interval", "minutes": 60}
|
||||
result = compute_next_run(schedule)
|
||||
@@ -347,6 +365,67 @@ class TestGetDueJobs:
|
||||
due = get_due_jobs()
|
||||
assert len(due) == 0
|
||||
|
||||
def test_broken_recent_one_shot_without_next_run_is_recovered(self, tmp_cron_dir, monkeypatch):
|
||||
now = datetime(2026, 3, 18, 4, 22, 30, tzinfo=timezone.utc)
|
||||
monkeypatch.setattr("cron.jobs._hermes_now", lambda: now)
|
||||
|
||||
run_at = "2026-03-18T04:22:00+00:00"
|
||||
save_jobs(
|
||||
[{
|
||||
"id": "oneshot-recover",
|
||||
"name": "Recover me",
|
||||
"prompt": "Word of the day",
|
||||
"schedule": {"kind": "once", "run_at": run_at, "display": "once at 2026-03-18 04:22"},
|
||||
"schedule_display": "once at 2026-03-18 04:22",
|
||||
"repeat": {"times": 1, "completed": 0},
|
||||
"enabled": True,
|
||||
"state": "scheduled",
|
||||
"paused_at": None,
|
||||
"paused_reason": None,
|
||||
"created_at": "2026-03-18T04:21:00+00:00",
|
||||
"next_run_at": None,
|
||||
"last_run_at": None,
|
||||
"last_status": None,
|
||||
"last_error": None,
|
||||
"deliver": "local",
|
||||
"origin": None,
|
||||
}]
|
||||
)
|
||||
|
||||
due = get_due_jobs()
|
||||
|
||||
assert [job["id"] for job in due] == ["oneshot-recover"]
|
||||
assert get_job("oneshot-recover")["next_run_at"] == run_at
|
||||
|
||||
def test_broken_stale_one_shot_without_next_run_is_not_recovered(self, tmp_cron_dir, monkeypatch):
|
||||
now = datetime(2026, 3, 18, 4, 30, 0, tzinfo=timezone.utc)
|
||||
monkeypatch.setattr("cron.jobs._hermes_now", lambda: now)
|
||||
|
||||
save_jobs(
|
||||
[{
|
||||
"id": "oneshot-stale",
|
||||
"name": "Too old",
|
||||
"prompt": "Word of the day",
|
||||
"schedule": {"kind": "once", "run_at": "2026-03-18T04:22:00+00:00", "display": "once at 2026-03-18 04:22"},
|
||||
"schedule_display": "once at 2026-03-18 04:22",
|
||||
"repeat": {"times": 1, "completed": 0},
|
||||
"enabled": True,
|
||||
"state": "scheduled",
|
||||
"paused_at": None,
|
||||
"paused_reason": None,
|
||||
"created_at": "2026-03-18T04:21:00+00:00",
|
||||
"next_run_at": None,
|
||||
"last_run_at": None,
|
||||
"last_status": None,
|
||||
"last_error": None,
|
||||
"deliver": "local",
|
||||
"origin": None,
|
||||
}]
|
||||
)
|
||||
|
||||
assert get_due_jobs() == []
|
||||
assert get_job("oneshot-stale")["next_run_at"] is None
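
The one-shot tests in this file describe a grace window: a `once` job whose `run_at` passed a few seconds ago is still due, one that passed several minutes ago is not, and a job that has already run is never rescheduled. A minimal sketch of that rule follows. The function name `next_once_run` and the 120-second `ONESHOT_GRACE` are assumptions for illustration; the diff only shows that 3 and 30 seconds fall inside the window and 8 minutes falls outside it. The sketch also assumes timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical grace window; the real value is not shown in the diff.
ONESHOT_GRACE = timedelta(seconds=120)


def next_once_run(run_at: str, now: datetime, last_run_at: Optional[str] = None) -> Optional[str]:
    """Return run_at if a one-shot job is still due, else None."""
    # A one-shot job that already ran is finished, even inside the window.
    if last_run_at is not None:
        return None
    when = datetime.fromisoformat(run_at)
    # Future times and recently missed times are both still runnable.
    if when >= now - ONESHOT_GRACE:
        return run_at
    return None
```

The window exists so that a job created for "right now" is not silently dropped when the scheduler tick lands a moment after `run_at`.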
|
||||
|
||||
|
||||
class TestSaveJobOutput:
|
||||
def test_creates_output_file(self, tmp_cron_dir):
|
||||
|
||||
@@ -7,7 +7,7 @@ from unittest.mock import AsyncMock, patch, MagicMock
|
||||
|
||||
import pytest
|
||||
|
||||
from cron.scheduler import _resolve_origin, _resolve_delivery_target, _deliver_result, run_job, SILENT_MARKER
|
||||
from cron.scheduler import _resolve_origin, _resolve_delivery_target, _deliver_result, run_job, SILENT_MARKER, _build_job_prompt
|
||||
|
||||
|
||||
class TestResolveOrigin:
|
||||
@@ -532,14 +532,53 @@ class TestBuildJobPromptSilentHint:
|
||||
"""Verify _build_job_prompt always injects [SILENT] guidance."""
|
||||
|
||||
def test_hint_always_present(self):
|
||||
from cron.scheduler import _build_job_prompt
|
||||
job = {"prompt": "Check for updates"}
|
||||
result = _build_job_prompt(job)
|
||||
assert "[SILENT]" in result
|
||||
assert "Check for updates" in result
|
||||
|
||||
def test_hint_present_even_without_prompt(self):
|
||||
from cron.scheduler import _build_job_prompt
|
||||
job = {"prompt": ""}
|
||||
result = _build_job_prompt(job)
|
||||
assert "[SILENT]" in result
|
||||
|
||||
|
||||
class TestBuildJobPromptMissingSkill:
    """Verify that a missing skill logs a warning and does not crash the job."""

    def _missing_skill_view(self, name: str) -> str:
        return json.dumps({"success": False, "error": f"Skill '{name}' not found."})

    def test_missing_skill_does_not_raise(self):
        """Job should run even when a referenced skill is not installed."""
        with patch("tools.skills_tool.skill_view", side_effect=self._missing_skill_view):
            result = _build_job_prompt({"skills": ["ghost-skill"], "prompt": "do something"})
        # prompt is preserved even though skill was skipped
        assert "do something" in result

    def test_missing_skill_injects_user_notice_into_prompt(self):
        """A system notice about the missing skill is injected into the prompt."""
        with patch("tools.skills_tool.skill_view", side_effect=self._missing_skill_view):
            result = _build_job_prompt({"skills": ["ghost-skill"], "prompt": "do something"})
        assert "ghost-skill" in result
        assert "not found" in result.lower() or "skipped" in result.lower()

    def test_missing_skill_logs_warning(self, caplog):
        """A warning is logged when a skill cannot be found."""
        with caplog.at_level(logging.WARNING, logger="cron.scheduler"):
            with patch("tools.skills_tool.skill_view", side_effect=self._missing_skill_view):
                _build_job_prompt({"name": "My Job", "skills": ["ghost-skill"], "prompt": "do something"})
        assert any("ghost-skill" in record.message for record in caplog.records)

    def test_valid_skill_loaded_alongside_missing(self):
        """A valid skill is still loaded when another skill in the list is missing."""

        def _mixed_skill_view(name: str) -> str:
            if name == "real-skill":
                return json.dumps({"success": True, "content": "Real skill content."})
            return json.dumps({"success": False, "error": f"Skill '{name}' not found."})

        with patch("tools.skills_tool.skill_view", side_effect=_mixed_skill_view):
            result = _build_job_prompt({"skills": ["ghost-skill", "real-skill"], "prompt": "go"})
        assert "Real skill content." in result
        assert "go" in result
|
||||
|
||||
@@ -0,0 +1,240 @@
|
||||
"""Tests for /approve and /deny gateway commands.
|
||||
|
||||
Verifies that dangerous command approvals require explicit /approve or /deny
|
||||
slash commands, not bare "yes"/"no" text matching.
|
||||
"""
|
||||
|
||||
import time
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from gateway.config import GatewayConfig, Platform, PlatformConfig
|
||||
from gateway.platforms.base import MessageEvent
|
||||
from gateway.session import SessionEntry, SessionSource, build_session_key
|
||||
|
||||
|
||||
def _make_source() -> SessionSource:
|
||||
return SessionSource(
|
||||
platform=Platform.TELEGRAM,
|
||||
user_id="u1",
|
||||
chat_id="c1",
|
||||
user_name="tester",
|
||||
chat_type="dm",
|
||||
)
|
||||
|
||||
|
||||
def _make_event(text: str) -> MessageEvent:
|
||||
return MessageEvent(
|
||||
text=text,
|
||||
source=_make_source(),
|
||||
message_id="m1",
|
||||
)
|
||||
|
||||
|
||||
def _make_runner():
|
||||
from gateway.run import GatewayRunner
|
||||
|
||||
runner = object.__new__(GatewayRunner)
|
||||
runner.config = GatewayConfig(
|
||||
platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}
|
||||
)
|
||||
adapter = MagicMock()
|
||||
adapter.send = AsyncMock()
|
||||
runner.adapters = {Platform.TELEGRAM: adapter}
|
||||
runner._voice_mode = {}
|
||||
runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False)
|
||||
runner.session_store = MagicMock()
|
||||
runner._running_agents = {}
|
||||
runner._pending_messages = {}
|
||||
runner._pending_approvals = {}
|
||||
runner._session_db = None
|
||||
runner._reasoning_config = None
|
||||
runner._provider_routing = {}
|
||||
runner._fallback_model = None
|
||||
runner._show_reasoning = False
|
||||
runner._is_user_authorized = lambda _source: True
|
||||
runner._set_session_env = lambda _context: None
|
||||
return runner
|
||||
|
||||
|
||||
def _make_pending_approval(command="sudo rm -rf /tmp/test", pattern_key="sudo"):
|
||||
return {
|
||||
"command": command,
|
||||
"pattern_key": pattern_key,
|
||||
"pattern_keys": [pattern_key],
|
||||
"description": "sudo command",
|
||||
"timestamp": time.time(),
|
||||
}
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# /approve command
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestApproveCommand:
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_approve_executes_pending_command(self):
|
||||
"""Basic /approve executes the pending command."""
|
||||
runner = _make_runner()
|
||||
source = _make_source()
|
||||
session_key = runner._session_key_for_source(source)
|
||||
runner._pending_approvals[session_key] = _make_pending_approval()
|
||||
|
||||
event = _make_event("/approve")
|
||||
with patch("tools.terminal_tool.terminal_tool", return_value="done") as mock_term:
|
||||
result = await runner._handle_approve_command(event)
|
||||
|
||||
assert "✅ Command approved and executed" in result
|
||||
mock_term.assert_called_once_with(command="sudo rm -rf /tmp/test", force=True)
|
||||
assert session_key not in runner._pending_approvals
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_approve_session_remembers_pattern(self):
|
||||
"""/approve session approves the pattern for the session."""
|
||||
runner = _make_runner()
|
||||
source = _make_source()
|
||||
session_key = runner._session_key_for_source(source)
|
||||
runner._pending_approvals[session_key] = _make_pending_approval()
|
||||
|
||||
event = _make_event("/approve session")
|
||||
with (
|
||||
patch("tools.terminal_tool.terminal_tool", return_value="done"),
|
||||
patch("tools.approval.approve_session") as mock_session,
|
||||
):
|
||||
result = await runner._handle_approve_command(event)
|
||||
|
||||
assert "pattern approved for this session" in result
|
||||
mock_session.assert_called_once_with(session_key, "sudo")
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_approve_always_approves_permanently(self):
|
||||
"""/approve always approves the pattern permanently."""
|
||||
runner = _make_runner()
|
||||
source = _make_source()
|
||||
session_key = runner._session_key_for_source(source)
|
||||
runner._pending_approvals[session_key] = _make_pending_approval()
|
||||
|
||||
event = _make_event("/approve always")
|
||||
with (
|
||||
patch("tools.terminal_tool.terminal_tool", return_value="done"),
|
||||
patch("tools.approval.approve_permanent") as mock_perm,
|
||||
):
|
||||
result = await runner._handle_approve_command(event)
|
||||
|
||||
assert "pattern approved permanently" in result
|
||||
mock_perm.assert_called_once_with("sudo")
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_approve_no_pending(self):
|
||||
"""/approve with no pending approval returns helpful message."""
|
||||
runner = _make_runner()
|
||||
event = _make_event("/approve")
|
||||
result = await runner._handle_approve_command(event)
|
||||
assert "No pending command" in result
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_approve_expired(self):
|
||||
"""/approve on a timed-out approval rejects it."""
|
||||
runner = _make_runner()
|
||||
source = _make_source()
|
||||
session_key = runner._session_key_for_source(source)
|
||||
approval = _make_pending_approval()
|
||||
approval["timestamp"] = time.time() - 600 # 10 minutes ago
|
||||
runner._pending_approvals[session_key] = approval
|
||||
|
||||
event = _make_event("/approve")
|
||||
result = await runner._handle_approve_command(event)
|
||||
|
||||
assert "expired" in result
|
||||
assert session_key not in runner._pending_approvals
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# /deny command
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestDenyCommand:
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_deny_clears_pending(self):
|
||||
"""/deny clears the pending approval."""
|
||||
runner = _make_runner()
|
||||
source = _make_source()
|
||||
session_key = runner._session_key_for_source(source)
|
||||
runner._pending_approvals[session_key] = _make_pending_approval()
|
||||
|
||||
event = _make_event("/deny")
|
||||
result = await runner._handle_deny_command(event)
|
||||
|
||||
assert "❌ Command denied" in result
|
||||
assert session_key not in runner._pending_approvals
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_deny_no_pending(self):
|
||||
"""/deny with no pending approval returns helpful message."""
|
||||
runner = _make_runner()
|
||||
event = _make_event("/deny")
|
||||
result = await runner._handle_deny_command(event)
|
||||
assert "No pending command" in result
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Bare "yes" must NOT trigger approval
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestBareTextNoLongerApproves:
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_yes_does_not_execute_pending_command(self):
|
||||
"""Saying 'yes' in normal conversation must not execute a pending command.
|
||||
|
||||
This is the core bug from issue #1888: bare text matching against
|
||||
'yes'/'no' could intercept unrelated user messages.
|
||||
"""
|
||||
runner = _make_runner()
|
||||
source = _make_source()
|
||||
session_key = runner._session_key_for_source(source)
|
||||
runner._pending_approvals[session_key] = _make_pending_approval()
|
||||
|
||||
# Simulate the user saying "yes" as a normal message.
|
||||
# The old code would have executed the pending command.
|
||||
# Now it should fall through to normal processing (agent handles it).
|
||||
event = _make_event("yes")
|
||||
|
||||
# The approval should still be pending — "yes" is not /approve
|
||||
# We can't easily run _handle_message end-to-end, but we CAN verify
|
||||
# the old text-matching block no longer exists by confirming the
|
||||
# approval is untouched after the command dispatch section.
|
||||
# The key assertion is that _pending_approvals is NOT consumed.
|
||||
assert session_key in runner._pending_approvals
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Approval hint appended to response
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestApprovalHint:
|
||||
|
||||
def test_approval_hint_appended_to_response(self):
|
||||
"""When a pending approval is collected, structured instructions
|
||||
should be appended to the agent response."""
|
||||
# This tests the approval collection logic at the end of _handle_message.
|
||||
# We verify the hint format directly.
|
||||
cmd = "sudo rm -rf /tmp/dangerous"
|
||||
cmd_preview = cmd
|
||||
hint = (
|
||||
f"\n\n⚠️ **Dangerous command requires approval:**\n"
|
||||
f"```\n{cmd_preview}\n```\n"
|
||||
f"Reply `/approve` to execute, `/approve session` to approve this pattern "
|
||||
f"for the session, or `/deny` to cancel."
|
||||
)
|
||||
assert "/approve" in hint
|
||||
assert "/deny" in hint
|
||||
assert cmd in hint
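
The behaviour this test file pins down is that only explicit slash commands resolve a pending approval, while bare "yes" or "no" falls through to normal message handling. A minimal sketch of such a parser is shown below. It is an assumption about the shape of the logic, not the gateway's handler: `parse_approval_command` and its `(action, scope)` return value are hypothetical names chosen for the example.

```python
from typing import Optional, Tuple


def parse_approval_command(text: str) -> Optional[Tuple[str, str]]:
    """Map '/approve [session|always]' or '/deny' to (action, scope)."""
    parts = text.strip().split()
    if not parts:
        return None
    command = parts[0].lower()
    if command == "/deny":
        return ("deny", "once")
    if command == "/approve":
        scope = parts[1].lower() if len(parts) > 1 else "once"
        # Unknown scopes fall back to a one-time approval.
        if scope not in ("once", "session", "always"):
            scope = "once"
        return ("approve", scope)
    # Anything else, including bare 'yes' or 'no', is ordinary chat text.
    return None
```

Returning `None` for ordinary text keeps the pending approval untouched, which is the property the bare-text test checks.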
|
||||
@@ -115,6 +115,22 @@ class TestGatewayConfigRoundtrip:
|
||||
assert restored.quick_commands == {"limits": {"type": "exec", "command": "echo ok"}}
|
||||
assert restored.group_sessions_per_user is False
|
||||
|
||||
def test_roundtrip_preserves_unauthorized_dm_behavior(self):
|
||||
config = GatewayConfig(
|
||||
unauthorized_dm_behavior="ignore",
|
||||
platforms={
|
||||
Platform.WHATSAPP: PlatformConfig(
|
||||
enabled=True,
|
||||
extra={"unauthorized_dm_behavior": "pair"},
|
||||
),
|
||||
},
|
||||
)
|
||||
|
||||
restored = GatewayConfig.from_dict(config.to_dict())
|
||||
|
||||
assert restored.unauthorized_dm_behavior == "ignore"
|
||||
assert restored.platforms[Platform.WHATSAPP].extra["unauthorized_dm_behavior"] == "pair"
|
||||
|
||||
|
||||
class TestLoadGatewayConfig:
|
||||
def test_bridges_quick_commands_from_config_yaml(self, tmp_path, monkeypatch):
|
||||
@@ -158,3 +174,21 @@ class TestLoadGatewayConfig:
|
||||
config = load_gateway_config()
|
||||
|
||||
assert config.quick_commands == {}
|
||||
|
||||
def test_bridges_unauthorized_dm_behavior_from_config_yaml(self, tmp_path, monkeypatch):
|
||||
hermes_home = tmp_path / ".hermes"
|
||||
hermes_home.mkdir()
|
||||
config_path = hermes_home / "config.yaml"
|
||||
config_path.write_text(
|
||||
"unauthorized_dm_behavior: ignore\n"
|
||||
"whatsapp:\n"
|
||||
" unauthorized_dm_behavior: pair\n",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
|
||||
|
||||
config = load_gateway_config()
|
||||
|
||||
assert config.unauthorized_dm_behavior == "ignore"
|
||||
assert config.platforms[Platform.WHATSAPP].extra["unauthorized_dm_behavior"] == "pair"
|
||||
|
||||
@@ -0,0 +1,267 @@
|
||||
"""Tests for the session race guard that prevents concurrent agent runs.
|
||||
|
||||
The sentinel-based guard ensures that when _handle_message passes the
|
||||
"is an agent already running?" check and proceeds to the slow async
|
||||
setup path (vision enrichment, STT, hooks, session hygiene), a second
|
||||
message for the same session is correctly recognized as "already running"
|
||||
and routed through the interrupt/queue path instead of spawning a
|
||||
duplicate agent.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from gateway.config import GatewayConfig, Platform, PlatformConfig
|
||||
from gateway.platforms.base import MessageEvent, MessageType
|
||||
from gateway.run import GatewayRunner, _AGENT_PENDING_SENTINEL
|
||||
from gateway.session import SessionSource, build_session_key
|
||||
|
||||
|
||||
class _FakeAdapter:
    """Minimal adapter stub for testing."""

    def __init__(self):
        self._pending_messages = {}

    async def send(self, chat_id, text, **kwargs):
        pass


def _make_runner():
    runner = object.__new__(GatewayRunner)
    runner.config = GatewayConfig(
        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}
    )
    runner.adapters = {Platform.TELEGRAM: _FakeAdapter()}
    runner._running_agents = {}
    runner._pending_messages = {}
    runner._pending_approvals = {}
    runner._voice_mode = {}
    runner._is_user_authorized = lambda _source: True
    return runner


def _make_event(text="hello", chat_id="12345"):
    source = SessionSource(
        platform=Platform.TELEGRAM, chat_id=chat_id, chat_type="dm"
    )
    return MessageEvent(text=text, message_type=MessageType.TEXT, source=source)


# ------------------------------------------------------------------
# Test 1: Sentinel is placed before _handle_message_with_agent runs
# ------------------------------------------------------------------
@pytest.mark.asyncio
async def test_sentinel_placed_before_agent_setup():
    """After passing the 'not running' guard, the sentinel must be
    written into _running_agents *before* any await, so that a
    concurrent message sees the session as occupied."""
    runner = _make_runner()
    event = _make_event()
    session_key = build_session_key(event.source)

    # Patch _handle_message_with_agent to capture state at entry
    sentinel_was_set = False

    async def mock_inner(self_inner, ev, src, qk):
        nonlocal sentinel_was_set
        sentinel_was_set = runner._running_agents.get(qk) is _AGENT_PENDING_SENTINEL
        return "ok"

    with patch.object(GatewayRunner, "_handle_message_with_agent", mock_inner):
        await runner._handle_message(event)

    assert sentinel_was_set, (
        "Sentinel must be in _running_agents when _handle_message_with_agent starts"
    )


# ------------------------------------------------------------------
# Test 2: Sentinel is cleaned up after _handle_message_with_agent
# ------------------------------------------------------------------
@pytest.mark.asyncio
async def test_sentinel_cleaned_up_after_handler_returns():
    """If _handle_message_with_agent returns normally, the sentinel
    must be removed so the session is not permanently locked."""
    runner = _make_runner()
    event = _make_event()
    session_key = build_session_key(event.source)

    async def mock_inner(self_inner, ev, src, qk):
        return "ok"

    with patch.object(GatewayRunner, "_handle_message_with_agent", mock_inner):
        await runner._handle_message(event)

    assert session_key not in runner._running_agents, (
        "Sentinel must be removed after handler completes"
    )


# ------------------------------------------------------------------
# Test 3: Sentinel cleaned up on exception
# ------------------------------------------------------------------
@pytest.mark.asyncio
async def test_sentinel_cleaned_up_on_exception():
    """If _handle_message_with_agent raises, the sentinel must still
    be cleaned up so the session is not permanently locked."""
    runner = _make_runner()
    event = _make_event()
    session_key = build_session_key(event.source)

    async def mock_inner(self_inner, ev, src, qk):
        raise RuntimeError("boom")

    with patch.object(GatewayRunner, "_handle_message_with_agent", mock_inner):
        with pytest.raises(RuntimeError, match="boom"):
            await runner._handle_message(event)

    assert session_key not in runner._running_agents, (
        "Sentinel must be removed even if handler raises"
    )


# ------------------------------------------------------------------
# Test 4: Second message during sentinel sees "already running"
# ------------------------------------------------------------------
@pytest.mark.asyncio
async def test_second_message_during_sentinel_queued_not_duplicate():
    """While the sentinel is set (agent setup in progress), a second
    message for the same session must hit the 'already running' branch
    and be queued instead of starting a second agent."""
    runner = _make_runner()
    event1 = _make_event(text="first message")
    event2 = _make_event(text="second message")
    session_key = build_session_key(event1.source)

    barrier = asyncio.Event()

    async def slow_inner(self_inner, ev, src, qk):
        # Simulate slow setup: wait until the test tells us to proceed
        await barrier.wait()
        return "ok"

    with patch.object(GatewayRunner, "_handle_message_with_agent", slow_inner):
        # Start first message (will block at barrier)
        task1 = asyncio.create_task(runner._handle_message(event1))
        # Yield so task1 enters slow_inner and sentinel is set
        await asyncio.sleep(0)

        # Verify sentinel is set
        assert runner._running_agents.get(session_key) is _AGENT_PENDING_SENTINEL

        # Second message should see "already running" and be queued
        result2 = await runner._handle_message(event2)
        assert result2 is None, "Second message should return None (queued)"

        # The second message should have been queued in adapter pending
        adapter = runner.adapters[Platform.TELEGRAM]
        assert session_key in adapter._pending_messages, (
            "Second message should be queued as pending"
        )
        assert adapter._pending_messages[session_key] is event2

        # Let first message complete
        barrier.set()
        await task1


# ------------------------------------------------------------------
# Test 5: Sentinel not placed for command messages
# ------------------------------------------------------------------
@pytest.mark.asyncio
async def test_command_messages_do_not_leave_sentinel():
    """Slash commands (/help, /status, etc.) return early from
    _handle_message. They must NOT leave a sentinel behind."""
    runner = _make_runner()
    source = SessionSource(
        platform=Platform.TELEGRAM, chat_id="12345", chat_type="dm"
    )
    event = MessageEvent(
        text="/help", message_type=MessageType.TEXT, source=source
    )
    session_key = build_session_key(source)

    # Mock the help handler to avoid needing full runner setup
    runner._handle_help_command = AsyncMock(return_value="Help text")
    # Need hooks for command emission
    runner.hooks = MagicMock()
    runner.hooks.emit = AsyncMock()

    await runner._handle_message(event)

    assert session_key not in runner._running_agents, (
        "Command handlers must not leave sentinel in _running_agents"
    )


# ------------------------------------------------------------------
# Test 6: /stop during sentinel returns helpful message
# ------------------------------------------------------------------
@pytest.mark.asyncio
async def test_stop_during_sentinel_returns_message():
    """If /stop arrives while the sentinel is set (agent still starting),
    it should return a helpful message instead of crashing or queuing."""
    runner = _make_runner()
    event1 = _make_event(text="hello")
    session_key = build_session_key(event1.source)

    barrier = asyncio.Event()

    async def slow_inner(self_inner, ev, src, qk):
        await barrier.wait()
        return "ok"

    with patch.object(GatewayRunner, "_handle_message_with_agent", slow_inner):
        task1 = asyncio.create_task(runner._handle_message(event1))
        await asyncio.sleep(0)

        # Sentinel should be set
        assert runner._running_agents.get(session_key) is _AGENT_PENDING_SENTINEL

        # Send /stop: it should return a message, not crash
        stop_event = _make_event(text="/stop")
        result = await runner._handle_message(stop_event)
        assert result is not None, "/stop during sentinel should return a message"
        assert "starting up" in result.lower()

        # Should NOT be queued as pending
        adapter = runner.adapters[Platform.TELEGRAM]
        assert session_key not in adapter._pending_messages

        barrier.set()
        await task1


# ------------------------------------------------------------------
# Test 7: Shutdown skips sentinel entries
# ------------------------------------------------------------------
@pytest.mark.asyncio
async def test_shutdown_skips_sentinel():
    """During gateway shutdown, sentinel entries in _running_agents
    should be skipped without raising AttributeError."""
    runner = _make_runner()
    session_key = "telegram:dm:99999"

    # Simulate a sentinel in _running_agents
    runner._running_agents[session_key] = _AGENT_PENDING_SENTINEL

    # Also add a real agent mock to verify it still gets interrupted
    real_agent = MagicMock()
    runner._running_agents["telegram:dm:88888"] = real_agent

    runner.adapters = {}  # No adapters to disconnect
    runner._running = True
    runner._shutdown_event = asyncio.Event()
    runner._exit_reason = None
    runner._shutdown_all_gateway_honcho = lambda: None

    with patch("gateway.status.remove_pid_file"), \
         patch("gateway.status.write_runtime_status"):
        await runner.stop()

    # Real agent should have been interrupted
    real_agent.interrupt.assert_called_once()
    # Should not have raised on the sentinel
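
The guard these tests exercise can be modelled in isolation. The following is a minimal sketch, not the `GatewayRunner` implementation: `MiniRunner`, `_PENDING`, and the `slow_setup` parameter are hypothetical names introduced for illustration. It shows the three properties the tests assert: the slot is claimed before the first `await`, a concurrent message for the same session is queued rather than started, and the slot is released in a `finally` block.

```python
import asyncio

_PENDING = object()  # hypothetical stand-in for _AGENT_PENDING_SENTINEL


class MiniRunner:
    """Toy model of a per-session guard; not the real GatewayRunner."""

    def __init__(self):
        self.running = {}   # session key -> sentinel (or agent)
        self.pending = {}   # session key -> latest queued message
        self.started = []   # messages that actually started an agent

    async def handle(self, key, text, slow_setup):
        if key in self.running:
            # Session is occupied (running or still starting): queue instead.
            self.pending[key] = text
            return None
        # Claim the slot before the first await so that no other task can
        # pass the check above while setup is still in progress.
        self.running[key] = _PENDING
        try:
            await slow_setup()
            self.started.append(text)
            return "ok"
        finally:
            # Always release the slot, even if setup raised.
            self.running.pop(key, None)


async def demo():
    runner = MiniRunner()
    gate = asyncio.Event()
    first = asyncio.create_task(runner.handle("s1", "first", gate.wait))
    await asyncio.sleep(0)  # let the first task reach its await
    second = await runner.handle("s1", "second", gate.wait)
    gate.set()
    return runner, await first, second


runner, first_result, second_result = asyncio.run(demo())
```

Because asyncio tasks only switch at `await` points, writing the sentinel synchronously right after the check is enough to close the race; no lock is needed in this model.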
@@ -0,0 +1,137 @@
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock

import pytest

from gateway.config import GatewayConfig, Platform, PlatformConfig
from gateway.platforms.base import MessageEvent
from gateway.session import SessionSource


def _clear_auth_env(monkeypatch) -> None:
    for key in (
        "TELEGRAM_ALLOWED_USERS",
        "DISCORD_ALLOWED_USERS",
        "WHATSAPP_ALLOWED_USERS",
        "SLACK_ALLOWED_USERS",
        "SIGNAL_ALLOWED_USERS",
        "EMAIL_ALLOWED_USERS",
        "SMS_ALLOWED_USERS",
        "MATTERMOST_ALLOWED_USERS",
        "MATRIX_ALLOWED_USERS",
        "DINGTALK_ALLOWED_USERS",
        "GATEWAY_ALLOWED_USERS",
        "TELEGRAM_ALLOW_ALL_USERS",
        "DISCORD_ALLOW_ALL_USERS",
        "WHATSAPP_ALLOW_ALL_USERS",
        "SLACK_ALLOW_ALL_USERS",
        "SIGNAL_ALLOW_ALL_USERS",
        "EMAIL_ALLOW_ALL_USERS",
        "SMS_ALLOW_ALL_USERS",
        "MATTERMOST_ALLOW_ALL_USERS",
        "MATRIX_ALLOW_ALL_USERS",
        "DINGTALK_ALLOW_ALL_USERS",
        "GATEWAY_ALLOW_ALL_USERS",
    ):
        monkeypatch.delenv(key, raising=False)


def _make_event(platform: Platform, user_id: str, chat_id: str) -> MessageEvent:
    return MessageEvent(
        text="hello",
        message_id="m1",
        source=SessionSource(
            platform=platform,
            user_id=user_id,
            chat_id=chat_id,
            user_name="tester",
            chat_type="dm",
        ),
    )


def _make_runner(platform: Platform, config: GatewayConfig):
    from gateway.run import GatewayRunner

    runner = object.__new__(GatewayRunner)
    runner.config = config
    adapter = SimpleNamespace(send=AsyncMock())
    runner.adapters = {platform: adapter}
    runner.pairing_store = MagicMock()
    runner.pairing_store.is_approved.return_value = False
    return runner, adapter


@pytest.mark.asyncio
async def test_unauthorized_dm_pairs_by_default(monkeypatch):
    _clear_auth_env(monkeypatch)
    config = GatewayConfig(
        platforms={Platform.WHATSAPP: PlatformConfig(enabled=True)},
    )
    runner, adapter = _make_runner(Platform.WHATSAPP, config)
    runner.pairing_store.generate_code.return_value = "ABC12DEF"

    result = await runner._handle_message(
        _make_event(
            Platform.WHATSAPP,
            "15551234567@s.whatsapp.net",
            "15551234567@s.whatsapp.net",
        )
    )

    assert result is None
    runner.pairing_store.generate_code.assert_called_once_with(
        "whatsapp",
        "15551234567@s.whatsapp.net",
        "tester",
    )
    adapter.send.assert_awaited_once()
    assert "ABC12DEF" in adapter.send.await_args.args[1]


@pytest.mark.asyncio
async def test_unauthorized_whatsapp_dm_can_be_ignored(monkeypatch):
    _clear_auth_env(monkeypatch)
    config = GatewayConfig(
        platforms={
            Platform.WHATSAPP: PlatformConfig(
                enabled=True,
                extra={"unauthorized_dm_behavior": "ignore"},
            ),
        },
    )
    runner, adapter = _make_runner(Platform.WHATSAPP, config)

    result = await runner._handle_message(
        _make_event(
            Platform.WHATSAPP,
            "15551234567@s.whatsapp.net",
            "15551234567@s.whatsapp.net",
        )
    )

    assert result is None
    runner.pairing_store.generate_code.assert_not_called()
    adapter.send.assert_not_awaited()


@pytest.mark.asyncio
async def test_global_ignore_suppresses_pairing_reply(monkeypatch):
    _clear_auth_env(monkeypatch)
    config = GatewayConfig(
        unauthorized_dm_behavior="ignore",
        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")},
    )
    runner, adapter = _make_runner(Platform.TELEGRAM, config)

    result = await runner._handle_message(
        _make_event(
            Platform.TELEGRAM,
            "12345",
            "12345",
        )
    )

    assert result is None
    runner.pairing_store.generate_code.assert_not_called()
    adapter.send.assert_not_awaited()
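
One way to read the three tests above is as a lookup with fallbacks: a per-platform `unauthorized_dm_behavior` in `extra`, then the global setting, then `"pair"`. The sketch below assumes exactly that precedence, which the tests are consistent with but do not fully pin down. `resolve_unauthorized_dm_behavior` is a hypothetical helper, and plain values stand in for `GatewayConfig` and `PlatformConfig.extra`.

```python
from typing import Mapping, Optional

# Assumed default, matching test_unauthorized_dm_pairs_by_default.
DEFAULT_BEHAVIOR = "pair"


def resolve_unauthorized_dm_behavior(
    global_behavior: Optional[str],
    platform_extra: Mapping[str, str],
) -> str:
    """Return the effective behavior: platform override, then global, then default."""
    override = platform_extra.get("unauthorized_dm_behavior")
    if override:
        # Assumption: a per-platform value wins over the global one.
        return override
    return global_behavior or DEFAULT_BEHAVIOR
```

With this reading, the config written in `test_bridges_unauthorized_dm_behavior_from_config_yaml` (global `ignore`, WhatsApp `pair`) would keep pairing replies on WhatsApp while silencing other platforms.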
@@ -0,0 +1,619 @@
"""Unit tests for the generic webhook platform adapter.

Covers:
- HMAC signature validation (GitHub, GitLab, generic)
- Prompt rendering with dot-notation template variables
- Event type filtering
- HTTP handler behaviour (404, 202, health)
- Idempotency cache (duplicate delivery IDs)
- Rate limiting (fixed-window, per route)
- Body size limits
- INSECURE_NO_AUTH bypass
- Session isolation for concurrent webhooks
- Delivery info cleanup after send()
- connect / disconnect lifecycle
"""

import asyncio
import hashlib
import hmac
import json
import time
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
from aiohttp import web
from aiohttp.test_utils import TestClient, TestServer

from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import MessageEvent, MessageType, SendResult
from gateway.platforms.webhook import (
    WebhookAdapter,
    _INSECURE_NO_AUTH,
    check_webhook_requirements,
)


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _make_config(
    routes=None,
    secret="",
    rate_limit=30,
    max_body_bytes=1_048_576,
    host="0.0.0.0",
    port=0,  # let OS pick a free port in tests
):
    """Build a PlatformConfig suitable for WebhookAdapter."""
    extra = {
        "host": host,
        "port": port,
        "routes": routes or {},
        "rate_limit": rate_limit,
        "max_body_bytes": max_body_bytes,
    }
    if secret:
        extra["secret"] = secret
    return PlatformConfig(enabled=True, extra=extra)


def _make_adapter(routes=None, **kwargs):
    """Create a WebhookAdapter with sensible defaults for testing."""
    config = _make_config(routes=routes, **kwargs)
    return WebhookAdapter(config)


def _create_app(adapter: WebhookAdapter) -> web.Application:
    """Build the aiohttp Application from the adapter (without starting a full server)."""
    app = web.Application()
    app.router.add_get("/health", adapter._handle_health)
    app.router.add_post("/webhooks/{route_name}", adapter._handle_webhook)
    return app


def _mock_request(headers=None, body=b"", content_length=None, match_info=None):
    """Build a lightweight mock aiohttp request for non-HTTP tests."""
    req = MagicMock()
    req.headers = headers or {}
    req.content_length = content_length if content_length is not None else len(body)
    req.match_info = match_info or {}
    req.method = "POST"

    async def _read():
        return body

    req.read = _read
    return req


def _github_signature(body: bytes, secret: str) -> str:
    """Compute X-Hub-Signature-256 for *body* using *secret*."""
    return "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()


def _generic_signature(body: bytes, secret: str) -> str:
    """Compute X-Webhook-Signature (plain HMAC-SHA256 hex) for *body*."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


# ===================================================================
# Signature validation
# ===================================================================


class TestValidateSignature:
    """Tests for WebhookAdapter._validate_signature."""

    def test_validate_github_signature_valid(self):
        """Valid X-Hub-Signature-256 is accepted."""
        adapter = _make_adapter()
        body = b'{"action": "opened"}'
        secret = "webhook-secret-42"
        sig = _github_signature(body, secret)
        req = _mock_request(headers={"X-Hub-Signature-256": sig})
        assert adapter._validate_signature(req, body, secret) is True

    def test_validate_github_signature_invalid(self):
        """Wrong X-Hub-Signature-256 is rejected."""
        adapter = _make_adapter()
        body = b'{"action": "opened"}'
        secret = "webhook-secret-42"
        req = _mock_request(headers={"X-Hub-Signature-256": "sha256=deadbeef"})
        assert adapter._validate_signature(req, body, secret) is False

    def test_validate_gitlab_token(self):
        """GitLab plain-token match via X-Gitlab-Token."""
        adapter = _make_adapter()
        secret = "gl-token-value"
        req = _mock_request(headers={"X-Gitlab-Token": secret})
        assert adapter._validate_signature(req, b"{}", secret) is True

    def test_validate_gitlab_token_wrong(self):
        """Wrong X-Gitlab-Token is rejected."""
        adapter = _make_adapter()
        req = _mock_request(headers={"X-Gitlab-Token": "wrong"})
        assert adapter._validate_signature(req, b"{}", "correct") is False

    def test_validate_no_signature_with_secret_rejects(self):
        """Secret configured but no recognised signature header → reject."""
        adapter = _make_adapter()
        req = _mock_request(headers={})  # no sig headers at all
        assert adapter._validate_signature(req, b"{}", "my-secret") is False

    def test_validate_no_secret_allows_all(self):
        """When the secret is empty/falsy, the handler never calls the
        validator (its check is 'if secret and secret != _INSECURE_NO_AUTH').
        This test only confirms that the resolved route secret is empty."""
        # An empty secret means validation is skipped entirely, so
        # _validate_signature is never reached. Build an adapter with no
        # secret and confirm the route config resolves to "".
        adapter = _make_adapter(
            routes={"test": {"prompt": "hello"}},
            secret="",
        )
        # The route has no secret, global secret is empty
        route_secret = adapter._routes["test"].get("secret", adapter._global_secret)
        assert not route_secret  # empty → validation is skipped in handler

    def test_validate_generic_signature_valid(self):
        """Valid X-Webhook-Signature (generic HMAC-SHA256 hex) is accepted."""
        adapter = _make_adapter()
        body = b'{"event": "push"}'
        secret = "generic-secret"
        sig = _generic_signature(body, secret)
        req = _mock_request(headers={"X-Webhook-Signature": sig})
        assert adapter._validate_signature(req, body, secret) is True


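
For reference, the three header styles covered by `TestValidateSignature` can be checked with a few lines of standard-library code. This is a hedged sketch, not `WebhookAdapter._validate_signature` itself: `validate_signature` is a hypothetical function, it takes a plain mapping instead of an aiohttp request, and the order in which headers are tried is an assumption.

```python
import hashlib
import hmac
from typing import Mapping


def _equal(a: str, b: str) -> bool:
    """Constant-time comparison; encoding first avoids errors on non-ASCII input."""
    return hmac.compare_digest(a.encode(), b.encode())


def validate_signature(headers: Mapping[str, str], body: bytes, secret: str) -> bool:
    """Illustrative check for GitHub, GitLab, and generic signature headers."""
    expected_hex = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

    github_sig = headers.get("X-Hub-Signature-256")
    if github_sig is not None:
        # GitHub style: "sha256=" followed by the hex HMAC-SHA256 of the body.
        return _equal(github_sig, "sha256=" + expected_hex)

    gitlab_token = headers.get("X-Gitlab-Token")
    if gitlab_token is not None:
        # GitLab style: the shared secret is sent verbatim as a token.
        return _equal(gitlab_token, secret)

    generic_sig = headers.get("X-Webhook-Signature")
    if generic_sig is not None:
        # Generic style: bare hex HMAC-SHA256 of the body.
        return _equal(generic_sig, expected_hex)

    # A secret is configured but no recognised header was sent.
    return False
```

Note that a plain `dict` lookup is case-sensitive, whereas real HTTP header mappings are not; the sketch assumes headers arrive with the exact casing shown.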
# ===================================================================
# Prompt rendering
# ===================================================================


class TestRenderPrompt:
    """Tests for WebhookAdapter._render_prompt."""

    def test_render_prompt_dot_notation(self):
        """Dot-notation {pull_request.title} resolves nested keys."""
        adapter = _make_adapter()
        payload = {"pull_request": {"title": "Fix bug", "number": 42}}
        result = adapter._render_prompt(
            "PR #{pull_request.number}: {pull_request.title}",
            payload,
            "pull_request",
            "github",
        )
        assert result == "PR #42: Fix bug"

    def test_render_prompt_missing_key_preserved(self):
        """{nonexistent} is left as-is when key doesn't exist in payload."""
        adapter = _make_adapter()
        result = adapter._render_prompt(
            "Hello {nonexistent}!",
            {"action": "opened"},
            "push",
            "test",
        )
        assert "{nonexistent}" in result

    def test_render_prompt_no_template_dumps_json(self):
        """Empty template → JSON dump fallback with event/route context."""
        adapter = _make_adapter()
        payload = {"key": "value"}
        result = adapter._render_prompt("", payload, "push", "my-route")
        assert "push" in result
        assert "my-route" in result
        assert "key" in result


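
The rendering rules these tests describe (nested lookup by dotted path, unknown placeholders preserved, JSON dump when the template is empty) can be sketched with a regular expression. `render_prompt` and `_PLACEHOLDER` are hypothetical names, and the exact wording of the fallback text is an assumption; only its ingredients (event type, route name, payload) come from the test.

```python
import json
import re
from typing import Any

# Matches {name} or {a.b.c}; other brace content is left untouched.
_PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)\}")


def render_prompt(template: str, payload: dict, event_type: str, route_name: str) -> str:
    """Illustrative dot-notation renderer; not the adapter's implementation."""
    if not template:
        # Fallback: no template, so dump the payload with event/route context.
        return (
            f"Webhook event '{event_type}' on route '{route_name}':\n"
            + json.dumps(payload, indent=2)
        )

    def substitute(match: re.Match) -> str:
        value: Any = payload
        for part in match.group(1).split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                return match.group(0)  # unknown key: keep the placeholder
        return str(value)

    return _PLACEHOLDER.sub(substitute, template)
```

Using a custom substitution instead of `str.format` keeps missing keys from raising `KeyError` and avoids attribute access on payload values.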
# ===================================================================
# Delivery extra rendering
# ===================================================================


class TestRenderDeliveryExtra:
    def test_render_delivery_extra_templates(self):
        """String values in deliver_extra are rendered with payload data."""
        adapter = _make_adapter()
        extra = {"repo": "{repository.full_name}", "pr_number": "{number}", "static": 42}
        payload = {"repository": {"full_name": "org/repo"}, "number": 7}
        result = adapter._render_delivery_extra(extra, payload)
        assert result["repo"] == "org/repo"
        assert result["pr_number"] == "7"
        assert result["static"] == 42  # non-string left as-is


# ===================================================================
# Event filtering
# ===================================================================


class TestEventFilter:
    """Tests for event type filtering in _handle_webhook."""

    @pytest.mark.asyncio
    async def test_event_filter_accepts_matching(self):
        """Matching event type passes through."""
        routes = {
            "gh": {
                "secret": _INSECURE_NO_AUTH,
                "events": ["pull_request"],
                "prompt": "PR: {action}",
            }
        }
        adapter = _make_adapter(routes=routes)
        # Stub handle_message to avoid running the agent
        adapter.handle_message = AsyncMock()

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post(
                "/webhooks/gh",
                json={"action": "opened"},
                headers={"X-GitHub-Event": "pull_request"},
            )
            assert resp.status == 202

    @pytest.mark.asyncio
    async def test_event_filter_rejects_non_matching(self):
        """Non-matching event type returns 200 with status=ignored."""
        routes = {
            "gh": {
                "secret": _INSECURE_NO_AUTH,
                "events": ["pull_request"],
                "prompt": "test",
            }
        }
        adapter = _make_adapter(routes=routes)

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post(
                "/webhooks/gh",
                json={"action": "opened"},
                headers={"X-GitHub-Event": "push"},
            )
            assert resp.status == 200
            data = await resp.json()
            assert data["status"] == "ignored"

    @pytest.mark.asyncio
    async def test_event_filter_empty_allows_all(self):
        """No events list → accept any event type."""
        routes = {
            "all": {
                "secret": _INSECURE_NO_AUTH,
                "prompt": "got it",
            }
        }
        adapter = _make_adapter(routes=routes)
        adapter.handle_message = AsyncMock()

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post(
                "/webhooks/all",
                json={"action": "any"},
                headers={"X-GitHub-Event": "whatever"},
            )
            assert resp.status == 202


# ===================================================================
# HTTP handling
# ===================================================================


class TestHTTPHandling:

    @pytest.mark.asyncio
    async def test_unknown_route_returns_404(self):
        """POST to an unknown route returns 404."""
        adapter = _make_adapter(routes={"real": {"secret": _INSECURE_NO_AUTH, "prompt": "x"}})
        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post("/webhooks/nonexistent", json={"a": 1})
            assert resp.status == 404

    @pytest.mark.asyncio
    async def test_webhook_handler_returns_202(self):
        """Valid request returns 202 Accepted."""
        routes = {"test": {"secret": _INSECURE_NO_AUTH, "prompt": "hi"}}
        adapter = _make_adapter(routes=routes)
        adapter.handle_message = AsyncMock()

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post("/webhooks/test", json={"data": "value"})
            assert resp.status == 202
            data = await resp.json()
            assert data["status"] == "accepted"
            assert data["route"] == "test"

    @pytest.mark.asyncio
    async def test_health_endpoint(self):
        """GET /health returns 200 with status=ok."""
        adapter = _make_adapter()
        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.get("/health")
            assert resp.status == 200
            data = await resp.json()
            assert data["status"] == "ok"
            assert data["platform"] == "webhook"

    @pytest.mark.asyncio
    async def test_connect_starts_server(self):
        """connect() starts the HTTP listener and marks adapter as connected."""
        routes = {"r1": {"secret": _INSECURE_NO_AUTH, "prompt": "x"}}
        adapter = _make_adapter(routes=routes, port=0)
        # port=0 would let the OS pick a free port, but this test never binds:
        # AppRunner and TCPSite are mocked, so we only check that connect()
        # completes and marks the adapter as connected.
        with patch("gateway.platforms.webhook.web.AppRunner") as MockRunner, \
             patch("gateway.platforms.webhook.web.TCPSite") as MockSite:
            mock_runner_inst = AsyncMock()
            MockRunner.return_value = mock_runner_inst
            mock_site_inst = AsyncMock()
            MockSite.return_value = mock_site_inst

            result = await adapter.connect()
            assert result is True
            assert adapter.is_connected
            mock_runner_inst.setup.assert_awaited_once()
            mock_site_inst.start.assert_awaited_once()

            await adapter.disconnect()

    @pytest.mark.asyncio
    async def test_disconnect_cleans_up(self):
        """disconnect() stops the server and marks adapter disconnected."""
        adapter = _make_adapter()
        # Simulate a runner that was previously set up
        mock_runner = AsyncMock()
        adapter._runner = mock_runner
        adapter._running = True

        await adapter.disconnect()
        mock_runner.cleanup.assert_awaited_once()
        assert adapter._runner is None
        assert not adapter.is_connected


# ===================================================================
# Idempotency
# ===================================================================


class TestIdempotency:

    @pytest.mark.asyncio
    async def test_duplicate_delivery_id_returns_200(self):
        """Second request with same delivery ID returns 200 duplicate."""
        routes = {"idem": {"secret": _INSECURE_NO_AUTH, "prompt": "test"}}
        adapter = _make_adapter(routes=routes)
        adapter.handle_message = AsyncMock()

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            headers = {"X-GitHub-Delivery": "delivery-123"}
            resp1 = await cli.post("/webhooks/idem", json={"a": 1}, headers=headers)
            assert resp1.status == 202

            resp2 = await cli.post("/webhooks/idem", json={"a": 1}, headers=headers)
            assert resp2.status == 200
            data = await resp2.json()
            assert data["status"] == "duplicate"

    @pytest.mark.asyncio
    async def test_expired_delivery_id_allows_reprocess(self):
        """After TTL expires, the same delivery ID is accepted again."""
        routes = {"idem": {"secret": _INSECURE_NO_AUTH, "prompt": "test"}}
        adapter = _make_adapter(routes=routes)
        adapter._idempotency_ttl = 1  # 1-second TTL, so the backdated entry below counts as expired
        adapter.handle_message = AsyncMock()

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            headers = {"X-GitHub-Delivery": "delivery-456"}

            resp1 = await cli.post("/webhooks/idem", json={"x": 1}, headers=headers)
            assert resp1.status == 202

            # Backdate the cache entry so it appears expired
            adapter._seen_deliveries["delivery-456"] = time.time() - 3700

            resp2 = await cli.post("/webhooks/idem", json={"x": 1}, headers=headers)
            assert resp2.status == 202  # re-accepted


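
The behaviour checked here (a delivery ID is remembered for a TTL, then accepted again) amounts to a small timestamped cache. The sketch below is an assumption-laden illustration, not the adapter's code: `DeliveryCache` and its `is_duplicate` method are hypothetical, and the injectable `now` argument exists only to make the example deterministic.

```python
import time
from typing import Dict, Optional


class DeliveryCache:
    """Remember delivery IDs for ttl seconds (illustrative only)."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self.seen: Dict[str, float] = {}  # delivery ID -> time first seen

    def is_duplicate(self, delivery_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Drop expired entries so the cache does not grow without bound.
        self.seen = {d: t for d, t in self.seen.items() if now - t < self.ttl}
        if delivery_id in self.seen:
            return True
        self.seen[delivery_id] = now
        return False
```

A handler built on this would answer a duplicate with `200` and `status=duplicate`, as the first test expects, and fall through to normal processing otherwise.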
# ===================================================================
# Rate limiting
# ===================================================================


class TestRateLimiting:

    @pytest.mark.asyncio
    async def test_rate_limit_rejects_excess(self):
        """Exceeding the rate limit returns 429."""
        routes = {"limited": {"secret": _INSECURE_NO_AUTH, "prompt": "test"}}
        adapter = _make_adapter(routes=routes, rate_limit=2)
        adapter.handle_message = AsyncMock()

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            # Two requests within limit
            for i in range(2):
                resp = await cli.post(
                    "/webhooks/limited",
                    json={"n": i},
                    headers={"X-GitHub-Delivery": f"d-{i}"},
                )
                assert resp.status == 202, f"Request {i} should be accepted"

            # Third request should be rate-limited
            resp = await cli.post(
                "/webhooks/limited",
                json={"n": 99},
                headers={"X-GitHub-Delivery": "d-99"},
            )
            assert resp.status == 429

    @pytest.mark.asyncio
    async def test_rate_limit_window_resets(self):
        """After the 60-second window passes, requests are allowed again."""
        routes = {"limited": {"secret": _INSECURE_NO_AUTH, "prompt": "test"}}
        adapter = _make_adapter(routes=routes, rate_limit=1)
        adapter.handle_message = AsyncMock()

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post(
                "/webhooks/limited",
                json={"n": 1},
                headers={"X-GitHub-Delivery": "d-a"},
            )
            assert resp.status == 202

            # Backdate all rate-limit timestamps to > 60 seconds ago
            adapter._rate_counts["limited"] = [time.time() - 120]

            resp = await cli.post(
                "/webhooks/limited",
                json={"n": 2},
                headers={"X-GitHub-Delivery": "d-b"},
            )
            assert resp.status == 202  # allowed again


# ===================================================================
# Body size limit
# ===================================================================


class TestBodySize:

    @pytest.mark.asyncio
    async def test_oversized_payload_rejected(self):
        """Content-Length > max_body_bytes returns 413."""
        routes = {"big": {"secret": _INSECURE_NO_AUTH, "prompt": "test"}}
        adapter = _make_adapter(routes=routes, max_body_bytes=100)

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            large_payload = {"data": "x" * 200}
            resp = await cli.post(
                "/webhooks/big",
                json=large_payload,
                headers={"Content-Length": "999999"},
            )
            assert resp.status == 413

# ===================================================================
# INSECURE_NO_AUTH
# ===================================================================


class TestInsecureNoAuth:

    @pytest.mark.asyncio
    async def test_insecure_no_auth_skips_validation(self):
        """Setting secret to _INSECURE_NO_AUTH bypasses signature check."""
        routes = {"open": {"secret": _INSECURE_NO_AUTH, "prompt": "hello"}}
        adapter = _make_adapter(routes=routes)
        adapter.handle_message = AsyncMock()

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            # No signature header at all — should still be accepted
            resp = await cli.post("/webhooks/open", json={"test": True})
            assert resp.status == 202

# ===================================================================
# Session isolation
# ===================================================================


class TestSessionIsolation:

    @pytest.mark.asyncio
    async def test_concurrent_webhooks_get_independent_sessions(self):
        """Two events on the same route produce different session keys."""
        routes = {"ci": {"secret": _INSECURE_NO_AUTH, "prompt": "build"}}
        adapter = _make_adapter(routes=routes)

        captured_events = []

        async def _capture(event):
            captured_events.append(event)

        adapter.handle_message = _capture

        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp1 = await cli.post(
                "/webhooks/ci",
                json={"ref": "main"},
                headers={"X-GitHub-Delivery": "aaa-111"},
            )
            assert resp1.status == 202

            resp2 = await cli.post(
                "/webhooks/ci",
                json={"ref": "dev"},
                headers={"X-GitHub-Delivery": "bbb-222"},
            )
            assert resp2.status == 202

            # Wait for the async tasks to be created
            await asyncio.sleep(0.05)

        assert len(captured_events) == 2
        ids = {ev.source.chat_id for ev in captured_events}
        assert len(ids) == 2, "Each delivery must have a unique session chat_id"

# ===================================================================
# Delivery info cleanup
# ===================================================================


class TestDeliveryCleanup:

    @pytest.mark.asyncio
    async def test_delivery_info_cleaned_after_send(self):
        """send() pops delivery_info so the entry doesn't leak memory."""
        adapter = _make_adapter()
        chat_id = "webhook:test:d-xyz"
        adapter._delivery_info[chat_id] = {
            "deliver": "log",
            "deliver_extra": {},
            "payload": {"x": 1},
        }

        result = await adapter.send(chat_id, "Agent response here")
        assert result.success is True
        assert chat_id not in adapter._delivery_info

# ===================================================================
# check_webhook_requirements
# ===================================================================


class TestCheckRequirements:
    def test_returns_true_when_aiohttp_available(self):
        assert check_webhook_requirements() is True

    @patch("gateway.platforms.webhook.AIOHTTP_AVAILABLE", False)
    def test_returns_false_without_aiohttp(self):
        assert check_webhook_requirements() is False
|
||||
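Reviewer note on the behaviour pinned down by the idempotency and rate-limiting tests above: a repeated delivery ID is reported as a duplicate until its TTL lapses, and a route accepts at most `rate_limit` requests per 60-second window before answering 429. The sketch below shows one way such bookkeeping could look. It is not the adapter's actual code; `DeliveryCache` and `SlidingWindowLimiter` are hypothetical names, and the explicit `now` parameter is an assumption added so the logic can be exercised without sleeping.

```python
import time
from typing import Dict, List, Optional


class DeliveryCache:
    """Hypothetical helper: remember delivery IDs for `ttl` seconds."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._seen: Dict[str, float] = {}

    def is_duplicate(self, delivery_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Drop expired entries so the cache does not grow without bound.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if delivery_id in self._seen:
            return True
        self._seen[delivery_id] = now
        return False


class SlidingWindowLimiter:
    """Hypothetical helper: at most `limit` hits per route per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, List[float]] = {}

    def allow(self, route: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Keep only the timestamps that still fall inside the window.
        recent = [t for t in self._hits.get(route, []) if now - t < self.window]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self._hits[route] = recent
        return allowed
```

Storing raw timestamps rather than a counter is what makes the backdating trick in the tests work: once every stored timestamp is older than the window, the next request sees an empty list and is accepted again.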
@@ -0,0 +1,337 @@
"""Integration tests for the generic webhook platform adapter.

These tests exercise end-to-end flows through the webhook adapter:
  1. GitHub PR webhook → agent MessageEvent created
  2. Skills config injects skill content into the prompt
  3. Cross-platform delivery routes to a mock Telegram adapter
  4. GitHub comment delivery invokes ``gh`` CLI (mocked subprocess)
"""

import asyncio
import hashlib
import hmac
import json
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
from aiohttp import web
from aiohttp.test_utils import TestClient, TestServer

from gateway.config import (
    GatewayConfig,
    HomeChannel,
    Platform,
    PlatformConfig,
)
from gateway.platforms.base import MessageEvent, MessageType, SendResult
from gateway.platforms.webhook import WebhookAdapter, _INSECURE_NO_AUTH


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _make_adapter(routes, **extra_kw) -> WebhookAdapter:
    """Create a WebhookAdapter with the given routes."""
    extra = {"host": "0.0.0.0", "port": 0, "routes": routes}
    extra.update(extra_kw)
    config = PlatformConfig(enabled=True, extra=extra)
    return WebhookAdapter(config)


def _create_app(adapter: WebhookAdapter) -> web.Application:
    """Build the aiohttp Application from the adapter."""
    app = web.Application()
    app.router.add_get("/health", adapter._handle_health)
    app.router.add_post("/webhooks/{route_name}", adapter._handle_webhook)
    return app


def _github_signature(body: bytes, secret: str) -> str:
    """Compute X-Hub-Signature-256 for *body* using *secret*."""
    return "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()


# A realistic GitHub pull_request event payload (trimmed)
GITHUB_PR_PAYLOAD = {
    "action": "opened",
    "number": 42,
    "pull_request": {
        "title": "Add webhook adapter",
        "body": "This PR adds a generic webhook platform adapter.",
        "html_url": "https://github.com/org/repo/pull/42",
        "user": {"login": "contributor"},
        "head": {"ref": "feature/webhooks"},
        "base": {"ref": "main"},
    },
    "repository": {
        "full_name": "org/repo",
        "html_url": "https://github.com/org/repo",
    },
    "sender": {"login": "contributor"},
}

# ===================================================================
# Test 1: GitHub PR webhook triggers agent
# ===================================================================

class TestGitHubPRWebhook:

    @pytest.mark.asyncio
    async def test_github_pr_webhook_triggers_agent(self):
        """POST with a realistic GitHub PR payload should:
        1. Return 202 Accepted
        2. Call handle_message with a MessageEvent
        3. The event text contains the rendered prompt
        4. The event source has chat_type 'webhook'
        """
        secret = "gh-webhook-test-secret"
        routes = {
            "github-pr": {
                "secret": secret,
                "events": ["pull_request"],
                "prompt": (
                    "Review PR #{number} by {sender.login}: "
                    "{pull_request.title}\n\n{pull_request.body}"
                ),
                "deliver": "log",
            }
        }
        adapter = _make_adapter(routes)

        captured_events: list[MessageEvent] = []

        async def _capture(event: MessageEvent):
            captured_events.append(event)

        adapter.handle_message = _capture

        app = _create_app(adapter)
        body = json.dumps(GITHUB_PR_PAYLOAD).encode()
        sig = _github_signature(body, secret)

        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post(
                "/webhooks/github-pr",
                data=body,
                headers={
                    "Content-Type": "application/json",
                    "X-GitHub-Event": "pull_request",
                    "X-Hub-Signature-256": sig,
                    "X-GitHub-Delivery": "gh-delivery-001",
                },
            )
            assert resp.status == 202
            data = await resp.json()
            assert data["status"] == "accepted"
            assert data["route"] == "github-pr"
            assert data["event"] == "pull_request"
            assert data["delivery_id"] == "gh-delivery-001"

            # Let the asyncio.create_task fire
            await asyncio.sleep(0.05)

        assert len(captured_events) == 1
        event = captured_events[0]
        assert "Review PR #42 by contributor" in event.text
        assert "Add webhook adapter" in event.text
        assert event.source.chat_type == "webhook"
        assert event.source.platform == Platform.WEBHOOK
        assert "github-pr" in event.source.chat_id
        assert event.message_id == "gh-delivery-001"

# ===================================================================
# Test 2: Skills injected into prompt
# ===================================================================

class TestSkillsInjection:

    @pytest.mark.asyncio
    async def test_skills_injected_into_prompt(self):
        """When a route has skills: [code-review], the adapter should
        call build_skill_invocation_message() and use its output as the
        prompt instead of the raw template render."""
        routes = {
            "pr-review": {
                "secret": _INSECURE_NO_AUTH,
                "events": ["pull_request"],
                "prompt": "Review this PR: {pull_request.title}",
                "skills": ["code-review"],
            }
        }
        adapter = _make_adapter(routes)

        captured_events: list[MessageEvent] = []

        async def _capture(event: MessageEvent):
            captured_events.append(event)

        adapter.handle_message = _capture

        skill_content = (
            "You are a code reviewer. Review the following:\n"
            "Review this PR: Add webhook adapter"
        )

        # The imports are lazy (inside the handler), so patch the source module
        with patch(
            "agent.skill_commands.build_skill_invocation_message",
            return_value=skill_content,
        ) as mock_build, patch(
            "agent.skill_commands.get_skill_commands",
            return_value={"/code-review": {"name": "code-review"}},
        ):
            app = _create_app(adapter)
            async with TestClient(TestServer(app)) as cli:
                resp = await cli.post(
                    "/webhooks/pr-review",
                    json=GITHUB_PR_PAYLOAD,
                    headers={
                        "X-GitHub-Event": "pull_request",
                        "X-GitHub-Delivery": "skill-test-001",
                    },
                )
                assert resp.status == 202

                await asyncio.sleep(0.05)

        assert len(captured_events) == 1
        event = captured_events[0]
        # The prompt should be the skill content, not the raw template
        assert "You are a code reviewer" in event.text
        mock_build.assert_called_once()

# ===================================================================
# Test 3: Cross-platform delivery (webhook → Telegram)
# ===================================================================

class TestCrossPlatformDelivery:

    @pytest.mark.asyncio
    async def test_cross_platform_delivery(self):
        """When deliver='telegram', the response is routed to the
        Telegram adapter via gateway_runner.adapters."""
        routes = {
            "alerts": {
                "secret": _INSECURE_NO_AUTH,
                "prompt": "Alert: {message}",
                "deliver": "telegram",
                "deliver_extra": {"chat_id": "12345"},
            }
        }
        adapter = _make_adapter(routes)
        adapter.handle_message = AsyncMock()

        # Set up a mock gateway runner with a mock Telegram adapter
        mock_tg_adapter = AsyncMock()
        mock_tg_adapter.send = AsyncMock(return_value=SendResult(success=True))

        mock_runner = MagicMock()
        mock_runner.adapters = {Platform.TELEGRAM: mock_tg_adapter}
        mock_runner.config = GatewayConfig(
            platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="fake")}
        )
        adapter.gateway_runner = mock_runner

        # First, simulate a webhook POST to set up delivery_info
        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post(
                "/webhooks/alerts",
                json={"message": "Server is on fire!"},
                headers={"X-GitHub-Delivery": "alert-001"},
            )
            assert resp.status == 202

        # The adapter should have stored delivery info
        chat_id = "webhook:alerts:alert-001"
        assert chat_id in adapter._delivery_info

        # Now call send() as if the agent has finished
        result = await adapter.send(chat_id, "I've acknowledged the alert.")

        assert result.success is True
        mock_tg_adapter.send.assert_awaited_once_with(
            "12345", "I've acknowledged the alert."
        )
        # Delivery info should be cleaned up
        assert chat_id not in adapter._delivery_info

# ===================================================================
# Test 4: GitHub comment delivery via gh CLI
# ===================================================================

class TestGitHubCommentDelivery:

    @pytest.mark.asyncio
    async def test_github_comment_delivery(self):
        """When deliver='github_comment', the adapter invokes
        ``gh pr comment`` via subprocess.run (mocked)."""
        routes = {
            "pr-bot": {
                "secret": _INSECURE_NO_AUTH,
                "prompt": "Review: {pull_request.title}",
                "deliver": "github_comment",
                "deliver_extra": {
                    "repo": "{repository.full_name}",
                    "pr_number": "{number}",
                },
            }
        }
        adapter = _make_adapter(routes)
        adapter.handle_message = AsyncMock()

        # POST a webhook to set up delivery info
        app = _create_app(adapter)
        async with TestClient(TestServer(app)) as cli:
            resp = await cli.post(
                "/webhooks/pr-bot",
                json=GITHUB_PR_PAYLOAD,
                headers={
                    "X-GitHub-Event": "pull_request",
                    "X-GitHub-Delivery": "gh-comment-001",
                },
            )
            assert resp.status == 202

        chat_id = "webhook:pr-bot:gh-comment-001"
        assert chat_id in adapter._delivery_info

        # Verify deliver_extra was rendered with payload data
        delivery = adapter._delivery_info[chat_id]
        assert delivery["deliver_extra"]["repo"] == "org/repo"
        assert delivery["deliver_extra"]["pr_number"] == "42"

        # Mock subprocess.run and call send()
        mock_result = MagicMock()
        mock_result.returncode = 0
        mock_result.stdout = "Comment posted"
        mock_result.stderr = ""

        with patch(
            "gateway.platforms.webhook.subprocess.run",
            return_value=mock_result,
        ) as mock_run:
            result = await adapter.send(
                chat_id, "LGTM! The code looks great."
            )

        assert result.success is True
        mock_run.assert_called_once_with(
            [
                "gh", "pr", "comment", "42",
                "--repo", "org/repo",
                "--body", "LGTM! The code looks great.",
            ],
            capture_output=True,
            text=True,
            timeout=30,
        )
        # Delivery info cleaned up
        assert chat_id not in adapter._delivery_info
|
||||
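Reviewer note on the integration tests above: they rely on two mechanisms, a GitHub-style `sha256=<hex>` HMAC over the raw request body and prompt templates with dotted placeholders such as `{pull_request.title}`. The sketch below illustrates both under stated assumptions. `verify_signature` and `render_prompt` are hypothetical names, not the adapter's API, and leaving unknown placeholders untouched is an assumed policy that the tests do not cover.

```python
import hashlib
import hmac
import re
from typing import Any, Dict


def verify_signature(body: bytes, secret: str, header_value: str) -> bool:
    """Check a 'sha256=<hex>' signature computed over the raw body."""
    expected = "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    # compare_digest runs in constant time, so a mismatch does not reveal
    # how many leading characters were correct.
    return hmac.compare_digest(expected.encode(), header_value.encode())


_PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_.]+)\}")


def render_prompt(template: str, payload: Dict[str, Any]) -> str:
    """Replace {a.b.c} placeholders with values from a nested payload."""

    def lookup(match):
        value: Any = payload
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                # Assumed policy: leave unknown placeholders as written.
                return match.group(0)
            value = value[key]
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)
```

The signature has to be computed over the exact bytes that were sent, which is why the first integration test posts `data=body` instead of `json=...`: re-serialising the payload could change the byte sequence and invalidate the HMAC.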
@@ -0,0 +1,208 @@
"""Tests for hermes_cli.copilot_auth — Copilot token validation and resolution."""

import os
import pytest
from unittest.mock import patch, MagicMock


class TestTokenValidation:
    """Token type validation."""

    def test_classic_pat_rejected(self):
        from hermes_cli.copilot_auth import validate_copilot_token
        valid, msg = validate_copilot_token("ghp_abcdefghijklmnop1234")
        assert valid is False
        assert "Classic Personal Access Tokens" in msg
        assert "ghp_" in msg

    def test_oauth_token_accepted(self):
        from hermes_cli.copilot_auth import validate_copilot_token
        valid, msg = validate_copilot_token("gho_abcdefghijklmnop1234")
        assert valid is True

    def test_fine_grained_pat_accepted(self):
        from hermes_cli.copilot_auth import validate_copilot_token
        valid, msg = validate_copilot_token("github_pat_abcdefghijklmnop1234")
        assert valid is True

    def test_github_app_token_accepted(self):
        from hermes_cli.copilot_auth import validate_copilot_token
        valid, msg = validate_copilot_token("ghu_abcdefghijklmnop1234")
        assert valid is True

    def test_empty_token_rejected(self):
        from hermes_cli.copilot_auth import validate_copilot_token
        valid, msg = validate_copilot_token("")
        assert valid is False

    def test_is_classic_pat(self):
        from hermes_cli.copilot_auth import is_classic_pat
        assert is_classic_pat("ghp_abc123") is True
        assert is_classic_pat("gho_abc123") is False
        assert is_classic_pat("github_pat_abc") is False
        assert is_classic_pat("") is False

class TestResolveToken:
    """Token resolution with env var priority."""

    def test_copilot_github_token_first_priority(self, monkeypatch):
        from hermes_cli.copilot_auth import resolve_copilot_token
        monkeypatch.setenv("COPILOT_GITHUB_TOKEN", "gho_copilot_first")
        monkeypatch.setenv("GH_TOKEN", "gho_gh_second")
        monkeypatch.setenv("GITHUB_TOKEN", "gho_github_third")
        token, source = resolve_copilot_token()
        assert token == "gho_copilot_first"
        assert source == "COPILOT_GITHUB_TOKEN"

    def test_gh_token_second_priority(self, monkeypatch):
        from hermes_cli.copilot_auth import resolve_copilot_token
        monkeypatch.delenv("COPILOT_GITHUB_TOKEN", raising=False)
        monkeypatch.setenv("GH_TOKEN", "gho_gh_second")
        monkeypatch.setenv("GITHUB_TOKEN", "gho_github_third")
        token, source = resolve_copilot_token()
        assert token == "gho_gh_second"
        assert source == "GH_TOKEN"

    def test_github_token_third_priority(self, monkeypatch):
        from hermes_cli.copilot_auth import resolve_copilot_token
        monkeypatch.delenv("COPILOT_GITHUB_TOKEN", raising=False)
        monkeypatch.delenv("GH_TOKEN", raising=False)
        monkeypatch.setenv("GITHUB_TOKEN", "gho_github_third")
        token, source = resolve_copilot_token()
        assert token == "gho_github_third"
        assert source == "GITHUB_TOKEN"

    def test_classic_pat_in_env_skipped(self, monkeypatch):
        """Classic PATs in env vars should be skipped, not returned."""
        from hermes_cli.copilot_auth import resolve_copilot_token
        monkeypatch.setenv("COPILOT_GITHUB_TOKEN", "ghp_classic_pat_nope")
        monkeypatch.delenv("GH_TOKEN", raising=False)
        monkeypatch.setenv("GITHUB_TOKEN", "gho_valid_oauth")
        token, source = resolve_copilot_token()
        # Should skip the ghp_ token and find the gho_ one
        assert token == "gho_valid_oauth"
        assert source == "GITHUB_TOKEN"

    def test_gh_cli_fallback(self, monkeypatch):
        from hermes_cli.copilot_auth import resolve_copilot_token
        monkeypatch.delenv("COPILOT_GITHUB_TOKEN", raising=False)
        monkeypatch.delenv("GH_TOKEN", raising=False)
        monkeypatch.delenv("GITHUB_TOKEN", raising=False)
        with patch("hermes_cli.copilot_auth._try_gh_cli_token", return_value="gho_from_cli"):
            token, source = resolve_copilot_token()
        assert token == "gho_from_cli"
        assert source == "gh auth token"

    def test_gh_cli_classic_pat_raises(self, monkeypatch):
        from hermes_cli.copilot_auth import resolve_copilot_token
        monkeypatch.delenv("COPILOT_GITHUB_TOKEN", raising=False)
        monkeypatch.delenv("GH_TOKEN", raising=False)
        monkeypatch.delenv("GITHUB_TOKEN", raising=False)
        with patch("hermes_cli.copilot_auth._try_gh_cli_token", return_value="ghp_classic"):
            with pytest.raises(ValueError, match="classic PAT"):
                resolve_copilot_token()

    def test_no_token_returns_empty(self, monkeypatch):
        from hermes_cli.copilot_auth import resolve_copilot_token
        monkeypatch.delenv("COPILOT_GITHUB_TOKEN", raising=False)
        monkeypatch.delenv("GH_TOKEN", raising=False)
        monkeypatch.delenv("GITHUB_TOKEN", raising=False)
        with patch("hermes_cli.copilot_auth._try_gh_cli_token", return_value=None):
            token, source = resolve_copilot_token()
        assert token == ""
        assert source == ""

class TestRequestHeaders:
    """Copilot API header generation."""

    def test_default_headers_include_openai_intent(self):
        from hermes_cli.copilot_auth import copilot_request_headers
        headers = copilot_request_headers()
        assert headers["Openai-Intent"] == "conversation-edits"
        assert headers["User-Agent"] == "HermesAgent/1.0"
        assert "Editor-Version" in headers

    def test_agent_turn_sets_initiator(self):
        from hermes_cli.copilot_auth import copilot_request_headers
        headers = copilot_request_headers(is_agent_turn=True)
        assert headers["x-initiator"] == "agent"

    def test_user_turn_sets_initiator(self):
        from hermes_cli.copilot_auth import copilot_request_headers
        headers = copilot_request_headers(is_agent_turn=False)
        assert headers["x-initiator"] == "user"

    def test_vision_header(self):
        from hermes_cli.copilot_auth import copilot_request_headers
        headers = copilot_request_headers(is_vision=True)
        assert headers["Copilot-Vision-Request"] == "true"

    def test_no_vision_header_by_default(self):
        from hermes_cli.copilot_auth import copilot_request_headers
        headers = copilot_request_headers()
        assert "Copilot-Vision-Request" not in headers

class TestCopilotDefaultHeaders:
    """The models.py copilot_default_headers uses copilot_auth."""

    def test_includes_openai_intent(self):
        from hermes_cli.models import copilot_default_headers
        headers = copilot_default_headers()
        assert "Openai-Intent" in headers
        assert headers["Openai-Intent"] == "conversation-edits"

    def test_includes_x_initiator(self):
        from hermes_cli.models import copilot_default_headers
        headers = copilot_default_headers()
        assert "x-initiator" in headers

class TestApiModeSelection:
    """API mode selection matching opencode's shouldUseCopilotResponsesApi."""

    def test_gpt5_uses_responses(self):
        from hermes_cli.models import _should_use_copilot_responses_api
        assert _should_use_copilot_responses_api("gpt-5.4") is True
        assert _should_use_copilot_responses_api("gpt-5.4-mini") is True
        assert _should_use_copilot_responses_api("gpt-5.3-codex") is True
        assert _should_use_copilot_responses_api("gpt-5.2-codex") is True
        assert _should_use_copilot_responses_api("gpt-5.2") is True
        assert _should_use_copilot_responses_api("gpt-5.1-codex-max") is True

    def test_gpt5_mini_excluded(self):
        from hermes_cli.models import _should_use_copilot_responses_api
        assert _should_use_copilot_responses_api("gpt-5-mini") is False

    def test_gpt4_uses_chat(self):
        from hermes_cli.models import _should_use_copilot_responses_api
        assert _should_use_copilot_responses_api("gpt-4.1") is False
        assert _should_use_copilot_responses_api("gpt-4o") is False
        assert _should_use_copilot_responses_api("gpt-4o-mini") is False

    def test_non_gpt_uses_chat(self):
        from hermes_cli.models import _should_use_copilot_responses_api
        assert _should_use_copilot_responses_api("claude-sonnet-4.6") is False
        assert _should_use_copilot_responses_api("claude-opus-4.6") is False
        assert _should_use_copilot_responses_api("gemini-2.5-pro") is False
        assert _should_use_copilot_responses_api("grok-code-fast-1") is False

class TestEnvVarOrder:
    """PROVIDER_REGISTRY has correct env var order."""

    def test_copilot_env_vars_include_copilot_github_token(self):
        from hermes_cli.auth import PROVIDER_REGISTRY
        copilot = PROVIDER_REGISTRY["copilot"]
        assert "COPILOT_GITHUB_TOKEN" in copilot.api_key_env_vars
        # COPILOT_GITHUB_TOKEN should be first
        assert copilot.api_key_env_vars[0] == "COPILOT_GITHUB_TOKEN"

    def test_copilot_env_vars_order_matches_docs(self):
        from hermes_cli.auth import PROVIDER_REGISTRY
        copilot = PROVIDER_REGISTRY["copilot"]
        assert copilot.api_key_env_vars == (
            "COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"
        )
|
||||
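Reviewer note on the two rules the Copilot tests above pin down: tokens are looked up in the order `COPILOT_GITHUB_TOKEN`, `GH_TOKEN`, `GITHUB_TOKEN` with classic `ghp_` PATs skipped, and only GPT-5-family model IDs other than `gpt-5-mini` are routed to the Responses API. The sketch below restates those rules so the expected behaviour is easy to check by hand. It is an illustration, not the `hermes_cli` implementation: `resolve_token` and `use_responses_api` are hypothetical names, the `gh auth token` fallback exercised by the tests is deliberately left out, and the regular expression is an assumption that happens to satisfy the listed cases.

```python
import os
import re
from typing import Mapping, Optional, Tuple

ENV_ORDER = ("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN")


def is_classic_pat(token: str) -> bool:
    """Classic personal access tokens carry the 'ghp_' prefix."""
    return token.startswith("ghp_")


def resolve_token(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (token, source) for the first usable env var, else ('', '')."""
    env = os.environ if env is None else env
    for name in ENV_ORDER:
        token = env.get(name, "").strip()
        # Skip empty values and classic PATs, then keep looking.
        if token and not is_classic_pat(token):
            return token, name
    return "", ""


def use_responses_api(model_id: str) -> bool:
    """True for gpt-<N>... with N >= 5, except the gpt-5-mini family."""
    match = re.match(r"gpt-(\d+)", model_id)
    if match is None:
        return False
    return int(match.group(1)) >= 5 and not model_id.startswith("gpt-5-mini")
```

Note that `gpt-5.4-mini` still selects the Responses API under this rule, because the exclusion is a prefix check on `gpt-5-mini` and not a search for `mini` anywhere in the ID.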
@@ -3,8 +3,12 @@
from unittest.mock import patch

from hermes_cli.models import (
    copilot_model_api_mode,
    fetch_github_model_catalog,
    curated_models_for_provider,
    fetch_api_models,
    github_model_reasoning_efforts,
    normalize_copilot_model_id,
    normalize_provider,
    parse_model_input,
    probe_api_models,
@@ -116,6 +120,7 @@ class TestNormalizeProvider:
        assert normalize_provider("glm") == "zai"
        assert normalize_provider("kimi") == "kimi-coding"
        assert normalize_provider("moonshot") == "kimi-coding"
        assert normalize_provider("github-copilot") == "copilot"

    def test_case_insensitive(self):
        assert normalize_provider("OpenRouter") == "openrouter"
@@ -125,6 +130,8 @@ class TestProviderLabel:
    def test_known_labels_and_auto(self):
        assert provider_label("anthropic") == "Anthropic"
        assert provider_label("kimi") == "Kimi / Moonshot"
        assert provider_label("copilot") == "GitHub Copilot"
        assert provider_label("copilot-acp") == "GitHub Copilot ACP"
        assert provider_label("auto") == "Auto"

    def test_unknown_provider_preserves_original_name(self):
@@ -145,6 +152,24 @@ class TestProviderModelIds:
    def test_zai_returns_glm_models(self):
        assert "glm-5" in provider_model_ids("zai")

    def test_copilot_prefers_live_catalog(self):
        with patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={"api_key": "gh-token"}), \
             patch("hermes_cli.models._fetch_github_models", return_value=["gpt-5.4", "claude-sonnet-4.6"]):
            assert provider_model_ids("copilot") == ["gpt-5.4", "claude-sonnet-4.6"]

    def test_copilot_acp_reuses_copilot_catalog(self):
        with patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={"api_key": "gh-token"}), \
             patch("hermes_cli.models._fetch_github_models", return_value=["gpt-5.4", "claude-sonnet-4.6"]):
            assert provider_model_ids("copilot-acp") == ["gpt-5.4", "claude-sonnet-4.6"]

    def test_copilot_acp_falls_back_to_copilot_defaults(self):
        with patch("hermes_cli.auth.resolve_api_key_provider_credentials", side_effect=Exception("no token")), \
             patch("hermes_cli.models._fetch_github_models", return_value=None):
            ids = provider_model_ids("copilot-acp")

        assert "gpt-5.4" in ids
        assert "copilot-acp" not in ids


# -- fetch_api_models --------------------------------------------------------

@@ -183,6 +208,112 @@ class TestFetchApiModels:
        assert probe["resolved_base_url"] == "http://localhost:8000/v1"
        assert probe["used_fallback"] is True

    def test_probe_api_models_uses_copilot_catalog(self):
        class _Resp:
            def __enter__(self):
                return self

            def __exit__(self, exc_type, exc, tb):
                return False

            def read(self):
                return b'{"data": [{"id": "gpt-5.4", "model_picker_enabled": true, "supported_endpoints": ["/responses"], "capabilities": {"type": "chat", "supports": {"reasoning_effort": ["low", "medium", "high"]}}}, {"id": "claude-sonnet-4.6", "model_picker_enabled": true, "supported_endpoints": ["/chat/completions"], "capabilities": {"type": "chat", "supports": {"reasoning_effort": ["low", "medium", "high"]}}}, {"id": "text-embedding-3-small", "model_picker_enabled": true, "capabilities": {"type": "embedding"}}]}'

        with patch("hermes_cli.models.urllib.request.urlopen", return_value=_Resp()) as mock_urlopen:
            probe = probe_api_models("gh-token", "https://api.githubcopilot.com")

        assert mock_urlopen.call_args[0][0].full_url == "https://api.githubcopilot.com/models"
        assert probe["models"] == ["gpt-5.4", "claude-sonnet-4.6"]
        assert probe["resolved_base_url"] == "https://api.githubcopilot.com"
        assert probe["used_fallback"] is False

    def test_fetch_github_model_catalog_filters_non_chat_models(self):
        class _Resp:
            def __enter__(self):
                return self

            def __exit__(self, exc_type, exc, tb):
                return False

            def read(self):
                return b'{"data": [{"id": "gpt-5.4", "model_picker_enabled": true, "supported_endpoints": ["/responses"], "capabilities": {"type": "chat", "supports": {"reasoning_effort": ["low", "medium", "high"]}}}, {"id": "text-embedding-3-small", "model_picker_enabled": true, "capabilities": {"type": "embedding"}}]}'

        with patch("hermes_cli.models.urllib.request.urlopen", return_value=_Resp()):
            catalog = fetch_github_model_catalog("gh-token")

        assert catalog is not None
        assert [item["id"] for item in catalog] == ["gpt-5.4"]

class TestGithubReasoningEfforts:
    def test_gpt5_supports_minimal_to_high(self):
        catalog = [{
            "id": "gpt-5.4",
            "capabilities": {"type": "chat", "supports": {"reasoning_effort": ["low", "medium", "high"]}},
            "supported_endpoints": ["/responses"],
        }]
        assert github_model_reasoning_efforts("gpt-5.4", catalog=catalog) == [
            "low",
            "medium",
            "high",
        ]

    def test_legacy_catalog_reasoning_still_supported(self):
        catalog = [{"id": "openai/o3", "capabilities": ["reasoning"]}]
        assert github_model_reasoning_efforts("openai/o3", catalog=catalog) == [
            "low",
            "medium",
            "high",
        ]

    def test_non_reasoning_model_returns_empty(self):
        catalog = [{"id": "gpt-4.1", "capabilities": {"type": "chat", "supports": {}}}]
        assert github_model_reasoning_efforts("gpt-4.1", catalog=catalog) == []

class TestCopilotNormalization:
    def test_normalize_old_github_models_slug(self):
        catalog = [{"id": "gpt-4.1"}, {"id": "gpt-5.4"}]
        assert normalize_copilot_model_id("openai/gpt-4.1-mini", catalog=catalog) == "gpt-4.1"

    def test_copilot_api_mode_gpt5_uses_responses(self):
        """GPT-5+ models should use Responses API (matching opencode)."""
        assert copilot_model_api_mode("gpt-5.4") == "codex_responses"
        assert copilot_model_api_mode("gpt-5.4-mini") == "codex_responses"
        assert copilot_model_api_mode("gpt-5.3-codex") == "codex_responses"
        assert copilot_model_api_mode("gpt-5.2-codex") == "codex_responses"
        assert copilot_model_api_mode("gpt-5.2") == "codex_responses"

    def test_copilot_api_mode_gpt5_mini_uses_chat(self):
        """gpt-5-mini is the exception: it uses Chat Completions."""
        assert copilot_model_api_mode("gpt-5-mini") == "chat_completions"

    def test_copilot_api_mode_non_gpt5_uses_chat(self):
        """Non-GPT-5 models use Chat Completions."""
        assert copilot_model_api_mode("gpt-4.1") == "chat_completions"
        assert copilot_model_api_mode("gpt-4o") == "chat_completions"
        assert copilot_model_api_mode("gpt-4o-mini") == "chat_completions"
        assert copilot_model_api_mode("claude-sonnet-4.6") == "chat_completions"
        assert copilot_model_api_mode("claude-opus-4.6") == "chat_completions"
        assert copilot_model_api_mode("gemini-2.5-pro") == "chat_completions"

    def test_copilot_api_mode_with_catalog_both_endpoints(self):
        """When catalog shows both endpoints, model ID pattern wins."""
        catalog = [{
            "id": "gpt-5.4",
            "supported_endpoints": ["/chat/completions", "/responses"],
        }]
        # GPT-5.4 should use responses even though chat/completions is listed
        assert copilot_model_api_mode("gpt-5.4", catalog=catalog) == "codex_responses"

    def test_copilot_api_mode_with_catalog_only_responses(self):
        catalog = [{
            "id": "gpt-5.4",
            "supported_endpoints": ["/responses"],
            "capabilities": {"type": "chat"},
        }]
        assert copilot_model_api_mode("gpt-5.4", catalog=catalog) == "codex_responses"


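The API-mode tests above pin down a small decision rule. The following is a minimal sketch of one rule that satisfies them, not the repository's implementation: `pick_api_mode` is a hypothetical name, and the actual body of `copilot_model_api_mode` is not shown in this diff. The catalog fallback for non-GPT-5 models is an assumption.

```python
# Hypothetical sketch; the real copilot_model_api_mode may differ.
def pick_api_mode(model_id, catalog=None):
    """Return "codex_responses" or "chat_completions" for a Copilot model ID."""
    # The model ID pattern wins: the GPT-5 family uses the Responses API,
    # with gpt-5-mini as the single exception.
    if model_id.startswith("gpt-5") and model_id != "gpt-5-mini":
        return "codex_responses"
    # Assumption: otherwise consult the catalog entry, if one was supplied,
    # and use Responses only when it is the sole supported endpoint.
    for entry in catalog or []:
        if entry.get("id") == model_id:
            endpoints = entry.get("supported_endpoints") or []
            if "/responses" in endpoints and "/chat/completions" not in endpoints:
                return "codex_responses"
    return "chat_completions"
```

Checking the ID pattern before the catalog is what makes the "both endpoints listed" case resolve to the Responses API.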
# -- validate — format checks -----------------------------------------------
@@ -97,30 +97,32 @@ def test_custom_setup_clears_active_oauth_provider(tmp_path, monkeypatch):
    monkeypatch.setattr("hermes_cli.setup.prompt_choice", fake_prompt_choice)

    prompt_values = iter(
        [
            "https://custom.example/v1",
            "custom-api-key",
            "custom/model",
        ]
    )
    monkeypatch.setattr(
        "hermes_cli.setup.prompt",
        lambda *args, **kwargs: next(prompt_values),
    )
    # _model_flow_custom uses builtins.input (URL, key, model, context_length)
    input_values = iter([
        "https://custom.example/v1",
        "custom-api-key",
        "custom/model",
        "",  # context_length (blank = auto-detect)
    ])
    monkeypatch.setattr("builtins.input", lambda _prompt="": next(input_values))
    monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", lambda *args, **kwargs: False)
    monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
    monkeypatch.setattr("hermes_cli.main._save_custom_provider", lambda *args, **kwargs: None)
    monkeypatch.setattr(
        "hermes_cli.models.probe_api_models",
        lambda api_key, base_url: {"models": ["m"], "probed_url": base_url + "/models"},
    )

    setup_model_provider(config)
    save_config(config)

    reloaded = load_config()

    # Core assertion: switching to custom endpoint clears OAuth provider
    assert get_active_provider() is None
    assert isinstance(reloaded["model"], dict)
    assert reloaded["model"]["provider"] == "custom"
    assert reloaded["model"]["base_url"] == "https://custom.example/v1"
    assert reloaded["model"]["default"] == "custom/model"

    # _model_flow_custom writes config via its own load/save cycle
    reloaded = load_config()
    if isinstance(reloaded.get("model"), dict):
        assert reloaded["model"].get("provider") == "custom"
        assert reloaded["model"].get("default") == "custom/model"


def test_codex_setup_uses_runtime_access_token_for_live_model_list(tmp_path, monkeypatch):

@@ -32,6 +32,8 @@ def _clear_provider_env(monkeypatch):
        "OPENAI_BASE_URL",
        "OPENAI_API_KEY",
        "OPENROUTER_API_KEY",
        "GITHUB_TOKEN",
        "GH_TOKEN",
        "GLM_API_KEY",
        "KIMI_API_KEY",
        "MINIMAX_API_KEY",
@@ -97,21 +99,21 @@ def test_setup_custom_endpoint_saves_working_v1_base_url(tmp_path, monkeypatch):
            return tts_idx
        raise AssertionError(f"Unexpected prompt_choice call: {question}")

    def fake_prompt(message, current=None, **kwargs):
        if "API base URL" in message:
            return "http://localhost:8000"
        if "API key" in message:
            return "local-key"
        if "Model name" in message:
            return "llm"
        return ""
    # _model_flow_custom uses builtins.input (URL, key, model, context_length)
    input_values = iter([
        "http://localhost:8000",
        "local-key",
        "llm",
        "",  # context_length (blank = auto-detect)
    ])
    monkeypatch.setattr("builtins.input", lambda _prompt="": next(input_values))

    monkeypatch.setattr("hermes_cli.setup.prompt_choice", fake_prompt_choice)
    monkeypatch.setattr("hermes_cli.setup.prompt", fake_prompt)
    monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", lambda *args, **kwargs: False)
    monkeypatch.setattr("hermes_cli.auth.get_active_provider", lambda: None)
    monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
    monkeypatch.setattr("agent.auxiliary_client.get_available_vision_backends", lambda: [])
    monkeypatch.setattr("hermes_cli.main._save_custom_provider", lambda *args, **kwargs: None)
    monkeypatch.setattr(
        "hermes_cli.models.probe_api_models",
        lambda api_key, base_url: {
@@ -124,16 +126,19 @@ def test_setup_custom_endpoint_saves_working_v1_base_url(tmp_path, monkeypatch):
    )

    setup_model_provider(config)
    save_config(config)

    env = _read_env(tmp_path)
    reloaded = load_config()

    # _model_flow_custom saves env vars and config to disk
    assert env.get("OPENAI_BASE_URL") == "http://localhost:8000/v1"
    assert env.get("OPENAI_API_KEY") == "local-key"
    assert reloaded["model"]["provider"] == "custom"
    assert reloaded["model"]["base_url"] == "http://localhost:8000/v1"
    assert reloaded["model"]["default"] == "llm"

    # The model config is saved as a dict by _model_flow_custom
    reloaded = load_config()
    model_cfg = reloaded.get("model", {})
    if isinstance(model_cfg, dict):
        assert model_cfg.get("provider") == "custom"
        assert model_cfg.get("default") == "llm"


def test_setup_keep_current_config_provider_uses_provider_specific_model_menu(tmp_path, monkeypatch):
@@ -231,6 +236,152 @@ def test_setup_keep_current_anthropic_can_configure_openai_vision_default(tmp_pa
    assert env.get("AUXILIARY_VISION_MODEL") == "gpt-4o-mini"


def test_setup_copilot_uses_gh_auth_and_saves_provider(tmp_path, monkeypatch):
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    _clear_provider_env(monkeypatch)

    config = load_config()

    def fake_prompt_choice(question, choices, default=0):
        if question == "Select your inference provider:":
            assert choices[14] == "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"
            return 14
        if question == "Select default model:":
            assert "gpt-4.1" in choices
            assert "gpt-5.4" in choices
            return choices.index("gpt-5.4")
        if question == "Select reasoning effort:":
            assert "low" in choices
            assert "high" in choices
            return choices.index("high")
        if question == "Configure vision:":
            return len(choices) - 1
        tts_idx = _maybe_keep_current_tts(question, choices)
        if tts_idx is not None:
            return tts_idx
        raise AssertionError(f"Unexpected prompt_choice call: {question}")

    def fake_prompt(message, *args, **kwargs):
        raise AssertionError(f"Unexpected prompt call: {message}")

    def fake_get_auth_status(provider_id):
        if provider_id == "copilot":
            return {"logged_in": True}
        return {"logged_in": False}

    monkeypatch.setattr("hermes_cli.setup.prompt_choice", fake_prompt_choice)
    monkeypatch.setattr("hermes_cli.setup.prompt", fake_prompt)
    monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", lambda *args, **kwargs: False)
    monkeypatch.setattr("hermes_cli.auth.get_active_provider", lambda: None)
    monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
    monkeypatch.setattr("hermes_cli.auth.get_auth_status", fake_get_auth_status)
    monkeypatch.setattr(
        "hermes_cli.auth.resolve_api_key_provider_credentials",
        lambda provider_id: {
            "provider": provider_id,
            "api_key": "gh-cli-token",
            "base_url": "https://api.githubcopilot.com",
            "source": "gh auth token",
        },
    )
    monkeypatch.setattr(
        "hermes_cli.models.fetch_github_model_catalog",
        lambda api_key: [
            {
                "id": "gpt-4.1",
                "capabilities": {"type": "chat", "supports": {}},
                "supported_endpoints": ["/chat/completions"],
            },
            {
                "id": "gpt-5.4",
                "capabilities": {"type": "chat", "supports": {"reasoning_effort": ["low", "medium", "high"]}},
                "supported_endpoints": ["/responses"],
            },
        ],
    )
    monkeypatch.setattr("agent.auxiliary_client.get_available_vision_backends", lambda: [])

    setup_model_provider(config)
    save_config(config)

    env = _read_env(tmp_path)
    reloaded = load_config()

    assert env.get("GITHUB_TOKEN") is None
    assert reloaded["model"]["provider"] == "copilot"
    assert reloaded["model"]["base_url"] == "https://api.githubcopilot.com"
    assert reloaded["model"]["default"] == "gpt-5.4"
    assert reloaded["model"]["api_mode"] == "codex_responses"
    assert reloaded["agent"]["reasoning_effort"] == "high"


def test_setup_copilot_acp_uses_model_picker_and_saves_provider(tmp_path, monkeypatch):
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    _clear_provider_env(monkeypatch)

    config = load_config()

    def fake_prompt_choice(question, choices, default=0):
        if question == "Select your inference provider:":
            assert choices[15] == "GitHub Copilot ACP (spawns `copilot --acp --stdio`)"
            return 15
        if question == "Select default model:":
            assert "gpt-4.1" in choices
            assert "gpt-5.4" in choices
            return choices.index("gpt-5.4")
        if question == "Configure vision:":
            return len(choices) - 1
        tts_idx = _maybe_keep_current_tts(question, choices)
        if tts_idx is not None:
            return tts_idx
        raise AssertionError(f"Unexpected prompt_choice call: {question}")

    def fake_prompt(message, *args, **kwargs):
        raise AssertionError(f"Unexpected prompt call: {message}")

    monkeypatch.setattr("hermes_cli.setup.prompt_choice", fake_prompt_choice)
    monkeypatch.setattr("hermes_cli.setup.prompt", fake_prompt)
    monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", lambda *args, **kwargs: False)
    monkeypatch.setattr("hermes_cli.auth.get_active_provider", lambda: None)
    monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
    monkeypatch.setattr("hermes_cli.auth.get_auth_status", lambda provider_id: {"logged_in": provider_id == "copilot-acp"})
    monkeypatch.setattr(
        "hermes_cli.auth.resolve_api_key_provider_credentials",
        lambda provider_id: {
            "provider": "copilot",
            "api_key": "gh-cli-token",
            "base_url": "https://api.githubcopilot.com",
            "source": "gh auth token",
        },
    )
    monkeypatch.setattr(
        "hermes_cli.models.fetch_github_model_catalog",
        lambda api_key: [
            {
                "id": "gpt-4.1",
                "capabilities": {"type": "chat", "supports": {}},
                "supported_endpoints": ["/chat/completions"],
            },
            {
                "id": "gpt-5.4",
                "capabilities": {"type": "chat", "supports": {"reasoning_effort": ["low", "medium", "high"]}},
                "supported_endpoints": ["/responses"],
            },
        ],
    )
    monkeypatch.setattr("agent.auxiliary_client.get_available_vision_backends", lambda: [])

    setup_model_provider(config)
    save_config(config)

    reloaded = load_config()

    assert reloaded["model"]["provider"] == "copilot-acp"
    assert reloaded["model"]["base_url"] == "acp://copilot"
    assert reloaded["model"]["default"] == "gpt-5.4"
    assert reloaded["model"]["api_mode"] == "chat_completions"


def test_setup_switch_custom_to_codex_clears_custom_endpoint_and_updates_config(tmp_path, monkeypatch):
    """Switching from custom to Codex should clear custom endpoint overrides."""
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))

@@ -60,6 +60,21 @@ class TestFromEnv:
            config = HonchoClientConfig.from_env(workspace_id="custom")
            assert config.workspace_id == "custom"

    def test_reads_base_url_from_env(self):
        with patch.dict(os.environ, {"HONCHO_BASE_URL": "http://localhost:8000"}, clear=False):
            config = HonchoClientConfig.from_env()
            assert config.base_url == "http://localhost:8000"
            assert config.enabled is True

    def test_enabled_without_api_key_when_base_url_set(self):
        """base_url alone (no API key) is sufficient to enable a local instance."""
        with patch.dict(os.environ, {"HONCHO_BASE_URL": "http://localhost:8000"}, clear=False):
            os.environ.pop("HONCHO_API_KEY", None)
            config = HonchoClientConfig.from_env()
            assert config.api_key is None
            assert config.base_url == "http://localhost:8000"
            assert config.enabled is True


class TestFromGlobalConfig:
    def test_missing_config_falls_back_to_env(self, tmp_path):
@@ -188,6 +203,36 @@ class TestFromGlobalConfig:
            config = HonchoClientConfig.from_global_config(config_path=config_file)
            assert config.api_key == "env-key"

    def test_base_url_env_fallback(self, tmp_path):
        """HONCHO_BASE_URL env var is used when no baseUrl in config JSON."""
        config_file = tmp_path / "config.json"
        config_file.write_text(json.dumps({"workspace": "local"}))

        with patch.dict(os.environ, {"HONCHO_BASE_URL": "http://localhost:8000"}, clear=False):
            config = HonchoClientConfig.from_global_config(config_path=config_file)
            assert config.base_url == "http://localhost:8000"
            assert config.enabled is True

    def test_base_url_from_config_root(self, tmp_path):
        """baseUrl in config root is read and takes precedence over env var."""
        config_file = tmp_path / "config.json"
        config_file.write_text(json.dumps({"baseUrl": "http://config-host:9000"}))

        with patch.dict(os.environ, {"HONCHO_BASE_URL": "http://localhost:8000"}, clear=False):
            config = HonchoClientConfig.from_global_config(config_path=config_file)
            assert config.base_url == "http://config-host:9000"

    def test_base_url_not_read_from_host_block(self, tmp_path):
        """baseUrl is a root-level connection setting, not overridable per-host (consistent with apiKey)."""
        config_file = tmp_path / "config.json"
        config_file.write_text(json.dumps({
            "baseUrl": "http://root:9000",
            "hosts": {"hermes": {"baseUrl": "http://host-block:9001"}},
        }))

        config = HonchoClientConfig.from_global_config(config_path=config_file)
        assert config.base_url == "http://root:9000"


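The three `baseUrl` tests above describe a precedence order: root-level config value first, then the `HONCHO_BASE_URL` environment variable, with per-host blocks ignored. A minimal sketch of that lookup follows; `resolve_base_url` and `is_enabled` are hypothetical helpers for illustration, not functions from `HonchoClientConfig`, and the `is_enabled` rule is an assumption drawn only from the tests shown here.

```python
# Hypothetical helpers; HonchoClientConfig's real internals are not shown in this diff.
def resolve_base_url(raw_config, env):
    """Pick the connection URL from parsed config JSON and an environment mapping."""
    # Root-level "baseUrl" wins. Entries under "hosts" are deliberately not
    # consulted, mirroring how the tests treat baseUrl as root-only.
    return raw_config.get("baseUrl") or env.get("HONCHO_BASE_URL")


def is_enabled(api_key, base_url):
    """Assumption: either an API key or a base URL is enough to enable the client."""
    return bool(api_key or base_url)
```

Passing the environment as a plain mapping keeps the sketch testable without patching `os.environ`.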
class TestResolveSessionName:
    def test_manual_override(self):

@@ -578,21 +578,39 @@ class TestConvertMessages:
    def test_converts_tool_results(self):
        messages = [
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {"id": "tc_1", "function": {"name": "test_tool", "arguments": "{}"}},
                ],
            },
            {"role": "tool", "tool_call_id": "tc_1", "content": "result data"},
        ]
        _, result = convert_messages_to_anthropic(messages)
        assert result[0]["role"] == "user"
        assert result[0]["content"][0]["type"] == "tool_result"
        assert result[0]["content"][0]["tool_use_id"] == "tc_1"
        # tool result is in the second message (user role)
        user_msg = [m for m in result if m["role"] == "user"][0]
        assert user_msg["content"][0]["type"] == "tool_result"
        assert user_msg["content"][0]["tool_use_id"] == "tc_1"

    def test_merges_consecutive_tool_results(self):
        messages = [
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {"id": "tc_1", "function": {"name": "tool_a", "arguments": "{}"}},
                    {"id": "tc_2", "function": {"name": "tool_b", "arguments": "{}"}},
                ],
            },
            {"role": "tool", "tool_call_id": "tc_1", "content": "result 1"},
            {"role": "tool", "tool_call_id": "tc_2", "content": "result 2"},
        ]
        _, result = convert_messages_to_anthropic(messages)
        assert len(result) == 1
        assert len(result[0]["content"]) == 2
        # assistant + merged user (with 2 tool_results)
        user_msgs = [m for m in result if m["role"] == "user"]
        assert len(user_msgs) == 1
        assert len(user_msgs[0]["content"]) == 2

    def test_strips_orphaned_tool_use(self):
        messages = [
@@ -610,6 +628,51 @@ class TestConvertMessages:
        assistant_blocks = result[0]["content"]
        assert all(b.get("type") != "tool_use" for b in assistant_blocks)

    def test_strips_orphaned_tool_result(self):
        """tool_result with no matching tool_use should be stripped.

        This happens when context compression removes the assistant message
        containing the tool_use but leaves the subsequent tool_result intact.
        Anthropic rejects orphaned tool_results with a 400.
        """
        messages = [
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi there"},
            # The assistant tool_use message was removed by compression,
            # but the tool_result survived:
            {"role": "tool", "tool_call_id": "tc_gone", "content": "stale result"},
            {"role": "user", "content": "Thanks"},
        ]
        _, result = convert_messages_to_anthropic(messages)
        # tc_gone has no matching tool_use, so its tool_result should be stripped
        for m in result:
            if m["role"] == "user" and isinstance(m["content"], list):
                assert all(
                    b.get("type") != "tool_result"
                    for b in m["content"]
                ), "Orphaned tool_result should have been stripped"

    def test_strips_orphaned_tool_result_preserves_valid(self):
        """Orphaned tool_results are stripped while valid ones survive."""
        messages = [
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {"id": "tc_valid", "function": {"name": "search", "arguments": "{}"}},
                ],
            },
            {"role": "tool", "tool_call_id": "tc_valid", "content": "good result"},
            {"role": "tool", "tool_call_id": "tc_orphan", "content": "stale result"},
        ]
        _, result = convert_messages_to_anthropic(messages)
        user_msg = [m for m in result if m["role"] == "user"][0]
        tool_results = [
            b for b in user_msg["content"] if b.get("type") == "tool_result"
        ]
        assert len(tool_results) == 1
        assert tool_results[0]["tool_use_id"] == "tc_valid"

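The two orphan tests above describe the filtering behaviour but not how it is done. As a rough illustration only, here is a sketch that drops orphaned tool messages from OpenAI-style input. `drop_orphaned_tool_messages` is a hypothetical helper, and `convert_messages_to_anthropic` may well perform the equivalent step on already converted Anthropic blocks instead.

```python
# Hypothetical pre-filter; not the code path used by convert_messages_to_anthropic.
def drop_orphaned_tool_messages(messages):
    """Remove role="tool" messages whose tool_call_id was never issued earlier."""
    issued_ids = set()
    kept = []
    for msg in messages:
        if msg.get("role") == "assistant":
            # Record every tool call the assistant has issued so far.
            for call in msg.get("tool_calls") or []:
                issued_ids.add(call.get("id"))
        if msg.get("role") == "tool" and msg.get("tool_call_id") not in issued_ids:
            # Orphan: its tool_use was lost (for example to context compression).
            continue
        kept.append(msg)
    return kept
```

Tracking IDs in message order means a result that appears before its call is also treated as orphaned, which is the stricter of the two reasonable choices.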
    def test_system_with_cache_control(self):
        messages = [
            {
@@ -641,11 +704,19 @@ class TestConvertMessages:
    def test_tool_cache_control_is_preserved_on_tool_result_block(self):
        messages = apply_anthropic_cache_control([
            {"role": "system", "content": "System prompt"},
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {"id": "tc_1", "function": {"name": "test_tool", "arguments": "{}"}},
                ],
            },
            {"role": "tool", "tool_call_id": "tc_1", "content": "result"},
        ])

        _, result = convert_messages_to_anthropic(messages)
        tool_block = result[0]["content"][0]
        user_msg = [m for m in result if m["role"] == "user"][0]
        tool_block = user_msg["content"][0]

        assert tool_block["type"] == "tool_result"
        assert tool_block["tool_use_id"] == "tc_1"

@@ -18,9 +18,12 @@ from hermes_cli.auth import (
    resolve_provider,
    get_api_key_provider_status,
    resolve_api_key_provider_credentials,
    get_external_process_provider_status,
    resolve_external_process_provider_credentials,
    get_auth_status,
    AuthError,
    KIMI_CODE_BASE_URL,
    _try_gh_cli_token,
    _resolve_kimi_base_url,
)

@@ -33,6 +36,8 @@ class TestProviderRegistry:
    """Test that new providers are correctly registered."""

    @pytest.mark.parametrize("provider_id,name,auth_type", [
        ("copilot-acp", "GitHub Copilot ACP", "external_process"),
        ("copilot", "GitHub Copilot", "api_key"),
        ("zai", "Z.AI / GLM", "api_key"),
        ("kimi-coding", "Kimi / Moonshot", "api_key"),
        ("minimax", "MiniMax", "api_key"),
@@ -52,6 +57,11 @@ class TestProviderRegistry:
        assert pconfig.api_key_env_vars == ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY")
        assert pconfig.base_url_env_var == "GLM_BASE_URL"

    def test_copilot_env_vars(self):
        pconfig = PROVIDER_REGISTRY["copilot"]
        assert pconfig.api_key_env_vars == ("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN")
        assert pconfig.base_url_env_var == ""

    def test_kimi_env_vars(self):
        pconfig = PROVIDER_REGISTRY["kimi-coding"]
        assert pconfig.api_key_env_vars == ("KIMI_API_KEY",)
@@ -78,10 +88,12 @@ class TestProviderRegistry:
        assert pconfig.base_url_env_var == "KILOCODE_BASE_URL"

    def test_base_urls(self):
        assert PROVIDER_REGISTRY["copilot"].inference_base_url == "https://api.githubcopilot.com"
        assert PROVIDER_REGISTRY["copilot-acp"].inference_base_url == "acp://copilot"
        assert PROVIDER_REGISTRY["zai"].inference_base_url == "https://api.z.ai/api/paas/v4"
        assert PROVIDER_REGISTRY["kimi-coding"].inference_base_url == "https://api.moonshot.ai/v1"
        assert PROVIDER_REGISTRY["minimax"].inference_base_url == "https://api.minimax.io/v1"
        assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/v1"
        assert PROVIDER_REGISTRY["minimax"].inference_base_url == "https://api.minimax.io/anthropic"
        assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/anthropic"
        assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"
        assert PROVIDER_REGISTRY["kilocode"].inference_base_url == "https://api.kilo.ai/api/gateway"

@@ -105,8 +117,9 @@ PROVIDER_ENV_VARS = (
    "AI_GATEWAY_API_KEY", "AI_GATEWAY_BASE_URL",
    "KILOCODE_API_KEY", "KILOCODE_BASE_URL",
    "DASHSCOPE_API_KEY", "OPENCODE_ZEN_API_KEY", "OPENCODE_GO_API_KEY",
    "NOUS_API_KEY",
    "OPENAI_BASE_URL",
    "NOUS_API_KEY", "GITHUB_TOKEN", "GH_TOKEN",
    "OPENAI_BASE_URL", "HERMES_COPILOT_ACP_COMMAND", "COPILOT_CLI_PATH",
    "HERMES_COPILOT_ACP_ARGS", "COPILOT_ACP_BASE_URL",
)


@@ -176,6 +189,16 @@ class TestResolveProvider:
        assert resolve_provider("Z-AI") == "zai"
        assert resolve_provider("Kimi") == "kimi-coding"

    def test_alias_github_copilot(self):
        assert resolve_provider("github-copilot") == "copilot"

    def test_alias_github_models(self):
        assert resolve_provider("github-models") == "copilot"

    def test_alias_github_copilot_acp(self):
        assert resolve_provider("github-copilot-acp") == "copilot-acp"
        assert resolve_provider("copilot-acp-agent") == "copilot-acp"

    def test_unknown_provider_raises(self):
        with pytest.raises(AuthError):
            resolve_provider("nonexistent-provider-xyz")
@@ -218,6 +241,10 @@ class TestResolveProvider:
        monkeypatch.setenv("GLM_API_KEY", "glm-key")
        assert resolve_provider("auto") == "openrouter"

    def test_auto_does_not_select_copilot_from_github_token(self, monkeypatch):
        monkeypatch.setenv("GITHUB_TOKEN", "gh-test-token")
        assert resolve_provider("auto") == "openrouter"


# =============================================================================
# API Key Provider Status tests
@@ -251,12 +278,41 @@ class TestApiKeyProviderStatus:
        status = get_api_key_provider_status("kimi-coding")
        assert status["base_url"] == "https://custom.kimi.example/v1"

    def test_copilot_status_uses_gh_cli_token(self, monkeypatch):
        monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_gh_cli_token")
        status = get_api_key_provider_status("copilot")
        assert status["configured"] is True
        assert status["logged_in"] is True
        assert status["key_source"] == "gh auth token"
        assert status["base_url"] == "https://api.githubcopilot.com"

    def test_get_auth_status_dispatches_to_api_key(self, monkeypatch):
        monkeypatch.setenv("MINIMAX_API_KEY", "mm-key")
        status = get_auth_status("minimax")
        assert status["configured"] is True
        assert status["provider"] == "minimax"

    def test_copilot_acp_status_detects_local_cli(self, monkeypatch):
        monkeypatch.setenv("HERMES_COPILOT_ACP_ARGS", "--acp --stdio --debug")
        monkeypatch.setattr("hermes_cli.auth.shutil.which", lambda command: f"/usr/local/bin/{command}")

        status = get_external_process_provider_status("copilot-acp")

        assert status["configured"] is True
        assert status["logged_in"] is True
        assert status["command"] == "copilot"
        assert status["resolved_command"] == "/usr/local/bin/copilot"
        assert status["args"] == ["--acp", "--stdio", "--debug"]
        assert status["base_url"] == "acp://copilot"

    def test_get_auth_status_dispatches_to_external_process(self, monkeypatch):
        monkeypatch.setattr("hermes_cli.auth.shutil.which", lambda command: f"/opt/bin/{command}")

        status = get_auth_status("copilot-acp")

        assert status["configured"] is True
        assert status["provider"] == "copilot-acp"

    def test_non_api_key_provider(self):
        status = get_api_key_provider_status("nous")
        assert status["configured"] is False
@@ -276,6 +332,61 @@ class TestResolveApiKeyProviderCredentials:
        assert creds["base_url"] == "https://api.z.ai/api/paas/v4"
        assert creds["source"] == "GLM_API_KEY"

    def test_resolve_copilot_with_github_token(self, monkeypatch):
        monkeypatch.setenv("GITHUB_TOKEN", "gh-env-secret")
        creds = resolve_api_key_provider_credentials("copilot")
        assert creds["provider"] == "copilot"
        assert creds["api_key"] == "gh-env-secret"
        assert creds["base_url"] == "https://api.githubcopilot.com"
        assert creds["source"] == "GITHUB_TOKEN"

    def test_resolve_copilot_with_gh_cli_fallback(self, monkeypatch):
        monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_cli_secret")
        creds = resolve_api_key_provider_credentials("copilot")
        assert creds["provider"] == "copilot"
        assert creds["api_key"] == "gho_cli_secret"
        assert creds["base_url"] == "https://api.githubcopilot.com"
        assert creds["source"] == "gh auth token"

    def test_try_gh_cli_token_uses_homebrew_path_when_not_on_path(self, monkeypatch):
        monkeypatch.setattr("hermes_cli.auth.shutil.which", lambda command: None)
        monkeypatch.setattr(
            "hermes_cli.auth.os.path.isfile",
            lambda path: path == "/opt/homebrew/bin/gh",
        )
        monkeypatch.setattr(
            "hermes_cli.auth.os.access",
            lambda path, mode: path == "/opt/homebrew/bin/gh" and mode == os.X_OK,
        )

        calls = []

        class _Result:
            returncode = 0
            stdout = "gh-cli-secret\n"

        def _fake_run(cmd, capture_output, text, timeout):
            calls.append(cmd)
            return _Result()

        monkeypatch.setattr("hermes_cli.auth.subprocess.run", _fake_run)

        assert _try_gh_cli_token() == "gh-cli-secret"
        assert calls == [["/opt/homebrew/bin/gh", "auth", "token"]]

    def test_resolve_copilot_acp_with_local_cli(self, monkeypatch):
        monkeypatch.setenv("HERMES_COPILOT_ACP_ARGS", "--acp --stdio")
        monkeypatch.setattr("hermes_cli.auth.shutil.which", lambda command: f"/usr/local/bin/{command}")

        creds = resolve_external_process_provider_credentials("copilot-acp")

        assert creds["provider"] == "copilot-acp"
        assert creds["api_key"] == "copilot-acp"
        assert creds["base_url"] == "acp://copilot"
        assert creds["command"] == "/usr/local/bin/copilot"
        assert creds["args"] == ["--acp", "--stdio"]
        assert creds["source"] == "process"

    def test_resolve_kimi_with_key(self, monkeypatch):
        monkeypatch.setenv("KIMI_API_KEY", "kimi-secret-key")
        creds = resolve_api_key_provider_credentials("kimi-coding")
@@ -288,14 +399,14 @@ class TestResolveApiKeyProviderCredentials:
        creds = resolve_api_key_provider_credentials("minimax")
        assert creds["provider"] == "minimax"
        assert creds["api_key"] == "mm-secret-key"
        assert creds["base_url"] == "https://api.minimax.io/v1"
        assert creds["base_url"] == "https://api.minimax.io/anthropic"

    def test_resolve_minimax_cn_with_key(self, monkeypatch):
        monkeypatch.setenv("MINIMAX_CN_API_KEY", "mmcn-secret-key")
        creds = resolve_api_key_provider_credentials("minimax-cn")
        assert creds["provider"] == "minimax-cn"
        assert creds["api_key"] == "mmcn-secret-key"
        assert creds["base_url"] == "https://api.minimaxi.com/v1"
        assert creds["base_url"] == "https://api.minimaxi.com/anthropic"

    def test_resolve_ai_gateway_with_key(self, monkeypatch):
        monkeypatch.setenv("AI_GATEWAY_API_KEY", "gw-secret-key")
@@ -403,6 +514,53 @@ class TestRuntimeProviderResolution:
|
||||
assert result["provider"] == "kimi-coding"
|
||||
assert result["api_key"] == "auto-kimi-key"
|
||||
|
||||
def test_runtime_copilot_uses_gh_cli_token(self, monkeypatch):
|
||||
monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_cli_secret")
|
||||
from hermes_cli.runtime_provider import resolve_runtime_provider
|
||||
result = resolve_runtime_provider(requested="copilot")
|
||||
assert result["provider"] == "copilot"
|
||||
assert result["api_mode"] == "chat_completions"
|
||||
assert result["api_key"] == "gho_cli_secret"
|
||||
assert result["base_url"] == "https://api.githubcopilot.com"
|
||||
|
||||
    def test_runtime_copilot_uses_responses_for_gpt_5_4(self, monkeypatch):
        monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_cli_secret")
        monkeypatch.setattr(
            "hermes_cli.runtime_provider._get_model_config",
            lambda: {"provider": "copilot", "default": "gpt-5.4"},
        )
        monkeypatch.setattr(
            "hermes_cli.models.fetch_github_model_catalog",
            lambda api_key=None, timeout=5.0: [
                {
                    "id": "gpt-5.4",
                    "supported_endpoints": ["/responses"],
                    "capabilities": {"type": "chat"},
                }
            ],
        )
        from hermes_cli.runtime_provider import resolve_runtime_provider

        result = resolve_runtime_provider(requested="copilot")

        assert result["provider"] == "copilot"
        assert result["api_mode"] == "codex_responses"
|
||||
|
||||
def test_runtime_copilot_acp_uses_process_runtime(self, monkeypatch):
|
||||
monkeypatch.setattr("hermes_cli.auth.shutil.which", lambda command: f"/usr/local/bin/{command}")
|
||||
monkeypatch.setenv("HERMES_COPILOT_ACP_ARGS", "--acp --stdio --debug")
|
||||
|
||||
from hermes_cli.runtime_provider import resolve_runtime_provider
|
||||
|
||||
result = resolve_runtime_provider(requested="copilot-acp")
|
||||
|
||||
assert result["provider"] == "copilot-acp"
|
||||
assert result["api_mode"] == "chat_completions"
|
||||
assert result["api_key"] == "copilot-acp"
|
||||
assert result["base_url"] == "acp://copilot"
|
||||
assert result["command"] == "/usr/local/bin/copilot"
|
||||
assert result["args"] == ["--acp", "--stdio", "--debug"]
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# _has_any_provider_configured tests
|
||||
@@ -430,6 +588,16 @@ class TestHasAnyProviderConfigured:
|
||||
from hermes_cli.main import _has_any_provider_configured
|
||||
assert _has_any_provider_configured() is True
|
||||
|
||||
def test_gh_cli_token_counts(self, monkeypatch, tmp_path):
|
||||
from hermes_cli import config as config_module
|
||||
monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_cli_secret")
|
||||
hermes_home = tmp_path / ".hermes"
|
||||
hermes_home.mkdir()
|
||||
monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
|
||||
monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
|
||||
from hermes_cli.main import _has_any_provider_configured
|
||||
assert _has_any_provider_configured() is True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Kimi Code auto-detection tests
|
||||
|
||||
@@ -42,6 +42,7 @@ def _make_cli(env_overrides=None, config_overrides=None, **kwargs):
|
||||
"prompt_toolkit.key_binding": MagicMock(),
|
||||
"prompt_toolkit.completion": MagicMock(),
|
||||
"prompt_toolkit.formatted_text": MagicMock(),
|
||||
"prompt_toolkit.auto_suggest": MagicMock(),
|
||||
}
|
||||
with patch.dict(sys.modules, prompt_toolkit_stubs), \
|
||||
patch.dict("os.environ", clean_env, clear=False):
|
||||
|
||||
@@ -12,6 +12,17 @@ from hermes_state import SessionDB
|
||||
from tools.todo_tool import TodoStore
|
||||
|
||||
|
||||
class _FakeCompressor:
    """Minimal stand-in for ContextCompressor."""

    def __init__(self):
        self.last_prompt_tokens = 500
        self.last_completion_tokens = 200
        self.last_total_tokens = 700
        self.compression_count = 3
        self._context_probed = True
|
||||
|
||||
|
||||
class _FakeAgent:
|
||||
def __init__(self, session_id: str, session_start):
|
||||
self.session_id = session_id
|
||||
@@ -25,6 +36,42 @@ class _FakeAgent:
|
||||
self.flush_memories = MagicMock()
|
||||
self._invalidate_system_prompt = MagicMock()
|
||||
|
||||
# Token counters (non-zero to verify reset)
|
||||
self.session_total_tokens = 1000
|
||||
self.session_input_tokens = 600
|
||||
self.session_output_tokens = 400
|
||||
self.session_prompt_tokens = 550
|
||||
self.session_completion_tokens = 350
|
||||
self.session_cache_read_tokens = 100
|
||||
self.session_cache_write_tokens = 50
|
||||
self.session_reasoning_tokens = 80
|
||||
self.session_api_calls = 5
|
||||
self.session_estimated_cost_usd = 0.42
|
||||
self.session_cost_status = "estimated"
|
||||
self.session_cost_source = "openrouter"
|
||||
self.context_compressor = _FakeCompressor()
|
||||
|
||||
    def reset_session_state(self):
        """Mirror the real AIAgent.reset_session_state()."""
        self.session_total_tokens = 0
        self.session_input_tokens = 0
        self.session_output_tokens = 0
        self.session_prompt_tokens = 0
        self.session_completion_tokens = 0
        self.session_cache_read_tokens = 0
        self.session_cache_write_tokens = 0
        self.session_reasoning_tokens = 0
        self.session_api_calls = 0
        self.session_estimated_cost_usd = 0.0
        self.session_cost_status = "unknown"
        self.session_cost_source = "none"
        if hasattr(self, "context_compressor") and self.context_compressor:
            self.context_compressor.last_prompt_tokens = 0
            self.context_compressor.last_completion_tokens = 0
            self.context_compressor.last_total_tokens = 0
            self.context_compressor.compression_count = 0
            self.context_compressor._context_probed = False
|
||||
|
||||
|
||||
def _make_cli(env_overrides=None, config_overrides=None, **kwargs):
|
||||
"""Create a HermesCLI instance with minimal mocking."""
|
||||
@@ -58,6 +105,7 @@ def _make_cli(env_overrides=None, config_overrides=None, **kwargs):
|
||||
"prompt_toolkit.key_binding": MagicMock(),
|
||||
"prompt_toolkit.completion": MagicMock(),
|
||||
"prompt_toolkit.formatted_text": MagicMock(),
|
||||
"prompt_toolkit.auto_suggest": MagicMock(),
|
||||
}
|
||||
with patch.dict(sys.modules, prompt_toolkit_stubs), patch.dict(
|
||||
"os.environ", clean_env, clear=False
|
||||
@@ -137,3 +185,38 @@ def test_clear_command_starts_new_session_before_redrawing(tmp_path):
|
||||
cli.console.clear.assert_called_once()
|
||||
cli.show_banner.assert_called_once()
|
||||
assert cli.conversation_history == []
|
||||
|
||||
|
||||
def test_new_session_resets_token_counters(tmp_path):
|
||||
"""Regression test for #2099: /new must zero all token counters."""
|
||||
cli = _prepare_cli_with_active_session(tmp_path)
|
||||
|
||||
# Verify counters are non-zero before reset
|
||||
agent = cli.agent
|
||||
assert agent.session_total_tokens > 0
|
||||
assert agent.session_api_calls > 0
|
||||
assert agent.context_compressor.compression_count > 0
|
||||
|
||||
cli.process_command("/new")
|
||||
|
||||
# All agent token counters must be zero
|
||||
assert agent.session_total_tokens == 0
|
||||
assert agent.session_input_tokens == 0
|
||||
assert agent.session_output_tokens == 0
|
||||
assert agent.session_prompt_tokens == 0
|
||||
assert agent.session_completion_tokens == 0
|
||||
assert agent.session_cache_read_tokens == 0
|
||||
assert agent.session_cache_write_tokens == 0
|
||||
assert agent.session_reasoning_tokens == 0
|
||||
assert agent.session_api_calls == 0
|
||||
assert agent.session_estimated_cost_usd == 0.0
|
||||
assert agent.session_cost_status == "unknown"
|
||||
assert agent.session_cost_source == "none"
|
||||
|
||||
# Context compressor counters must also be zero
|
||||
comp = agent.context_compressor
|
||||
assert comp.last_prompt_tokens == 0
|
||||
assert comp.last_completion_tokens == 0
|
||||
assert comp.last_total_tokens == 0
|
||||
assert comp.compression_count == 0
|
||||
assert comp._context_probed is False
|
||||
|
||||
@@ -459,7 +459,7 @@ def test_model_flow_custom_saves_verified_v1_base_url(monkeypatch, capsys):
|
||||
)
|
||||
monkeypatch.setattr("hermes_cli.config.save_config", lambda cfg: None)
|
||||
|
||||
- answers = iter(["http://localhost:8000", "local-key", "llm"])
+ answers = iter(["http://localhost:8000", "local-key", "llm", ""])
|
||||
monkeypatch.setattr("builtins.input", lambda _prompt="": next(answers))
|
||||
|
||||
hermes_main._model_flow_custom({})
|
||||
|
||||
@@ -0,0 +1,199 @@
|
||||
"""Tests for context compression boundary alignment.
|
||||
|
||||
Verifies that _align_boundary_backward correctly handles tool result groups
|
||||
so that parallel tool calls are never split during compression.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from unittest.mock import patch, MagicMock
|
||||
|
||||
from agent.context_compressor import ContextCompressor
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _tc(call_id: str) -> dict:
    """Create a minimal tool_call dict."""
    return {"id": call_id, "type": "function", "function": {"name": "test", "arguments": "{}"}}


def _tool_result(call_id: str, content: str = "result") -> dict:
    """Create a tool result message."""
    return {"role": "tool", "tool_call_id": call_id, "content": content}


def _assistant_with_tools(*call_ids: str) -> dict:
    """Create an assistant message with tool_calls."""
    return {"role": "assistant", "tool_calls": [_tc(cid) for cid in call_ids], "content": None}


def _make_compressor(**kwargs) -> ContextCompressor:
    defaults = dict(
        model="test-model",
        threshold_percent=0.75,
        protect_first_n=3,
        protect_last_n=4,
        quiet_mode=True,
    )
    defaults.update(kwargs)
    with patch("agent.context_compressor.get_model_context_length", return_value=8000):
        return ContextCompressor(**defaults)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _align_boundary_backward tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestAlignBoundaryBackward:
|
||||
"""Test that compress-end boundary never splits a tool_call/result group."""
|
||||
|
||||
def test_boundary_at_clean_position(self):
|
||||
"""Boundary after a user message — no adjustment needed."""
|
||||
comp = _make_compressor()
|
||||
messages = [
|
||||
{"role": "system", "content": "sys"},
|
||||
{"role": "user", "content": "hello"},
|
||||
{"role": "assistant", "content": "hi"},
|
||||
{"role": "user", "content": "do something"},
|
||||
_assistant_with_tools("tc_1"),
|
||||
_tool_result("tc_1", "done"),
|
||||
{"role": "user", "content": "thanks"}, # idx=6
|
||||
{"role": "assistant", "content": "np"},
|
||||
]
|
||||
# Boundary at 7, messages[6] = user — no adjustment
|
||||
assert comp._align_boundary_backward(messages, 7) == 7
|
||||
|
||||
    def test_boundary_after_assistant_with_tools(self):
        """Original case: boundary right after assistant with tool_calls."""
        comp = _make_compressor()
        messages = [
            {"role": "system", "content": "sys"},
            {"role": "user", "content": "hello"},
            {"role": "assistant", "content": "hi"},
            _assistant_with_tools("tc_1", "tc_2"),  # idx=3
            _tool_result("tc_1"),  # idx=4
            _tool_result("tc_2"),  # idx=5
            {"role": "user", "content": "next"},
        ]
        # Boundary at 4, messages[3] = assistant with tool_calls → pull back to 3
        assert comp._align_boundary_backward(messages, 4) == 3
|
||||
|
||||
def test_boundary_in_middle_of_tool_results(self):
|
||||
"""THE BUG: boundary falls between tool results of the same group."""
|
||||
comp = _make_compressor()
|
||||
messages = [
|
||||
{"role": "system", "content": "sys"},
|
||||
{"role": "user", "content": "hello"},
|
||||
{"role": "assistant", "content": "hi"},
|
||||
{"role": "user", "content": "do 5 things"},
|
||||
_assistant_with_tools("tc_A", "tc_B", "tc_C", "tc_D", "tc_E"), # idx=4
|
||||
_tool_result("tc_A", "result A"), # idx=5
|
||||
_tool_result("tc_B", "result B"), # idx=6
|
||||
_tool_result("tc_C", "result C"), # idx=7
|
||||
_tool_result("tc_D", "result D"), # idx=8
|
||||
_tool_result("tc_E", "result E"), # idx=9
|
||||
{"role": "user", "content": "ok"},
|
||||
{"role": "assistant", "content": "done"},
|
||||
]
|
||||
# Boundary at 8 — in middle of tool results. messages[7] = tool result.
|
||||
# Must walk back to idx=4 (the parent assistant).
|
||||
assert comp._align_boundary_backward(messages, 8) == 4
|
||||
|
||||
def test_boundary_at_last_tool_result(self):
|
||||
"""Boundary right after last tool result — messages[idx-1] is tool."""
|
||||
comp = _make_compressor()
|
||||
messages = [
|
||||
{"role": "system", "content": "sys"},
|
||||
{"role": "user", "content": "hello"},
|
||||
{"role": "assistant", "content": "hi"},
|
||||
_assistant_with_tools("tc_1", "tc_2", "tc_3"), # idx=3
|
||||
_tool_result("tc_1"), # idx=4
|
||||
_tool_result("tc_2"), # idx=5
|
||||
_tool_result("tc_3"), # idx=6
|
||||
{"role": "user", "content": "next"},
|
||||
]
|
||||
# Boundary at 7 — messages[6] is last tool result.
|
||||
# Walk back: [6]=tool, [5]=tool, [4]=tool, [3]=assistant with tools → idx=3
|
||||
assert comp._align_boundary_backward(messages, 7) == 3
|
||||
|
||||
def test_boundary_with_consecutive_tool_groups(self):
|
||||
"""Two consecutive tool groups — only walk back to the nearest parent."""
|
||||
comp = _make_compressor()
|
||||
messages = [
|
||||
{"role": "system", "content": "sys"},
|
||||
{"role": "user", "content": "hello"},
|
||||
_assistant_with_tools("tc_1"), # idx=2
|
||||
_tool_result("tc_1"), # idx=3
|
||||
{"role": "user", "content": "more"},
|
||||
_assistant_with_tools("tc_2", "tc_3"), # idx=5
|
||||
_tool_result("tc_2"), # idx=6
|
||||
_tool_result("tc_3"), # idx=7
|
||||
{"role": "user", "content": "done"},
|
||||
]
|
||||
# Boundary at 7 — messages[6] = tool result for tc_2 group
|
||||
# Walk back: [6]=tool, [5]=assistant with tools → idx=5
|
||||
assert comp._align_boundary_backward(messages, 7) == 5
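
The five cases above pin down the alignment behaviour closely enough to sketch it. The function below is a hypothetical stand-in that satisfies those cases; it is not the actual body of `ContextCompressor._align_boundary_backward`, which is not shown in this diff:

```python
def align_boundary_backward(messages, idx):
    """Hypothetical stand-in: move a compression boundary so it never
    falls inside an assistant tool_call / tool result group."""
    # Step back over tool results that sit immediately before the boundary.
    while idx > 0 and messages[idx - 1].get("role") == "tool":
        idx -= 1
    # If the message just before the boundary is the assistant that issued
    # the tool calls, pull the boundary back so it stays with its results.
    if idx > 0:
        prev = messages[idx - 1]
        if prev.get("role") == "assistant" and prev.get("tool_calls"):
            idx -= 1
    return idx


demo = [
    {"role": "user", "content": "do 2 things"},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "a"}, {"id": "b"}]},
    {"role": "tool", "tool_call_id": "a", "content": "A"},
    {"role": "tool", "tool_call_id": "b", "content": "B"},
    {"role": "user", "content": "ok"},
]
print(align_boundary_backward(demo, 3))
```

With the boundary at index 3 (between the two tool results), the sketch walks back to index 1, so the whole group lands on one side of the cut.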
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# End-to-end: compression must not lose tool results
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestCompressionToolResultPreservation:
|
||||
"""Verify that compress() never silently drops tool results."""
|
||||
|
||||
def test_parallel_tool_results_not_lost(self):
|
||||
"""The exact scenario that triggered silent data loss before the fix."""
|
||||
comp = _make_compressor(protect_first_n=3, protect_last_n=4)
|
||||
|
||||
messages = [
|
||||
{"role": "system", "content": "You are helpful."}, # 0
|
||||
{"role": "user", "content": "Hello"}, # 1
|
||||
{"role": "assistant", "content": "Hi there!"}, # 2 (end of head)
|
||||
{"role": "user", "content": "Read 7 files for me"}, # 3
|
||||
_assistant_with_tools("tc_A", "tc_B", "tc_C", "tc_D", "tc_E", "tc_F", "tc_G"), # 4
|
||||
_tool_result("tc_A", "content of file A"), # 5
|
||||
_tool_result("tc_B", "content of file B"), # 6
|
||||
_tool_result("tc_C", "content of file C"), # 7
|
||||
_tool_result("tc_D", "content of file D"), # 8
|
||||
_tool_result("tc_E", "content of file E"), # 9
|
||||
_tool_result("tc_F", "content of file F"), # 10
|
||||
_tool_result("tc_G", "CRITICAL DATA in file G"), # 11 ← compress_end=15-4=11
|
||||
{"role": "user", "content": "Now summarize them"}, # 12
|
||||
{"role": "assistant", "content": "Here is the summary..."}, # 13
|
||||
{"role": "user", "content": "Thanks"}, # 14
|
||||
]
|
||||
# 15 messages. compress_end = 15 - 4 = 11 (before fix: splits tool group)
|
||||
|
||||
fake_summary = "[Summary of earlier conversation]"
|
||||
with patch.object(comp, "_generate_summary", return_value=fake_summary):
|
||||
result = comp.compress(messages, current_tokens=7000)
|
||||
|
||||
# After compression, no tool results should be orphaned/lost.
|
||||
# All tool results in the result must have a matching assistant tool_call.
|
||||
assistant_call_ids = set()
|
||||
for msg in result:
|
||||
if msg.get("role") == "assistant":
|
||||
for tc in msg.get("tool_calls") or []:
|
||||
cid = tc.get("id", "")
|
||||
if cid:
|
||||
assistant_call_ids.add(cid)
|
||||
|
||||
tool_result_ids = set()
|
||||
for msg in result:
|
||||
if msg.get("role") == "tool":
|
||||
cid = msg.get("tool_call_id")
|
||||
if cid:
|
||||
tool_result_ids.add(cid)
|
||||
|
||||
# Every tool result must have a parent — no orphans
|
||||
orphaned = tool_result_ids - assistant_call_ids
|
||||
assert not orphaned, f"Orphaned tool results found (data loss!): {orphaned}"
|
||||
|
||||
# Every assistant tool_call must have a real result (not a stub)
|
||||
for msg in result:
|
||||
if msg.get("role") == "tool":
|
||||
assert msg["content"] != "[Result from earlier conversation — see context summary above]", \
|
||||
f"Stub result found for {msg.get('tool_call_id')} — real result was lost"
|
||||
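
The orphan check in the end-to-end test above can be factored into a small reusable helper. A minimal sketch, assuming the same OpenAI-style message dicts used by the tests; the name `find_orphaned_tool_results` is hypothetical:

```python
def find_orphaned_tool_results(messages):
    """Return tool_call_ids of tool results that have no parent assistant call."""
    # Collect every tool_call id announced by an assistant message.
    call_ids = {
        tc["id"]
        for msg in messages
        if msg.get("role") == "assistant"
        for tc in (msg.get("tool_calls") or [])
        if tc.get("id")
    }
    # Collect every id that a tool result claims to answer.
    result_ids = {
        msg["tool_call_id"]
        for msg in messages
        if msg.get("role") == "tool" and msg.get("tool_call_id")
    }
    return result_ids - call_ids


history = [
    {"role": "assistant", "content": None, "tool_calls": [{"id": "tc_A"}]},
    {"role": "tool", "tool_call_id": "tc_A", "content": "ok"},
    {"role": "tool", "tool_call_id": "tc_B", "content": "parent was compressed away"},
]
print(find_orphaned_tool_results(history))
```

An empty set means every surviving tool result still has its parent assistant message.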
@@ -0,0 +1,249 @@
|
||||
"""Tests for context pressure warnings (user-facing, not injected into messages).
|
||||
|
||||
Covers:
|
||||
- Display formatting (CLI and gateway variants)
|
||||
- Flag tracking and threshold logic on AIAgent
|
||||
- Flag reset after compression
|
||||
- status_callback invocation
|
||||
"""
|
||||
|
||||
import json
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from agent.display import format_context_pressure, format_context_pressure_gateway
|
||||
from run_agent import AIAgent
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Display formatting tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestFormatContextPressure:
|
||||
"""CLI context pressure display (agent/display.py).
|
||||
|
||||
The bar shows progress toward the compaction threshold, not the
|
||||
raw context window. 60% = 60% of the way to compaction.
|
||||
"""
|
||||
|
||||
def test_60_percent_uses_info_icon(self):
|
||||
line = format_context_pressure(0.60, 100_000, 0.50)
|
||||
assert "◐" in line
|
||||
assert "60% to compaction" in line
|
||||
|
||||
def test_85_percent_uses_warning_icon(self):
|
||||
line = format_context_pressure(0.85, 100_000, 0.50)
|
||||
assert "⚠" in line
|
||||
assert "85% to compaction" in line
|
||||
|
||||
def test_bar_length_scales_with_progress(self):
|
||||
line_60 = format_context_pressure(0.60, 100_000, 0.50)
|
||||
line_85 = format_context_pressure(0.85, 100_000, 0.50)
|
||||
assert line_85.count("▰") > line_60.count("▰")
|
||||
|
||||
def test_shows_threshold_tokens(self):
|
||||
line = format_context_pressure(0.60, 100_000, 0.50)
|
||||
assert "100k" in line
|
||||
|
||||
def test_small_threshold(self):
|
||||
line = format_context_pressure(0.60, 500, 0.50)
|
||||
assert "500" in line
|
||||
|
||||
def test_shows_threshold_percent(self):
|
||||
line = format_context_pressure(0.85, 100_000, 0.50)
|
||||
assert "50%" in line # threshold percent shown
|
||||
|
||||
def test_imminent_hint_at_85(self):
|
||||
line = format_context_pressure(0.85, 100_000, 0.50)
|
||||
assert "compaction imminent" in line
|
||||
|
||||
def test_approaching_hint_below_85(self):
|
||||
line = format_context_pressure(0.60, 100_000, 0.80)
|
||||
assert "approaching compaction" in line
|
||||
|
||||
def test_no_compaction_when_disabled(self):
|
||||
line = format_context_pressure(0.85, 100_000, 0.50, compression_enabled=False)
|
||||
assert "no auto-compaction" in line
|
||||
|
||||
def test_returns_string(self):
|
||||
result = format_context_pressure(0.65, 128_000, 0.50)
|
||||
assert isinstance(result, str)
|
||||
|
||||
def test_over_100_percent_capped(self):
|
||||
"""Progress > 1.0 should not break the bar."""
|
||||
line = format_context_pressure(1.05, 100_000, 0.50)
|
||||
assert "▰" in line
|
||||
assert line.count("▰") == 20
|
||||
|
||||
|
||||
class TestFormatContextPressureGateway:
|
||||
"""Gateway (plain text) context pressure display."""
|
||||
|
||||
def test_60_percent_informational(self):
|
||||
msg = format_context_pressure_gateway(0.60, 0.50)
|
||||
assert "60% to compaction" in msg
|
||||
assert "50%" in msg # threshold shown
|
||||
|
||||
def test_85_percent_warning(self):
|
||||
msg = format_context_pressure_gateway(0.85, 0.50)
|
||||
assert "85% to compaction" in msg
|
||||
assert "imminent" in msg
|
||||
|
||||
def test_no_compaction_warning(self):
|
||||
msg = format_context_pressure_gateway(0.85, 0.50, compression_enabled=False)
|
||||
assert "disabled" in msg
|
||||
|
||||
def test_no_ansi_codes(self):
|
||||
msg = format_context_pressure_gateway(0.85, 0.50)
|
||||
assert "\033[" not in msg
|
||||
|
||||
def test_has_progress_bar(self):
|
||||
msg = format_context_pressure_gateway(0.85, 0.50)
|
||||
assert "▰" in msg
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# AIAgent context pressure flag tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _make_tool_defs(*names):
    return [
        {
            "type": "function",
            "function": {
                "name": n,
                "description": f"{n} tool",
                "parameters": {"type": "object", "properties": {}},
            },
        }
        for n in names
    ]
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def agent():
|
||||
"""Minimal AIAgent with mocked internals."""
|
||||
with (
|
||||
patch("run_agent.get_tool_definitions", return_value=_make_tool_defs("web_search")),
|
||||
patch("run_agent.check_toolset_requirements", return_value={}),
|
||||
patch("run_agent.OpenAI"),
|
||||
):
|
||||
a = AIAgent(
|
||||
api_key="test-key-1234567890",
|
||||
quiet_mode=True,
|
||||
skip_context_files=True,
|
||||
skip_memory=True,
|
||||
)
|
||||
a.client = MagicMock()
|
||||
return a
|
||||
|
||||
|
||||
class TestContextPressureFlags:
|
||||
"""Context pressure warning flag tracking on AIAgent."""
|
||||
|
||||
def test_flags_initialized_false(self, agent):
|
||||
assert agent._context_50_warned is False
|
||||
assert agent._context_70_warned is False
|
||||
|
||||
def test_emit_calls_status_callback(self, agent):
|
||||
"""status_callback should be invoked with event type and message."""
|
||||
cb = MagicMock()
|
||||
agent.status_callback = cb
|
||||
|
||||
compressor = MagicMock()
|
||||
compressor.context_length = 200_000
|
||||
compressor.threshold_tokens = 100_000 # 50%
|
||||
|
||||
agent._emit_context_pressure(0.85, compressor)
|
||||
|
||||
cb.assert_called_once()
|
||||
args = cb.call_args[0]
|
||||
assert args[0] == "context_pressure"
|
||||
assert "85% to compaction" in args[1]
|
||||
|
||||
def test_emit_no_callback_no_crash(self, agent):
|
||||
"""No status_callback set — should not crash."""
|
||||
agent.status_callback = None
|
||||
|
||||
compressor = MagicMock()
|
||||
compressor.context_length = 200_000
|
||||
compressor.threshold_tokens = 100_000
|
||||
|
||||
# Should not raise
|
||||
agent._emit_context_pressure(0.60, compressor)
|
||||
|
||||
def test_emit_prints_for_cli_platform(self, agent, capsys):
|
||||
"""CLI platform should always print context pressure, even in quiet_mode."""
|
||||
agent.quiet_mode = True
|
||||
agent.platform = "cli"
|
||||
agent.status_callback = None
|
||||
|
||||
compressor = MagicMock()
|
||||
compressor.context_length = 200_000
|
||||
compressor.threshold_tokens = 100_000
|
||||
|
||||
agent._emit_context_pressure(0.85, compressor)
|
||||
captured = capsys.readouterr()
|
||||
assert "▰" in captured.out
|
||||
assert "to compaction" in captured.out
|
||||
|
||||
def test_emit_skips_print_for_gateway_platform(self, agent, capsys):
|
||||
"""Gateway platforms get the callback, not CLI print."""
|
||||
agent.platform = "telegram"
|
||||
agent.status_callback = None
|
||||
|
||||
compressor = MagicMock()
|
||||
compressor.context_length = 200_000
|
||||
compressor.threshold_tokens = 100_000
|
||||
|
||||
agent._emit_context_pressure(0.85, compressor)
|
||||
captured = capsys.readouterr()
|
||||
assert "▰" not in captured.out
|
||||
|
||||
def test_flags_reset_on_compression(self, agent):
|
||||
"""After _compress_context, context pressure flags should reset."""
|
||||
agent._context_50_warned = True
|
||||
agent._context_70_warned = True
|
||||
agent.compression_enabled = True
|
||||
|
||||
# Mock the compressor's compress method to return minimal valid output
|
||||
agent.context_compressor = MagicMock()
|
||||
agent.context_compressor.compress.return_value = [
|
||||
{"role": "user", "content": "Summary of conversation so far."}
|
||||
]
|
||||
agent.context_compressor.context_length = 200_000
|
||||
agent.context_compressor.threshold_tokens = 100_000
|
||||
|
||||
# Mock _todo_store
|
||||
agent._todo_store = MagicMock()
|
||||
agent._todo_store.format_for_injection.return_value = None
|
||||
|
||||
# Mock _build_system_prompt
|
||||
agent._build_system_prompt = MagicMock(return_value="system prompt")
|
||||
agent._cached_system_prompt = "old system prompt"
|
||||
agent._session_db = None
|
||||
|
||||
messages = [
|
||||
{"role": "user", "content": "hello"},
|
||||
{"role": "assistant", "content": "hi there"},
|
||||
]
|
||||
agent._compress_context(messages, "system prompt")
|
||||
|
||||
assert agent._context_50_warned is False
|
||||
assert agent._context_70_warned is False
|
||||
|
||||
def test_emit_callback_error_handled(self, agent):
|
||||
"""If status_callback raises, it should be caught gracefully."""
|
||||
cb = MagicMock(side_effect=RuntimeError("callback boom"))
|
||||
agent.status_callback = cb
|
||||
|
||||
compressor = MagicMock()
|
||||
compressor.context_length = 200_000
|
||||
compressor.threshold_tokens = 100_000
|
||||
|
||||
# Should not raise
|
||||
agent._emit_context_pressure(0.85, compressor)
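
The display tests in this file describe the output of the formatter without showing it. The sketch below is one hypothetical formatter that would satisfy those assertions; the function name, the empty-segment glyph `▱`, the exact 0.85 cutoff, and the wording around the numbers are assumptions, not the contents of `agent/display.py`:

```python
BAR_WIDTH = 20  # the tests assert a cap of 20 filled segments


def _format_tokens(n):
    # 100_000 -> "100k", 500 -> "500"
    return f"{n // 1000}k" if n >= 1000 else str(n)


def format_pressure(progress, threshold_tokens, threshold_percent,
                    compression_enabled=True):
    """Hypothetical formatter: progress is the fraction of the way to compaction."""
    # Clamp so progress above 1.0 cannot overflow the bar.
    filled = round(min(max(progress, 0.0), 1.0) * BAR_WIDTH)
    bar = "▰" * filled + "▱" * (BAR_WIDTH - filled)
    icon = "⚠" if progress >= 0.85 else "◐"
    if not compression_enabled:
        hint = "no auto-compaction"
    elif progress >= 0.85:
        hint = "compaction imminent"
    else:
        hint = "approaching compaction"
    return (
        f"{icon} {bar} {round(progress * 100)}% to compaction "
        f"(threshold {_format_tokens(threshold_tokens)} tokens, "
        f"{round(threshold_percent * 100)}% of context): {hint}"
    )


print(format_pressure(0.60, 100_000, 0.50))
```

Clamping before scaling is what keeps the bar at 20 segments when progress exceeds 1.0, while the percentage label is left unclamped.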
|
||||
@@ -0,0 +1,493 @@
|
||||
"""Tests for _query_local_context_length and the local server fallback in
|
||||
get_model_context_length.
|
||||
|
||||
All tests use synthetic inputs — no filesystem or live server required.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import json
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _query_local_context_length — unit tests with mocked httpx
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestQueryLocalContextLengthOllama:
|
||||
"""_query_local_context_length with server_type == 'ollama'."""
|
||||
|
||||
def _make_resp(self, status_code, body):
|
||||
resp = MagicMock()
|
||||
resp.status_code = status_code
|
||||
resp.json.return_value = body
|
||||
return resp
|
||||
|
||||
def test_ollama_model_info_context_length(self):
|
||||
"""Reads context length from model_info dict in /api/show response."""
|
||||
from agent.model_metadata import _query_local_context_length
|
||||
|
||||
show_resp = self._make_resp(200, {
|
||||
"model_info": {"llama.context_length": 131072}
|
||||
})
|
||||
models_resp = self._make_resp(404, {})
|
||||
|
||||
client_mock = MagicMock()
|
||||
client_mock.__enter__ = lambda s: client_mock
|
||||
client_mock.__exit__ = MagicMock(return_value=False)
|
||||
client_mock.post.return_value = show_resp
|
||||
client_mock.get.return_value = models_resp
|
||||
|
||||
with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"), \
|
||||
patch("httpx.Client", return_value=client_mock):
|
||||
result = _query_local_context_length("omnicoder-9b", "http://localhost:11434/v1")
|
||||
|
||||
assert result == 131072
|
||||
|
||||
def test_ollama_parameters_num_ctx(self):
|
||||
"""Falls back to num_ctx in parameters string when model_info lacks context_length."""
|
||||
from agent.model_metadata import _query_local_context_length
|
||||
|
||||
show_resp = self._make_resp(200, {
|
||||
"model_info": {},
|
||||
"parameters": "num_ctx 32768\ntemperature 0.7\n"
|
||||
})
|
||||
models_resp = self._make_resp(404, {})
|
||||
|
||||
client_mock = MagicMock()
|
||||
client_mock.__enter__ = lambda s: client_mock
|
||||
client_mock.__exit__ = MagicMock(return_value=False)
|
||||
client_mock.post.return_value = show_resp
|
||||
client_mock.get.return_value = models_resp
|
||||
|
||||
with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"), \
|
||||
patch("httpx.Client", return_value=client_mock):
|
||||
result = _query_local_context_length("some-model", "http://localhost:11434/v1")
|
||||
|
||||
assert result == 32768
|
||||
|
||||
def test_ollama_show_404_falls_through(self):
|
||||
"""When /api/show returns 404, falls through to /v1/models/{model}."""
|
||||
from agent.model_metadata import _query_local_context_length
|
||||
|
||||
show_resp = self._make_resp(404, {})
|
||||
model_detail_resp = self._make_resp(200, {"max_model_len": 65536})
|
||||
|
||||
client_mock = MagicMock()
|
||||
client_mock.__enter__ = lambda s: client_mock
|
||||
client_mock.__exit__ = MagicMock(return_value=False)
|
||||
client_mock.post.return_value = show_resp
|
||||
client_mock.get.return_value = model_detail_resp
|
||||
|
||||
with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"), \
|
||||
patch("httpx.Client", return_value=client_mock):
|
||||
result = _query_local_context_length("some-model", "http://localhost:11434/v1")
|
||||
|
||||
assert result == 65536
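
The first two Ollama tests above imply a lookup order inside the `/api/show` response: an architecture-prefixed `*.context_length` key in `model_info` first, then a `num_ctx` line in the `parameters` string. A minimal sketch of that parsing step as a pure function; the name `context_length_from_show` is hypothetical, and the real `_query_local_context_length` may be structured differently:

```python
import re


def context_length_from_show(body):
    """Hypothetical parser for an Ollama /api/show JSON body (already decoded)."""
    # 1. Prefer an architecture-prefixed key such as "llama.context_length".
    info = body.get("model_info") or {}
    for key, value in info.items():
        if key.endswith(".context_length") and isinstance(value, int):
            return value
    # 2. Fall back to a "num_ctx <n>" line in the parameters string.
    match = re.search(r"^\s*num_ctx\s+(\d+)", body.get("parameters") or "", re.MULTILINE)
    if match:
        return int(match.group(1))
    # 3. Nothing usable: the caller can try another endpoint.
    return None


print(context_length_from_show({"model_info": {"llama.context_length": 131072}}))
```

Returning `None` rather than raising lets the caller fall through to the `/v1/models/{model}` lookup exercised by the third test.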
|
||||
|
||||
|
||||
class TestQueryLocalContextLengthVllm:
|
||||
"""_query_local_context_length with vLLM-style /v1/models/{model} response."""
|
||||
|
||||
def _make_resp(self, status_code, body):
|
||||
resp = MagicMock()
|
||||
resp.status_code = status_code
|
||||
resp.json.return_value = body
|
||||
return resp
|
||||
|
||||
def test_vllm_max_model_len(self):
|
||||
"""Reads max_model_len from /v1/models/{model} response."""
|
||||
from agent.model_metadata import _query_local_context_length
|
||||
|
||||
detail_resp = self._make_resp(200, {"id": "omnicoder-9b", "max_model_len": 100000})
|
||||
list_resp = self._make_resp(404, {})
|
||||
|
||||
client_mock = MagicMock()
|
||||
client_mock.__enter__ = lambda s: client_mock
|
||||
client_mock.__exit__ = MagicMock(return_value=False)
|
||||
client_mock.post.return_value = self._make_resp(404, {})
|
||||
client_mock.get.return_value = detail_resp
|
||||
|
||||
with patch("agent.model_metadata.detect_local_server_type", return_value="vllm"), \
|
||||
patch("httpx.Client", return_value=client_mock):
|
||||
result = _query_local_context_length("omnicoder-9b", "http://localhost:8000/v1")
|
||||
|
||||
assert result == 100000
|
||||
|
||||
def test_vllm_context_length_key(self):
|
||||
"""Reads context_length from /v1/models/{model} response."""
|
||||
from agent.model_metadata import _query_local_context_length
|
||||
|
||||
detail_resp = self._make_resp(200, {"id": "some-model", "context_length": 32768})
|
||||
|
||||
client_mock = MagicMock()
|
||||
client_mock.__enter__ = lambda s: client_mock
|
||||
client_mock.__exit__ = MagicMock(return_value=False)
|
||||
client_mock.post.return_value = self._make_resp(404, {})
|
||||
client_mock.get.return_value = detail_resp
|
||||
|
||||
with patch("agent.model_metadata.detect_local_server_type", return_value="vllm"), \
|
||||
patch("httpx.Client", return_value=client_mock):
|
||||
result = _query_local_context_length("some-model", "http://localhost:8000/v1")
|
||||
|
||||
assert result == 32768
|
||||
|
||||
|
||||
class TestQueryLocalContextLengthModelsList:
    """_query_local_context_length: falls back to /v1/models list."""

    def _make_resp(self, status_code, body):
        resp = MagicMock()
        resp.status_code = status_code
        resp.json.return_value = body
        return resp

    def test_models_list_max_model_len(self):
        """Finds context length for model in /v1/models list."""
        from agent.model_metadata import _query_local_context_length

        detail_resp = self._make_resp(404, {})
        list_resp = self._make_resp(200, {
            "data": [
                {"id": "other-model", "max_model_len": 4096},
                {"id": "omnicoder-9b", "max_model_len": 131072},
            ]
        })

        call_count = [0]
        def side_effect(url, **kwargs):
            call_count[0] += 1
            if call_count[0] == 1:
                return detail_resp  # /v1/models/omnicoder-9b
            return list_resp  # /v1/models

        client_mock = MagicMock()
        client_mock.__enter__ = lambda s: client_mock
        client_mock.__exit__ = MagicMock(return_value=False)
        client_mock.post.return_value = self._make_resp(404, {})
        client_mock.get.side_effect = side_effect

        with patch("agent.model_metadata.detect_local_server_type", return_value=None), \
             patch("httpx.Client", return_value=client_mock):
            result = _query_local_context_length("omnicoder-9b", "http://localhost:1234")

        assert result == 131072

    def test_models_list_model_not_found_returns_none(self):
        """Returns None when model is not in the /v1/models list."""
        from agent.model_metadata import _query_local_context_length

        detail_resp = self._make_resp(404, {})
        list_resp = self._make_resp(200, {
            "data": [{"id": "other-model", "max_model_len": 4096}]
        })

        call_count = [0]
        def side_effect(url, **kwargs):
            call_count[0] += 1
            if call_count[0] == 1:
                return detail_resp
            return list_resp

        client_mock = MagicMock()
        client_mock.__enter__ = lambda s: client_mock
        client_mock.__exit__ = MagicMock(return_value=False)
        client_mock.post.return_value = self._make_resp(404, {})
        client_mock.get.side_effect = side_effect

        with patch("agent.model_metadata.detect_local_server_type", return_value=None), \
             patch("httpx.Client", return_value=client_mock):
            result = _query_local_context_length("omnicoder-9b", "http://localhost:1234")

        assert result is None


class TestQueryLocalContextLengthLmStudio:
    """_query_local_context_length with LM Studio native /api/v1/models response."""

    def _make_resp(self, status_code, body):
        resp = MagicMock()
        resp.status_code = status_code
        resp.json.return_value = body
        return resp

    def _make_client(self, native_resp, detail_resp, list_resp):
        """Build a mock httpx.Client with sequenced GET responses."""
        client_mock = MagicMock()
        client_mock.__enter__ = lambda s: client_mock
        client_mock.__exit__ = MagicMock(return_value=False)
        client_mock.post.return_value = self._make_resp(404, {})

        responses = [native_resp, detail_resp, list_resp]
        call_idx = [0]

        def get_side_effect(url, **kwargs):
            idx = call_idx[0]
            call_idx[0] += 1
            if idx < len(responses):
                return responses[idx]
            return self._make_resp(404, {})

        client_mock.get.side_effect = get_side_effect
        return client_mock

    def test_lmstudio_exact_key_match(self):
        """Reads max_context_length when key matches exactly."""
        from agent.model_metadata import _query_local_context_length

        native_resp = self._make_resp(200, {
            "models": [
                {"key": "nvidia/nvidia-nemotron-super-49b-v1", "id": "nvidia/nvidia-nemotron-super-49b-v1",
                 "max_context_length": 131072},
            ]
        })
        client_mock = self._make_client(
            native_resp,
            self._make_resp(404, {}),
            self._make_resp(404, {}),
        )

        with patch("agent.model_metadata.detect_local_server_type", return_value="lm-studio"), \
             patch("httpx.Client", return_value=client_mock):
            result = _query_local_context_length(
                "nvidia/nvidia-nemotron-super-49b-v1", "http://192.168.1.22:1234/v1"
            )

        assert result == 131072

    def test_lmstudio_slug_only_matches_key_with_publisher_prefix(self):
        """Fuzzy match: bare model slug matches key that includes publisher prefix.

        When the user configures the model as "local:nvidia-nemotron-super-49b-v1"
        (slug only, no publisher), but LM Studio's native API stores it as
        "nvidia/nvidia-nemotron-super-49b-v1", the lookup must still succeed.
        """
        from agent.model_metadata import _query_local_context_length

        native_resp = self._make_resp(200, {
            "models": [
                {"key": "nvidia/nvidia-nemotron-super-49b-v1",
                 "id": "nvidia/nvidia-nemotron-super-49b-v1",
                 "max_context_length": 131072},
            ]
        })
        client_mock = self._make_client(
            native_resp,
            self._make_resp(404, {}),
            self._make_resp(404, {}),
        )

        with patch("agent.model_metadata.detect_local_server_type", return_value="lm-studio"), \
             patch("httpx.Client", return_value=client_mock):
            # Model passed in is just the slug after stripping "local:" prefix
            result = _query_local_context_length(
                "nvidia-nemotron-super-49b-v1", "http://192.168.1.22:1234/v1"
            )

        assert result == 131072

    def test_lmstudio_v1_models_list_slug_fuzzy_match(self):
        """Fuzzy match also works for /v1/models list when exact match fails.

        LM Studio's OpenAI-compat /v1/models returns id like
        "nvidia/nvidia-nemotron-super-49b-v1" — must match bare slug.
        """
        from agent.model_metadata import _query_local_context_length

        # native /api/v1/models: no match
        native_resp = self._make_resp(404, {})
        # /v1/models/{model}: no match
        detail_resp = self._make_resp(404, {})
        # /v1/models list: model found with publisher prefix, includes context_length
        list_resp = self._make_resp(200, {
            "data": [
                {"id": "nvidia/nvidia-nemotron-super-49b-v1", "context_length": 131072},
            ]
        })
        client_mock = self._make_client(native_resp, detail_resp, list_resp)

        with patch("agent.model_metadata.detect_local_server_type", return_value="lm-studio"), \
             patch("httpx.Client", return_value=client_mock):
            result = _query_local_context_length(
                "nvidia-nemotron-super-49b-v1", "http://192.168.1.22:1234/v1"
            )

        assert result == 131072

    def test_lmstudio_loaded_instances_context_length(self):
        """Reads active context_length from loaded_instances when max_context_length absent."""
        from agent.model_metadata import _query_local_context_length

        native_resp = self._make_resp(200, {
            "models": [
                {
                    "key": "nvidia/nvidia-nemotron-super-49b-v1",
                    "id": "nvidia/nvidia-nemotron-super-49b-v1",
                    "loaded_instances": [
                        {"config": {"context_length": 65536}},
                    ],
                },
            ]
        })
        client_mock = self._make_client(
            native_resp,
            self._make_resp(404, {}),
            self._make_resp(404, {}),
        )

        with patch("agent.model_metadata.detect_local_server_type", return_value="lm-studio"), \
             patch("httpx.Client", return_value=client_mock):
            result = _query_local_context_length(
                "nvidia-nemotron-super-49b-v1", "http://192.168.1.22:1234/v1"
            )

        assert result == 65536

    def test_lmstudio_loaded_instance_beats_max_context_length(self):
        """loaded_instances context_length takes priority over max_context_length.

        LM Studio may show max_context_length=1_048_576 (theoretical model max)
        while the actual loaded context is 122_651 (runtime setting). The loaded
        value is the real constraint and must be preferred.
        """
        from agent.model_metadata import _query_local_context_length

        native_resp = self._make_resp(200, {
            "models": [
                {
                    "key": "nvidia/nvidia-nemotron-3-nano-4b",
                    "id": "nvidia/nvidia-nemotron-3-nano-4b",
                    "max_context_length": 1_048_576,
                    "loaded_instances": [
                        {"config": {"context_length": 122_651}},
                    ],
                },
            ]
        })
        client_mock = self._make_client(
            native_resp,
            self._make_resp(404, {}),
            self._make_resp(404, {}),
        )

        with patch("agent.model_metadata.detect_local_server_type", return_value="lm-studio"), \
             patch("httpx.Client", return_value=client_mock):
            result = _query_local_context_length(
                "nvidia-nemotron-3-nano-4b", "http://192.168.1.22:1234/v1"
            )

        assert result == 122_651, (
            f"Expected loaded instance context (122651) but got {result}. "
            "max_context_length (1048576) must not win over loaded_instances."
        )


class TestQueryLocalContextLengthNetworkError:
    """_query_local_context_length handles network failures gracefully."""

    def test_connection_error_returns_none(self):
        """Returns None when the server is unreachable."""
        from agent.model_metadata import _query_local_context_length

        client_mock = MagicMock()
        client_mock.__enter__ = lambda s: client_mock
        client_mock.__exit__ = MagicMock(return_value=False)
        client_mock.post.side_effect = Exception("Connection refused")
        client_mock.get.side_effect = Exception("Connection refused")

        with patch("agent.model_metadata.detect_local_server_type", return_value=None), \
             patch("httpx.Client", return_value=client_mock):
            result = _query_local_context_length("omnicoder-9b", "http://localhost:11434/v1")

        assert result is None


# ---------------------------------------------------------------------------
# get_model_context_length — integration-style tests with mocked helpers
# ---------------------------------------------------------------------------

class TestGetModelContextLengthLocalFallback:
    """get_model_context_length uses local server query before falling back to 2M."""

    def test_local_endpoint_unknown_model_queries_server(self):
        """Unknown model on local endpoint gets ctx from server, not 2M default."""
        from agent.model_metadata import get_model_context_length

        with patch("agent.model_metadata.get_cached_context_length", return_value=None), \
             patch("agent.model_metadata.fetch_endpoint_model_metadata", return_value={}), \
             patch("agent.model_metadata.fetch_model_metadata", return_value={}), \
             patch("agent.model_metadata.is_local_endpoint", return_value=True), \
             patch("agent.model_metadata._query_local_context_length", return_value=131072), \
             patch("agent.model_metadata.save_context_length") as mock_save:
            result = get_model_context_length("omnicoder-9b", "http://localhost:11434/v1")

        assert result == 131072

    def test_local_endpoint_unknown_model_result_is_cached(self):
        """Context length returned from local server is persisted to cache."""
        from agent.model_metadata import get_model_context_length

        with patch("agent.model_metadata.get_cached_context_length", return_value=None), \
             patch("agent.model_metadata.fetch_endpoint_model_metadata", return_value={}), \
             patch("agent.model_metadata.fetch_model_metadata", return_value={}), \
             patch("agent.model_metadata.is_local_endpoint", return_value=True), \
             patch("agent.model_metadata._query_local_context_length", return_value=131072), \
             patch("agent.model_metadata.save_context_length") as mock_save:
            get_model_context_length("omnicoder-9b", "http://localhost:11434/v1")

        mock_save.assert_called_once_with("omnicoder-9b", "http://localhost:11434/v1", 131072)

    def test_local_endpoint_server_returns_none_falls_back_to_2m(self):
        """When local server returns None, still falls back to 2M probe tier."""
        from agent.model_metadata import get_model_context_length, CONTEXT_PROBE_TIERS

        with patch("agent.model_metadata.get_cached_context_length", return_value=None), \
             patch("agent.model_metadata.fetch_endpoint_model_metadata", return_value={}), \
             patch("agent.model_metadata.fetch_model_metadata", return_value={}), \
             patch("agent.model_metadata.is_local_endpoint", return_value=True), \
             patch("agent.model_metadata._query_local_context_length", return_value=None):
            result = get_model_context_length("omnicoder-9b", "http://localhost:11434/v1")

        assert result == CONTEXT_PROBE_TIERS[0]

    def test_non_local_endpoint_does_not_query_local_server(self):
        """For non-local endpoints, _query_local_context_length is not called."""
        from agent.model_metadata import get_model_context_length, CONTEXT_PROBE_TIERS

        with patch("agent.model_metadata.get_cached_context_length", return_value=None), \
             patch("agent.model_metadata.fetch_endpoint_model_metadata", return_value={}), \
             patch("agent.model_metadata.fetch_model_metadata", return_value={}), \
             patch("agent.model_metadata.is_local_endpoint", return_value=False), \
             patch("agent.model_metadata._query_local_context_length") as mock_query:
            result = get_model_context_length(
                "unknown-model", "https://some-cloud-api.example.com/v1"
            )

        mock_query.assert_not_called()

    def test_cached_result_skips_local_query(self):
        """Cached context length is returned without querying the local server."""
        from agent.model_metadata import get_model_context_length

        with patch("agent.model_metadata.get_cached_context_length", return_value=65536), \
             patch("agent.model_metadata._query_local_context_length") as mock_query:
            result = get_model_context_length("omnicoder-9b", "http://localhost:11434/v1")

        assert result == 65536
        mock_query.assert_not_called()

    def test_no_base_url_does_not_query_local_server(self):
        """When base_url is empty, local server is not queried."""
        from agent.model_metadata import get_model_context_length

        with patch("agent.model_metadata.get_cached_context_length", return_value=None), \
             patch("agent.model_metadata.fetch_endpoint_model_metadata", return_value={}), \
             patch("agent.model_metadata.fetch_model_metadata", return_value={}), \
             patch("agent.model_metadata._query_local_context_length") as mock_query:
            result = get_model_context_length("unknown-xyz-model", "")

        mock_query.assert_not_called()
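The LM Studio tests above pin down two behaviours: a bare slug must match a key that carries a publisher prefix, and a loaded instance's `context_length` must win over `max_context_length`. The sketch below is one way those rules could be expressed. It is not the implementation in `agent.model_metadata`; the names `pick_lmstudio_context` and `_matches` are hypothetical, and the payload shape is assumed to be the one used in the test fixtures.

```python
from typing import Any, Optional


def _matches(requested: str, candidate: str) -> bool:
    """Exact match, or a bare slug equal to the part after the publisher prefix."""
    if not candidate:
        return False
    return candidate == requested or candidate.rsplit("/", 1)[-1] == requested


def pick_lmstudio_context(model: str, payload: dict) -> Optional[int]:
    """Return a context length for `model` from an LM Studio-style payload.

    Hypothetical helper: assumes entries look like the fixtures in the tests
    ({"key", "id", "max_context_length", "loaded_instances"}).
    """
    for entry in payload.get("models", []):
        names = (entry.get("key", ""), entry.get("id", ""))
        if not any(_matches(model, name) for name in names):
            continue
        # Prefer the runtime setting of a loaded instance: it is the real limit.
        for instance in entry.get("loaded_instances", []):
            ctx: Any = instance.get("config", {}).get("context_length")
            if isinstance(ctx, int) and ctx > 0:
                return ctx
        # Otherwise fall back to the theoretical maximum, if present.
        ctx = entry.get("max_context_length")
        if isinstance(ctx, int) and ctx > 0:
            return ctx
    return None


payload = {
    "models": [
        {
            "key": "nvidia/nvidia-nemotron-3-nano-4b",
            "id": "nvidia/nvidia-nemotron-3-nano-4b",
            "max_context_length": 1_048_576,
            "loaded_instances": [{"config": {"context_length": 122_651}}],
        }
    ]
}
loaded_ctx = pick_lmstudio_context("nvidia-nemotron-3-nano-4b", payload)
```

With the fixture above, the bare slug resolves to the loaded instance's value rather than the advertised maximum, which mirrors `test_lmstudio_loaded_instance_beats_max_context_length`.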
@@ -27,6 +27,8 @@ def config_home(tmp_path, monkeypatch):
    monkeypatch.delenv("HERMES_MODEL", raising=False)
    monkeypatch.delenv("LLM_MODEL", raising=False)
    monkeypatch.delenv("HERMES_INFERENCE_PROVIDER", raising=False)
    monkeypatch.delenv("GITHUB_TOKEN", raising=False)
    monkeypatch.delenv("GH_TOKEN", raising=False)
    monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
@@ -97,3 +99,114 @@ class TestProviderPersistsAfterModelSave:
            f"provider should be 'kimi-coding', got {model.get('provider')}"
        )
        assert model.get("default") == "kimi-k2.5"

    def test_copilot_provider_saved_when_selected(self, config_home):
        """_model_flow_copilot should persist provider/base_url/model together."""
        from hermes_cli.main import _model_flow_copilot
        from hermes_cli.config import load_config

        with patch(
            "hermes_cli.auth.resolve_api_key_provider_credentials",
            return_value={
                "provider": "copilot",
                "api_key": "gh-cli-token",
                "base_url": "https://api.githubcopilot.com",
                "source": "gh auth token",
            },
        ), patch(
            "hermes_cli.models.fetch_github_model_catalog",
            return_value=[
                {
                    "id": "gpt-4.1",
                    "capabilities": {"type": "chat", "supports": {}},
                    "supported_endpoints": ["/chat/completions"],
                },
                {
                    "id": "gpt-5.4",
                    "capabilities": {"type": "chat", "supports": {"reasoning_effort": ["low", "medium", "high"]}},
                    "supported_endpoints": ["/responses"],
                },
            ],
        ), patch(
            "hermes_cli.auth._prompt_model_selection",
            return_value="gpt-5.4",
        ), patch(
            "hermes_cli.main._prompt_reasoning_effort_selection",
            return_value="high",
        ), patch(
            "hermes_cli.auth.deactivate_provider",
        ):
            _model_flow_copilot(load_config(), "old-model")

        import yaml

        config = yaml.safe_load((config_home / "config.yaml").read_text()) or {}
        model = config.get("model")
        assert isinstance(model, dict), f"model should be dict, got {type(model)}"
        assert model.get("provider") == "copilot"
        assert model.get("base_url") == "https://api.githubcopilot.com"
        assert model.get("default") == "gpt-5.4"
        assert model.get("api_mode") == "codex_responses"
        assert config["agent"]["reasoning_effort"] == "high"

    def test_copilot_acp_provider_saved_when_selected(self, config_home):
        """_model_flow_copilot_acp should persist provider/base_url/model together."""
        from hermes_cli.main import _model_flow_copilot_acp
        from hermes_cli.config import load_config

        with patch(
            "hermes_cli.auth.get_external_process_provider_status",
            return_value={
                "resolved_command": "/usr/local/bin/copilot",
                "command": "copilot",
                "base_url": "acp://copilot",
            },
        ), patch(
            "hermes_cli.auth.resolve_external_process_provider_credentials",
            return_value={
                "provider": "copilot-acp",
                "api_key": "copilot-acp",
                "base_url": "acp://copilot",
                "command": "/usr/local/bin/copilot",
                "args": ["--acp", "--stdio"],
                "source": "process",
            },
        ), patch(
            "hermes_cli.auth.resolve_api_key_provider_credentials",
            return_value={
                "provider": "copilot",
                "api_key": "gh-cli-token",
                "base_url": "https://api.githubcopilot.com",
                "source": "gh auth token",
            },
        ), patch(
            "hermes_cli.models.fetch_github_model_catalog",
            return_value=[
                {
                    "id": "gpt-4.1",
                    "capabilities": {"type": "chat", "supports": {}},
                    "supported_endpoints": ["/chat/completions"],
                },
                {
                    "id": "gpt-5.4",
                    "capabilities": {"type": "chat", "supports": {"reasoning_effort": ["low", "medium", "high"]}},
                    "supported_endpoints": ["/responses"],
                },
            ],
        ), patch(
            "hermes_cli.auth._prompt_model_selection",
            return_value="gpt-5.4",
        ), patch(
            "hermes_cli.auth.deactivate_provider",
        ):
            _model_flow_copilot_acp(load_config(), "old-model")

        import yaml

        config = yaml.safe_load((config_home / "config.yaml").read_text()) or {}
        model = config.get("model")
        assert isinstance(model, dict), f"model should be dict, got {type(model)}"
        assert model.get("provider") == "copilot-acp"
        assert model.get("base_url") == "acp://copilot"
        assert model.get("default") == "gpt-5.4"
        assert model.get("api_mode") == "chat_completions"

@@ -0,0 +1,307 @@
"""Regression tests for the _run_async() event-loop lifecycle.

These tests verify the fix for GitHub issue #2104:
"Event loop is closed" after vision_analyze used as first call in session.

Root cause: asyncio.run() creates and *closes* a fresh event loop on every
call. Cached httpx/AsyncOpenAI clients that were bound to the now-dead loop
would crash with RuntimeError("Event loop is closed") when garbage-collected.

The fix replaces asyncio.run() with a persistent event loop in _run_async().
"""

import asyncio
import json
import threading
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock, patch

import pytest


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

async def _get_current_loop():
    """Return the running event loop from inside a coroutine."""
    return asyncio.get_event_loop()


async def _create_and_return_transport():
    """Simulate an async client creating a transport on the current loop.

    Returns a simple asyncio.Future bound to the running loop so we can
    later check whether the loop is still alive.
    """
    loop = asyncio.get_event_loop()
    fut = loop.create_future()
    fut.set_result("ok")
    return loop, fut


# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------

class TestRunAsyncLoopLifecycle:
    """Verify _run_async() keeps the event loop alive after returning."""

    def test_loop_not_closed_after_run_async(self):
        """The loop used by _run_async must still be open after the call."""
        from model_tools import _run_async

        loop = _run_async(_get_current_loop())

        assert not loop.is_closed(), (
            "_run_async() closed the event loop — cached async clients will "
            "crash with 'Event loop is closed' on GC (issue #2104)"
        )

    def test_same_loop_reused_across_calls(self):
        """Consecutive _run_async calls should reuse the same loop."""
        from model_tools import _run_async

        loop1 = _run_async(_get_current_loop())
        loop2 = _run_async(_get_current_loop())

        assert loop1 is loop2, (
            "_run_async() created a new loop on the second call — cached "
            "async clients from the first call would be orphaned"
        )

    def test_cached_transport_survives_between_calls(self):
        """A transport/future created in call 1 must be valid in call 2."""
        from model_tools import _run_async

        loop, fut = _run_async(_create_and_return_transport())

        assert not loop.is_closed()
        assert fut.result() == "ok"

        loop2 = _run_async(_get_current_loop())
        assert loop2 is loop, "Loop changed between calls"
        assert not loop.is_closed(), "Loop closed before second call"


class TestRunAsyncWorkerThread:
    """Verify worker threads get persistent per-thread loops (delegate_task fix)."""

    def test_worker_thread_loop_not_closed(self):
        """A worker thread's loop must stay open after _run_async returns,
        so cached httpx/AsyncOpenAI clients don't crash on GC."""
        from concurrent.futures import ThreadPoolExecutor
        from model_tools import _run_async

        def _run_on_worker():
            loop = _run_async(_get_current_loop())
            still_open = not loop.is_closed()
            return loop, still_open

        with ThreadPoolExecutor(max_workers=1) as pool:
            loop, still_open = pool.submit(_run_on_worker).result()

        assert still_open, (
            "Worker thread's event loop was closed after _run_async — "
            "cached async clients will crash with 'Event loop is closed'"
        )

    def test_worker_thread_reuses_loop_across_calls(self):
        """Multiple _run_async calls on the same worker thread should
        reuse the same persistent loop (not create-and-destroy each time)."""
        from concurrent.futures import ThreadPoolExecutor
        from model_tools import _run_async

        def _run_twice_on_worker():
            loop1 = _run_async(_get_current_loop())
            loop2 = _run_async(_get_current_loop())
            return loop1, loop2

        with ThreadPoolExecutor(max_workers=1) as pool:
            loop1, loop2 = pool.submit(_run_twice_on_worker).result()

        assert loop1 is loop2, (
            "Worker thread created different loops for consecutive calls — "
            "cached clients from the first call would be orphaned"
        )
        assert not loop1.is_closed()

    def test_parallel_workers_get_separate_loops(self):
        """Different worker threads must get their own loops to avoid
        contention (the original reason for the worker-thread branch)."""
        import time
        from concurrent.futures import ThreadPoolExecutor, as_completed
        from model_tools import _run_async

        barrier = threading.Barrier(3, timeout=5)

        def _get_loop_id():
            # Use a barrier to force all 3 threads to be alive simultaneously,
            # ensuring the ThreadPoolExecutor actually uses 3 distinct threads.
            loop = _run_async(_get_current_loop())
            barrier.wait()
            return id(loop), not loop.is_closed(), threading.current_thread().ident

        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = [pool.submit(_get_loop_id) for _ in range(3)]
            results = [f.result() for f in as_completed(futures)]

        loop_ids = {r[0] for r in results}
        thread_ids = {r[2] for r in results}
        all_open = all(r[1] for r in results)

        assert all_open, "At least one worker thread's loop was closed"
        # The barrier guarantees 3 distinct threads were used
        assert len(thread_ids) == 3, f"Expected 3 threads, got {len(thread_ids)}"
        # Each thread should have its own loop
        assert len(loop_ids) == 3, (
            f"Expected 3 distinct loops for 3 parallel workers, "
            f"got {len(loop_ids)} — workers may be contending on a shared loop"
        )

    def test_worker_loop_separate_from_main_loop(self):
        """Worker thread loops must be different from the main thread's
        persistent loop to avoid cross-thread contention."""
        from concurrent.futures import ThreadPoolExecutor
        from model_tools import _run_async, _get_tool_loop

        main_loop = _get_tool_loop()

        def _get_worker_loop_id():
            loop = _run_async(_get_current_loop())
            return id(loop)

        with ThreadPoolExecutor(max_workers=1) as pool:
            worker_loop_id = pool.submit(_get_worker_loop_id).result()

        assert worker_loop_id != id(main_loop), (
            "Worker thread used the main thread's loop — this would cause "
            "cross-thread contention on the event loop"
        )


class TestRunAsyncWithRunningLoop:
    """When a loop is already running, _run_async falls back to a thread."""

    @pytest.mark.asyncio
    async def test_run_async_from_async_context(self):
        """_run_async should still work when called from inside an
        already-running event loop (gateway / Atropos path)."""
        from model_tools import _run_async

        async def _simple():
            return 42

        result = await asyncio.get_event_loop().run_in_executor(
            None, _run_async, _simple()
        )
        assert result == 42


# ---------------------------------------------------------------------------
# Integration: full vision_analyze dispatch chain
# ---------------------------------------------------------------------------

def _mock_vision_response():
    """Build a fake LLM response matching async_call_llm's return shape."""
    message = SimpleNamespace(content="A cat sitting on a chair.")
    choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
    return SimpleNamespace(choices=[choice], model="test/vision", usage=None)


class TestVisionDispatchLoopSafety:
    """Simulate the full registry.dispatch('vision_analyze') chain and
    verify the event loop stays alive afterwards — the exact scenario
    from issue #2104."""

    def test_vision_dispatch_keeps_loop_alive(self, tmp_path):
        """After dispatching vision_analyze via the registry, the event
        loop must remain open so cached async clients don't crash on GC."""
        from model_tools import _run_async, _get_tool_loop
        from tools.registry import registry

        fake_response = _mock_vision_response()

        with (
            patch(
                "tools.vision_tools.async_call_llm",
                new_callable=AsyncMock,
                return_value=fake_response,
            ),
            patch(
                "tools.vision_tools._download_image",
                new_callable=AsyncMock,
                side_effect=lambda url, dest, **kw: _write_fake_image(dest),
            ),
            patch(
                "tools.vision_tools._validate_image_url",
                return_value=True,
            ),
            patch(
                "tools.vision_tools._image_to_base64_data_url",
                return_value="data:image/jpeg;base64,abc",
            ),
        ):
            result_json = registry.dispatch(
                "vision_analyze",
                {"image_url": "https://example.com/cat.png", "question": "What is this?"},
            )

        result = json.loads(result_json)
        assert result.get("success") is True, f"dispatch failed: {result}"
        assert "cat" in result.get("analysis", "").lower()

        loop = _get_tool_loop()
        assert not loop.is_closed(), (
            "Event loop closed after vision_analyze dispatch — cached async "
            "clients will crash with 'Event loop is closed' (issue #2104)"
        )

    def test_two_consecutive_vision_dispatches(self, tmp_path):
        """Two back-to-back vision_analyze dispatches must both succeed
        and share the same loop (simulates 'first call fails, second
        works' from the issue report)."""
        from model_tools import _get_tool_loop
        from tools.registry import registry

        fake_response = _mock_vision_response()

        with (
            patch(
                "tools.vision_tools.async_call_llm",
                new_callable=AsyncMock,
                return_value=fake_response,
            ),
            patch(
                "tools.vision_tools._download_image",
                new_callable=AsyncMock,
                side_effect=lambda url, dest, **kw: _write_fake_image(dest),
            ),
            patch(
                "tools.vision_tools._validate_image_url",
                return_value=True,
            ),
            patch(
                "tools.vision_tools._image_to_base64_data_url",
                return_value="data:image/jpeg;base64,abc",
            ),
        ):
            args = {"image_url": "https://example.com/cat.png", "question": "Describe"}

            r1 = json.loads(registry.dispatch("vision_analyze", args))
            loop_after_first = _get_tool_loop()

            r2 = json.loads(registry.dispatch("vision_analyze", args))
            loop_after_second = _get_tool_loop()

        assert r1.get("success") is True
        assert r2.get("success") is True
        assert loop_after_first is loop_after_second, "Loop changed between dispatches"
        assert not loop_after_second.is_closed()


def _write_fake_image(dest):
    """Write minimal bytes so vision_analyze_tool thinks download succeeded."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(b"\xff\xd8\xff" + b"\x00" * 16)
    return dest
@@ -631,6 +631,28 @@ class TestBuildApiKwargs:
        kwargs = agent._build_api_kwargs(messages)
        assert kwargs["extra_body"]["reasoning"]["effort"] == "medium"

    def test_reasoning_sent_for_copilot_gpt5(self, agent):
        agent.base_url = "https://api.githubcopilot.com"
        agent.model = "gpt-5.4"
        messages = [{"role": "user", "content": "hi"}]
        kwargs = agent._build_api_kwargs(messages)
        assert kwargs["extra_body"]["reasoning"] == {"effort": "medium"}

    def test_reasoning_xhigh_normalized_for_copilot(self, agent):
        agent.base_url = "https://api.githubcopilot.com"
        agent.model = "gpt-5.4"
        agent.reasoning_config = {"enabled": True, "effort": "xhigh"}
        messages = [{"role": "user", "content": "hi"}]
        kwargs = agent._build_api_kwargs(messages)
        assert kwargs["extra_body"]["reasoning"] == {"effort": "high"}

    def test_reasoning_omitted_for_non_reasoning_copilot_model(self, agent):
        agent.base_url = "https://api.githubcopilot.com"
        agent.model = "gpt-4.1"
        messages = [{"role": "user", "content": "hi"}]
        kwargs = agent._build_api_kwargs(messages)
        assert "reasoning" not in kwargs.get("extra_body", {})

    def test_max_tokens_injected(self, agent):
        agent.max_tokens = 4096
        messages = [{"role": "user", "content": "hi"}]
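The three Copilot tests in this hunk pin down a small rule: a reasoning-capable model gets `{"effort": "medium"}` by default, an `xhigh` effort is clamped to `high`, and a non-reasoning model gets no reasoning block at all. A minimal sketch of that rule follows. The helper name `copilot_reasoning_payload` and the `gpt-5` prefix check are assumptions made for illustration; the diff does not show how `_build_api_kwargs` actually detects reasoning support.

```python
from typing import Optional


def copilot_reasoning_payload(model: str, reasoning_config: Optional[dict] = None) -> Optional[dict]:
    """Return the reasoning block to send, or None to omit it (hypothetical helper)."""
    # Assumption: only models whose name starts with "gpt-5" accept reasoning.
    if not model.startswith("gpt-5"):
        return None
    effort = (reasoning_config or {}).get("effort", "medium")
    # The tests expect "xhigh" to be sent as "high" on the Copilot endpoint.
    if effort == "xhigh":
        effort = "high"
    return {"effort": effort}


default_payload = copilot_reasoning_payload("gpt-5.4")
clamped_payload = copilot_reasoning_payload("gpt-5.4", {"enabled": True, "effort": "xhigh"})
omitted_payload = copilot_reasoning_payload("gpt-4.1")
```

Returning `None` rather than an empty dict lets the caller skip the key entirely, which is what the `"reasoning" not in ...` assertion checks.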
@@ -2293,6 +2315,41 @@ class TestFallbackAnthropicProvider:
        assert agent.client is mock_client


def test_aiagent_uses_copilot_acp_client():
    with (
        patch("run_agent.get_tool_definitions", return_value=_make_tool_defs("web_search")),
        patch("run_agent.check_toolset_requirements", return_value={}),
        patch("run_agent.OpenAI") as mock_openai,
        patch("agent.copilot_acp_client.CopilotACPClient") as mock_acp_client,
    ):
        acp_client = MagicMock()
        mock_acp_client.return_value = acp_client

        agent = AIAgent(
            api_key="copilot-acp",
            base_url="acp://copilot",
            provider="copilot-acp",
            acp_command="/usr/local/bin/copilot",
            acp_args=["--acp", "--stdio"],
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
        )

        assert agent.client is acp_client
        mock_openai.assert_not_called()
        mock_acp_client.assert_called_once()
        assert mock_acp_client.call_args.kwargs["base_url"] == "acp://copilot"
        assert mock_acp_client.call_args.kwargs["api_key"] == "copilot-acp"
        assert mock_acp_client.call_args.kwargs["command"] == "/usr/local/bin/copilot"
        assert mock_acp_client.call_args.kwargs["args"] == ["--acp", "--stdio"]


def test_is_openai_client_closed_honors_custom_client_flag():
    assert AIAgent._is_openai_client_closed(SimpleNamespace(is_closed=True)) is True
    assert AIAgent._is_openai_client_closed(SimpleNamespace(is_closed=False)) is False


class TestAnthropicBaseUrlPassthrough:
    """Bug fix: base_url was filtered with 'anthropic in base_url', blocking proxies."""
@@ -49,6 +49,27 @@ def _build_agent(monkeypatch):
    return agent


def _build_copilot_agent(monkeypatch, *, model="gpt-5.4"):
    _patch_agent_bootstrap(monkeypatch)

    agent = run_agent.AIAgent(
        model=model,
        provider="copilot",
        api_mode="codex_responses",
        base_url="https://api.githubcopilot.com",
        api_key="gh-token",
        quiet_mode=True,
        max_iterations=4,
        skip_context_files=True,
        skip_memory=True,
    )
    agent._cleanup_task_resources = lambda task_id: None
    agent._persist_session = lambda messages, history=None: None
    agent._save_trajectory = lambda messages, user_message, completed: None
    agent._save_session_log = lambda messages: None
    return agent


def _codex_message_response(text: str):
    return SimpleNamespace(
        output=[
@@ -244,6 +265,28 @@ def test_build_api_kwargs_codex(monkeypatch):
    assert "extra_body" not in kwargs


def test_build_api_kwargs_copilot_responses_omits_openai_only_fields(monkeypatch):
    agent = _build_copilot_agent(monkeypatch)
    kwargs = agent._build_api_kwargs([{"role": "user", "content": "hi"}])

    assert kwargs["model"] == "gpt-5.4"
    assert kwargs["store"] is False
    assert kwargs["tool_choice"] == "auto"
    assert kwargs["parallel_tool_calls"] is True
    assert kwargs["reasoning"] == {"effort": "medium"}
    assert "prompt_cache_key" not in kwargs
    assert "include" not in kwargs


def test_build_api_kwargs_copilot_responses_omits_reasoning_for_non_reasoning_model(monkeypatch):
    agent = _build_copilot_agent(monkeypatch, model="gpt-4.1")
    kwargs = agent._build_api_kwargs([{"role": "user", "content": "hi"}])

    assert "reasoning" not in kwargs
    assert "include" not in kwargs
    assert "prompt_cache_key" not in kwargs


def test_run_codex_stream_retries_when_completed_event_missing(monkeypatch):
    agent = _build_agent(monkeypatch)
    calls = {"stream": 0}
@@ -787,3 +830,212 @@ def test_dump_api_request_debug_uses_chat_completions_url(monkeypatch, tmp_path)

    payload = json.loads(dump_file.read_text())
    assert payload["request"]["url"] == "http://127.0.0.1:9208/v1/chat/completions"


# --- Reasoning-only response tests (fix for empty content retry loop) ---


def _codex_reasoning_only_response(*, encrypted_content="enc_abc123", summary_text="Thinking..."):
    """Codex response containing only reasoning items — no message text, no tool calls."""
    return SimpleNamespace(
        output=[
            SimpleNamespace(
                type="reasoning",
                id="rs_001",
                encrypted_content=encrypted_content,
                summary=[SimpleNamespace(type="summary_text", text=summary_text)],
                status="completed",
            )
        ],
        usage=SimpleNamespace(input_tokens=50, output_tokens=100, total_tokens=150),
        status="completed",
        model="gpt-5-codex",
    )


def test_normalize_codex_response_marks_reasoning_only_as_incomplete(monkeypatch):
    """A response with only reasoning items and no content should be 'incomplete', not 'stop'.

    Without this fix, reasoning-only responses get finish_reason='stop' which
    sends them into the empty-content retry loop (3 retries then failure).
    """
    agent = _build_agent(monkeypatch)
    assistant_message, finish_reason = agent._normalize_codex_response(
        _codex_reasoning_only_response()
    )

    assert finish_reason == "incomplete"
    assert assistant_message.content == ""
    assert assistant_message.codex_reasoning_items is not None
    assert len(assistant_message.codex_reasoning_items) == 1
    assert assistant_message.codex_reasoning_items[0]["encrypted_content"] == "enc_abc123"


def test_normalize_codex_response_reasoning_with_content_is_stop(monkeypatch):
    """If a response has both reasoning and message content, it should still be 'stop'."""
    agent = _build_agent(monkeypatch)
    response = SimpleNamespace(
        output=[
            SimpleNamespace(
                type="reasoning",
                id="rs_001",
                encrypted_content="enc_xyz",
                summary=[SimpleNamespace(type="summary_text", text="Thinking...")],
                status="completed",
            ),
            SimpleNamespace(
                type="message",
                content=[SimpleNamespace(type="output_text", text="Here is the answer.")],
                status="completed",
            ),
        ],
        usage=SimpleNamespace(input_tokens=50, output_tokens=100, total_tokens=150),
        status="completed",
        model="gpt-5-codex",
    )
    assistant_message, finish_reason = agent._normalize_codex_response(response)

    assert finish_reason == "stop"
    assert "Here is the answer" in assistant_message.content
def test_run_conversation_codex_continues_after_reasoning_only_response(monkeypatch):
    """End-to-end: reasoning-only → final message should succeed, not hit retry loop."""
    agent = _build_agent(monkeypatch)
    responses = [
        _codex_reasoning_only_response(),
        _codex_message_response("The final answer is 42."),
    ]
    monkeypatch.setattr(agent, "_interruptible_api_call", lambda api_kwargs: responses.pop(0))

    result = agent.run_conversation("what is the answer?")

    assert result["completed"] is True
    assert result["final_response"] == "The final answer is 42."
    # The reasoning-only turn should be in messages as an incomplete interim
    assert any(
        msg.get("role") == "assistant"
        and msg.get("finish_reason") == "incomplete"
        and msg.get("codex_reasoning_items") is not None
        for msg in result["messages"]
    )


def test_run_conversation_codex_preserves_encrypted_reasoning_in_interim(monkeypatch):
    """Encrypted codex_reasoning_items must be preserved in interim messages
    even when there is no visible reasoning text or content."""
    agent = _build_agent(monkeypatch)
    # Response with encrypted reasoning but no human-readable summary
    reasoning_response = SimpleNamespace(
        output=[
            SimpleNamespace(
                type="reasoning",
                id="rs_002",
                encrypted_content="enc_opaque_blob",
                summary=[],
                status="completed",
            )
        ],
        usage=SimpleNamespace(input_tokens=50, output_tokens=100, total_tokens=150),
        status="completed",
        model="gpt-5-codex",
    )
    responses = [
        reasoning_response,
        _codex_message_response("Done thinking."),
    ]
    monkeypatch.setattr(agent, "_interruptible_api_call", lambda api_kwargs: responses.pop(0))

    result = agent.run_conversation("think hard")

    assert result["completed"] is True
    assert result["final_response"] == "Done thinking."
    # The interim message must have codex_reasoning_items preserved
    interim_msgs = [
        msg for msg in result["messages"]
        if msg.get("role") == "assistant"
        and msg.get("finish_reason") == "incomplete"
    ]
    assert len(interim_msgs) >= 1
    assert interim_msgs[0].get("codex_reasoning_items") is not None
    assert interim_msgs[0]["codex_reasoning_items"][0]["encrypted_content"] == "enc_opaque_blob"
def test_chat_messages_to_responses_input_reasoning_only_has_following_item(monkeypatch):
    """When converting a reasoning-only interim message to Responses API input,
    the reasoning items must be followed by an assistant message (even if empty)
    to satisfy the API's 'required following item' constraint."""
    agent = _build_agent(monkeypatch)
    messages = [
        {"role": "user", "content": "think hard"},
        {
            "role": "assistant",
            "content": "",
            "reasoning": None,
            "finish_reason": "incomplete",
            "codex_reasoning_items": [
                {"type": "reasoning", "id": "rs_001", "encrypted_content": "enc_abc", "summary": []},
            ],
        },
    ]
    items = agent._chat_messages_to_responses_input(messages)

    # Find the reasoning item
    reasoning_indices = [i for i, it in enumerate(items) if it.get("type") == "reasoning"]
    assert len(reasoning_indices) == 1
    ri_idx = reasoning_indices[0]

    # There must be a following item after the reasoning
    assert ri_idx < len(items) - 1, "Reasoning item must not be the last item (missing_following_item)"
    following = items[ri_idx + 1]
    assert following.get("role") == "assistant"


def test_duplicate_detection_distinguishes_different_codex_reasoning(monkeypatch):
    """Two consecutive reasoning-only responses with different encrypted content
    must NOT be treated as duplicates."""
    agent = _build_agent(monkeypatch)
    responses = [
        # First reasoning-only response
        SimpleNamespace(
            output=[
                SimpleNamespace(
                    type="reasoning", id="rs_001",
                    encrypted_content="enc_first", summary=[], status="completed",
                )
            ],
            usage=SimpleNamespace(input_tokens=50, output_tokens=100, total_tokens=150),
            status="completed", model="gpt-5-codex",
        ),
        # Second reasoning-only response (different encrypted content)
        SimpleNamespace(
            output=[
                SimpleNamespace(
                    type="reasoning", id="rs_002",
                    encrypted_content="enc_second", summary=[], status="completed",
                )
            ],
            usage=SimpleNamespace(input_tokens=50, output_tokens=100, total_tokens=150),
            status="completed", model="gpt-5-codex",
        ),
        _codex_message_response("Final answer after thinking."),
    ]
    monkeypatch.setattr(agent, "_interruptible_api_call", lambda api_kwargs: responses.pop(0))

    result = agent.run_conversation("think very hard")

    assert result["completed"] is True
    assert result["final_response"] == "Final answer after thinking."
    # Both reasoning-only interim messages should be in history (not collapsed)
    interim_msgs = [
        msg for msg in result["messages"]
        if msg.get("role") == "assistant"
        and msg.get("finish_reason") == "incomplete"
    ]
    assert len(interim_msgs) == 2
    encrypted_contents = [
        msg["codex_reasoning_items"][0]["encrypted_content"]
        for msg in interim_msgs
    ]
    assert "enc_first" in encrypted_contents
    assert "enc_second" in encrypted_contents
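The reasoning-only tests in this hunk rely on one classification: a response whose output holds reasoning items but no message text (and no tool call) is treated as `incomplete` rather than `stop`, so the conversation loop continues instead of entering the empty-content retry path. The sketch below illustrates that classification only. `classify_finish_reason` is a hypothetical stand-in; the real `_normalize_codex_response` is not shown in this diff and also builds the assistant message.

```python
from types import SimpleNamespace


def classify_finish_reason(response) -> str:
    """Hypothetical classifier: 'incomplete' for reasoning-only output, else 'stop'."""
    has_reasoning = False
    has_visible_output = False
    for item in getattr(response, "output", None) or []:
        item_type = getattr(item, "type", None)
        if item_type == "reasoning":
            has_reasoning = True
        elif item_type == "function_call":
            # A tool call counts as real output even without message text.
            has_visible_output = True
        elif item_type == "message":
            for part in getattr(item, "content", None) or []:
                if getattr(part, "text", ""):
                    has_visible_output = True
    if has_reasoning and not has_visible_output:
        return "incomplete"
    return "stop"


# Demo objects shaped like the SimpleNamespace fixtures used in the tests.
reasoning_only = SimpleNamespace(
    output=[SimpleNamespace(type="reasoning", encrypted_content="enc_abc123", summary=[])]
)
with_answer = SimpleNamespace(
    output=[
        SimpleNamespace(type="reasoning", encrypted_content="enc_xyz", summary=[]),
        SimpleNamespace(
            type="message",
            content=[SimpleNamespace(type="output_text", text="Here is the answer.")],
        ),
    ]
)
```

Keeping the encrypted reasoning items on the interim message, as the tests require, is a separate concern from this classification and is not covered by the sketch.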
@@ -177,6 +177,50 @@ def test_custom_endpoint_uses_saved_config_base_url_when_env_missing(monkeypatch
    assert resolved["api_key"] == "local-key"


def test_custom_endpoint_uses_config_api_key_over_env(monkeypatch):
    """provider: custom with base_url and api_key in config uses them (#1760)."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "openrouter")
    monkeypatch.setattr(
        rp,
        "_get_model_config",
        lambda: {
            "provider": "custom",
            "base_url": "https://my-api.example.com/v1",
            "api_key": "config-api-key",
        },
    )
    monkeypatch.setenv("OPENAI_BASE_URL", "https://other.example.com/v1")
    monkeypatch.setenv("OPENAI_API_KEY", "env-key")
    monkeypatch.delenv("OPENROUTER_BASE_URL", raising=False)

    resolved = rp.resolve_runtime_provider(requested="custom")

    assert resolved["base_url"] == "https://my-api.example.com/v1"
    assert resolved["api_key"] == "config-api-key"


def test_custom_endpoint_uses_config_api_field_when_no_api_key(monkeypatch):
    """provider: custom with 'api' in config uses it as api_key (#1760)."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "openrouter")
    monkeypatch.setattr(
        rp,
        "_get_model_config",
        lambda: {
            "provider": "custom",
            "base_url": "https://custom.example.com/v1",
            "api": "config-api-field",
        },
    )
    monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)

    resolved = rp.resolve_runtime_provider(requested="custom")

    assert resolved["base_url"] == "https://custom.example.com/v1"
    assert resolved["api_key"] == "config-api-field"


def test_custom_endpoint_auto_provider_prefers_openai_key(monkeypatch):
    """Auto provider with non-OpenRouter base_url should prefer OPENAI_API_KEY.
@@ -394,10 +438,116 @@ def test_named_custom_provider_without_api_mode_defaults(monkeypatch):
        lambda p: {
            "name": "my-server",
            "base_url": "http://localhost:8000/v1",
            "api_key": "sk-test",
            "api_key": "***",
        },
    )

    resolved = rp.resolve_runtime_provider(requested="my-server")

    assert resolved["api_mode"] == "chat_completions"


def test_anthropic_messages_in_valid_api_modes():
    """anthropic_messages should be accepted by _parse_api_mode."""
    assert rp._parse_api_mode("anthropic_messages") == "anthropic_messages"


def test_api_key_provider_anthropic_url_auto_detection(monkeypatch):
    """API-key providers with /anthropic base URL should auto-detect anthropic_messages mode."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
    monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
    monkeypatch.setenv("MINIMAX_BASE_URL", "https://api.minimax.io/anthropic")

    resolved = rp.resolve_runtime_provider(requested="minimax")

    assert resolved["provider"] == "minimax"
    assert resolved["api_mode"] == "anthropic_messages"
    assert resolved["base_url"] == "https://api.minimax.io/anthropic"


def test_api_key_provider_explicit_api_mode_config(monkeypatch):
    """API-key providers should respect api_mode from model config."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {"api_mode": "anthropic_messages"})
    monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
    monkeypatch.delenv("MINIMAX_BASE_URL", raising=False)

    resolved = rp.resolve_runtime_provider(requested="minimax")

    assert resolved["provider"] == "minimax"
    assert resolved["api_mode"] == "anthropic_messages"


def test_minimax_default_url_uses_anthropic_messages(monkeypatch):
    """MiniMax with default /anthropic URL should auto-detect anthropic_messages mode."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
    monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
    monkeypatch.delenv("MINIMAX_BASE_URL", raising=False)

    resolved = rp.resolve_runtime_provider(requested="minimax")

    assert resolved["provider"] == "minimax"
    assert resolved["api_mode"] == "anthropic_messages"
    assert resolved["base_url"] == "https://api.minimax.io/anthropic"


def test_minimax_stale_v1_url_auto_corrected(monkeypatch):
    """MiniMax with stale /v1 base URL should be auto-corrected to /anthropic."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
    monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
    monkeypatch.setenv("MINIMAX_BASE_URL", "https://api.minimax.io/v1")

    resolved = rp.resolve_runtime_provider(requested="minimax")

    assert resolved["provider"] == "minimax"
    assert resolved["api_mode"] == "anthropic_messages"
    assert resolved["base_url"] == "https://api.minimax.io/anthropic"


def test_minimax_cn_stale_v1_url_auto_corrected(monkeypatch):
    """MiniMax-CN with stale /v1 base URL should be auto-corrected to /anthropic."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax-cn")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {})
    monkeypatch.setenv("MINIMAX_CN_API_KEY", "test-minimax-cn-key")
    monkeypatch.setenv("MINIMAX_CN_BASE_URL", "https://api.minimaxi.com/v1")

    resolved = rp.resolve_runtime_provider(requested="minimax-cn")

    assert resolved["provider"] == "minimax-cn"
    assert resolved["api_mode"] == "anthropic_messages"
    assert resolved["base_url"] == "https://api.minimaxi.com/anthropic"


def test_minimax_explicit_api_mode_respected(monkeypatch):
    """Explicit api_mode config should override MiniMax auto-detection."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
    monkeypatch.setattr(rp, "_get_model_config", lambda: {"api_mode": "chat_completions"})
    monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
    monkeypatch.delenv("MINIMAX_BASE_URL", raising=False)

    resolved = rp.resolve_runtime_provider(requested="minimax")

    assert resolved["provider"] == "minimax"
    assert resolved["api_mode"] == "chat_completions"


def test_named_custom_provider_anthropic_api_mode(monkeypatch):
    """Custom providers should accept api_mode: anthropic_messages."""
    monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "my-anthropic-proxy")
    monkeypatch.setattr(
        rp, "_get_named_custom_provider",
        lambda p: {
            "name": "my-anthropic-proxy",
            "base_url": "https://proxy.example.com/anthropic",
            "api_key": "test-key",
            "api_mode": "anthropic_messages",
        },
    )

    resolved = rp.resolve_runtime_provider(requested="my-anthropic-proxy")

    assert resolved["api_mode"] == "anthropic_messages"
    assert resolved["base_url"] == "https://proxy.example.com/anthropic"
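The MiniMax tests in this hunk describe two behaviours: a stale base URL ending in `/v1` is rewritten to end in `/anthropic`, and a base URL ending in `/anthropic` selects `anthropic_messages` unless the config sets `api_mode` explicitly. The following is a rough sketch of both, under the assumption that suffix matching is enough. The helper names `correct_stale_v1_url` and `detect_api_mode` are hypothetical and do not come from `runtime_provider`.

```python
from typing import Optional


def correct_stale_v1_url(base_url: str) -> str:
    """Rewrite a trailing /v1 segment to /anthropic; leave other URLs untouched."""
    trimmed = base_url.rstrip("/")
    if trimmed.endswith("/v1"):
        return trimmed[: -len("/v1")] + "/anthropic"
    return base_url


def detect_api_mode(base_url: str, configured: Optional[str] = None) -> str:
    """An explicit api_mode wins; otherwise infer the mode from the URL suffix."""
    if configured:
        return configured
    if base_url.rstrip("/").endswith("/anthropic"):
        return "anthropic_messages"
    return "chat_completions"


corrected = correct_stale_v1_url("https://api.minimax.io/v1")
mode = detect_api_mode(corrected)
```

Applying the correction before the mode detection matters here: the stale-URL tests expect both the rewritten `base_url` and `anthropic_messages` in the same resolved result.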
@@ -0,0 +1,43 @@
"""Tests that verify SQL injection mitigations in insights and state modules."""

import re

from agent.insights import InsightsEngine


def test_session_cols_no_injection_chars():
    """_SESSION_COLS must not contain SQL injection vectors."""
    cols = InsightsEngine._SESSION_COLS
    assert ";" not in cols
    assert "--" not in cols
    assert "'" not in cols
    assert "DROP" not in cols.upper()


def test_get_sessions_all_query_is_parameterized():
    """_GET_SESSIONS_ALL must use a ? placeholder for the cutoff value."""
    query = InsightsEngine._GET_SESSIONS_ALL
    assert "?" in query
    assert "started_at >= ?" in query
    # Must not embed any runtime-variable content via brace interpolation
    assert "{" not in query


def test_get_sessions_with_source_query_is_parameterized():
    """_GET_SESSIONS_WITH_SOURCE must use ? placeholders for both parameters."""
    query = InsightsEngine._GET_SESSIONS_WITH_SOURCE
    assert query.count("?") == 2
    assert "started_at >= ?" in query
    assert "source = ?" in query
    assert "{" not in query


def test_session_col_names_are_safe_identifiers():
    """Every column name listed in _SESSION_COLS must be a simple identifier."""
    cols = InsightsEngine._SESSION_COLS
    identifiers = [c.strip() for c in cols.split(",")]
    safe_identifier = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
    for col in identifiers:
        assert safe_identifier.match(col), (
            f"Column name {col!r} is not a safe SQL identifier"
        )
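The tests above check that the session queries are fixed strings with `?` placeholders, so runtime values never become part of the SQL text. A small illustration of why that matters, using the standard-library `sqlite3` module: the table layout, the column list `SESSION_COLS` and the helper `fetch_sessions` below are assumptions for the example, since the real `InsightsEngine` constants are not shown in this diff.

```python
import sqlite3

# Hypothetical column list; only fixed, simple identifiers are concatenated in.
SESSION_COLS = "id, source, started_at"
GET_SESSIONS_WITH_SOURCE = (
    "SELECT " + SESSION_COLS + " FROM sessions "
    "WHERE started_at >= ? AND source = ? ORDER BY started_at"
)


def fetch_sessions(conn: sqlite3.Connection, cutoff: float, source: str) -> list:
    """Bind runtime values as parameters instead of formatting them into the SQL."""
    return conn.execute(GET_SESSIONS_WITH_SOURCE, (cutoff, source)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT, source TEXT, started_at REAL)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("a", "cli", 100.0), ("b", "telegram", 200.0), ("c", "cli", 300.0)],
)

recent_cli = fetch_sessions(conn, 150.0, "cli")
# A hostile value is compared as a plain string, so it matches no rows.
hostile = fetch_sessions(conn, 0.0, "cli' OR '1'='1")
```

With string formatting, the second call would have turned the `WHERE` clause into a tautology; with bound parameters it is just an unusual source name.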
@@ -0,0 +1,47 @@
from unittest.mock import Mock, patch


HOST = "example-host"
PORT = 9223
WS_URL = f"ws://{HOST}:{PORT}/devtools/browser/abc123"
HTTP_URL = f"http://{HOST}:{PORT}"
VERSION_URL = f"{HTTP_URL}/json/version"


class TestResolveCdpOverride:
    def test_keeps_full_devtools_websocket_url(self):
        from tools.browser_tool import _resolve_cdp_override

        assert _resolve_cdp_override(WS_URL) == WS_URL

    def test_resolves_http_discovery_endpoint_to_websocket(self):
        from tools.browser_tool import _resolve_cdp_override

        response = Mock()
        response.raise_for_status.return_value = None
        response.json.return_value = {"webSocketDebuggerUrl": WS_URL}

        with patch("tools.browser_tool.requests.get", return_value=response) as mock_get:
            resolved = _resolve_cdp_override(HTTP_URL)

        assert resolved == WS_URL
        mock_get.assert_called_once_with(VERSION_URL, timeout=10)

    def test_resolves_bare_ws_hostport_to_discovery_websocket(self):
        from tools.browser_tool import _resolve_cdp_override

        response = Mock()
        response.raise_for_status.return_value = None
        response.json.return_value = {"webSocketDebuggerUrl": WS_URL}

        with patch("tools.browser_tool.requests.get", return_value=response) as mock_get:
            resolved = _resolve_cdp_override(f"ws://{HOST}:{PORT}")

        assert resolved == WS_URL
        mock_get.assert_called_once_with(VERSION_URL, timeout=10)

    def test_falls_back_to_raw_url_when_discovery_fails(self):
        from tools.browser_tool import _resolve_cdp_override

        with patch("tools.browser_tool.requests.get", side_effect=RuntimeError("boom")):
            assert _resolve_cdp_override(HTTP_URL) == HTTP_URL
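Read together, the four tests above specify the resolution order for a CDP override: keep a full `/devtools/...` websocket URL, otherwise query `/json/version` on the same host and port and use its `webSocketDebuggerUrl`, and fall back to the raw input when discovery fails. The sketch below follows that order but is not the project's implementation. It uses `urllib` instead of `requests` to stay within the standard library, and the injectable `fetch_json` parameter is an assumption added so the logic can be exercised without a network.

```python
import json
from typing import Callable
from urllib.parse import urlsplit
from urllib.request import urlopen


def _fetch_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def resolve_cdp_override(cdp_url: str, fetch_json: Callable[[str], dict] = _fetch_json) -> str:
    """Hypothetical resolver mirroring the behaviour the tests describe."""
    parts = urlsplit(cdp_url)
    # A full DevTools websocket endpoint is already connectable.
    if parts.scheme in ("ws", "wss") and parts.path.startswith("/devtools/"):
        return cdp_url
    # Map ws/wss to http/https so a bare ws://host:port can be discovered too.
    http_scheme = {"ws": "http", "wss": "https"}.get(parts.scheme, parts.scheme)
    version_url = f"{http_scheme}://{parts.netloc}/json/version"
    try:
        return fetch_json(version_url)["webSocketDebuggerUrl"]
    except Exception:
        # Discovery failed: hand back the input unchanged.
        return cdp_url


WS = "ws://example-host:9223/devtools/browser/abc123"
seen = []


def fake_fetch(url: str) -> dict:
    seen.append(url)
    return {"webSocketDebuggerUrl": WS}


def failing_fetch(url: str) -> dict:
    raise RuntimeError("boom")


from_http = resolve_cdp_override("http://example-host:9223", fake_fetch)
from_bare_ws = resolve_cdp_override("ws://example-host:9223", fake_fetch)
fallback = resolve_cdp_override("http://example-host:9223", failing_fetch)
```

Catching a broad `Exception` is deliberate in this sketch: the fallback test raises a plain `RuntimeError`, so a narrower handler would not satisfy it.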
@@ -64,7 +64,8 @@ def make_env(daytona_sdk, monkeypatch):

    def _factory(
        sandbox=None,
        find_one_side_effect=None,
        get_side_effect=None,
        list_return=None,
        home_dir="/root",
        persistent=True,
        **kwargs,
@@ -76,11 +77,17 @@ def make_env(daytona_sdk, monkeypatch):
        mock_client = MagicMock()
        mock_client.create.return_value = sandbox

        if find_one_side_effect is not None:
            mock_client.find_one.side_effect = find_one_side_effect
        if get_side_effect is not None:
            mock_client.get.side_effect = get_side_effect
        else:
            # Default: no existing sandbox found
            mock_client.find_one.side_effect = daytona_sdk.DaytonaError("not found")
            # Default: no existing sandbox found via get()
            mock_client.get.side_effect = daytona_sdk.DaytonaError("not found")

        # Default: no legacy sandbox found via list()
        if list_return is not None:
            mock_client.list.return_value = list_return
        else:
            mock_client.list.return_value = SimpleNamespace(items=[])

        daytona_sdk.Daytona = MagicMock(return_value=mock_client)
@@ -131,24 +138,46 @@ class TestCwdResolution:
# ---------------------------------------------------------------------------

class TestPersistence:
    def test_persistent_resumes_existing_sandbox(self, make_env):
    def test_persistent_resumes_via_get(self, make_env):
        existing = _make_sandbox(sandbox_id="sb-existing")
        existing.process.exec.return_value = _make_exec_response(result="/root")
        env = make_env(find_one_side_effect=lambda **kw: existing, persistent=True)
        env = make_env(get_side_effect=lambda name: existing, persistent=True,
                       task_id="mytask")
        existing.start.assert_called_once()
        # Should NOT have called create since find_one succeeded
        env._mock_client.get.assert_called_once_with("hermes-mytask")
        env._mock_client.create.assert_not_called()

    def test_persistent_resumes_legacy_via_list(self, make_env, daytona_sdk):
        legacy = _make_sandbox(sandbox_id="sb-legacy")
        legacy.process.exec.return_value = _make_exec_response(result="/root")
        env = make_env(
            get_side_effect=daytona_sdk.DaytonaError("not found"),
            list_return=SimpleNamespace(items=[legacy]),
            persistent=True,
            task_id="mytask",
        )
        legacy.start.assert_called_once()
        env._mock_client.list.assert_called_once_with(
            labels={"hermes_task_id": "mytask"}, page=1, limit=1)
        env._mock_client.create.assert_not_called()

    def test_persistent_creates_new_when_none_found(self, make_env, daytona_sdk):
        env = make_env(
            find_one_side_effect=daytona_sdk.DaytonaError("not found"),
            get_side_effect=daytona_sdk.DaytonaError("not found"),
            persistent=True,
            task_id="mytask",
        )
        env._mock_client.create.assert_called_once()
        # Verify the name and labels were passed to CreateSandboxFromImageParams
        # by checking get() was called with the right sandbox name
        env._mock_client.get.assert_called_with("hermes-mytask")
        env._mock_client.list.assert_called_with(
            labels={"hermes_task_id": "mytask"}, page=1, limit=1)

    def test_non_persistent_skips_find_one(self, make_env):
    def test_non_persistent_skips_lookup(self, make_env):
        env = make_env(persistent=False)
        env._mock_client.find_one.assert_not_called()
        env._mock_client.get.assert_not_called()
        env._mock_client.list.assert_not_called()
        env._mock_client.create.assert_called_once()
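The updated persistence tests imply a three-step lookup for persistent sandboxes: try `get()` with the name `hermes-<task_id>`, then fall back to `list()` filtered by the `hermes_task_id` label for legacy sandboxes, and only then `create()`; a non-persistent environment skips both lookups. The sketch below shows that order against a fake client. `find_or_create_sandbox`, `NotFound` and `FakeClient` are all hypothetical stand-ins written for this example, not Daytona SDK types, and starting the resumed sandbox is left out.

```python
from types import SimpleNamespace


class NotFound(Exception):
    """Stand-in for the SDK's not-found error (the tests use DaytonaError)."""


def find_or_create_sandbox(client, task_id: str, persistent: bool):
    """Resolve a sandbox: by name, then by legacy label, then create a new one."""
    if persistent:
        try:
            return client.get(f"hermes-{task_id}")
        except NotFound:
            pass
        page = client.list(labels={"hermes_task_id": task_id}, page=1, limit=1)
        if page.items:
            return page.items[0]
    return client.create()


class FakeClient:
    """Minimal in-memory client that records which calls were made."""

    def __init__(self, by_name=None, by_label=None):
        self.by_name = by_name or {}
        self.by_label = by_label or {}
        self.calls = []

    def get(self, name):
        self.calls.append("get")
        if name not in self.by_name:
            raise NotFound(name)
        return self.by_name[name]

    def list(self, labels, page, limit):
        self.calls.append("list")
        found = self.by_label.get(labels["hermes_task_id"])
        return SimpleNamespace(items=[found] if found else [])

    def create(self):
        self.calls.append("create")
        return "sb-new"


named = FakeClient(by_name={"hermes-mytask": "sb-existing"})
legacy = FakeClient(by_label={"mytask": "sb-legacy"})
fresh = FakeClient()
ephemeral = FakeClient(by_name={"hermes-mytask": "sb-existing"})

named_result = find_or_create_sandbox(named, "mytask", persistent=True)
legacy_result = find_or_create_sandbox(legacy, "mytask", persistent=True)
fresh_result = find_or_create_sandbox(fresh, "mytask", persistent=True)
ephemeral_result = find_or_create_sandbox(ephemeral, "mytask", persistent=False)
```

The recorded `calls` lists make the ordering visible, which is the same property the mock assertions in the tests check.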
@@ -23,6 +23,7 @@ from tools.delegate_tool import (
    MAX_DEPTH,
    check_delegate_requirements,
    delegate_task,
    _build_child_agent,
    _build_child_system_prompt,
    _strip_blocked_tools,
    _resolve_delegation_credentials,
@@ -291,6 +292,58 @@ class TestToolNamePreservation(unittest.TestCase):

        self.assertEqual(model_tools._last_resolved_tool_names, original_tools)

    def test_build_child_agent_does_not_raise_name_error(self):
        """Regression: _build_child_agent must not reference _saved_tool_names.

        The bug introduced by the e7844e9c merge conflict: line 235 inside
        _build_child_agent read `list(_saved_tool_names)` where that variable
        is only defined later in _run_single_child. Calling _build_child_agent
        standalone (without _run_single_child's scope) must never raise NameError.
        """
        parent = _make_mock_parent(depth=0)

        with patch("run_agent.AIAgent"):
            try:
                _build_child_agent(
                    task_index=0,
                    goal="regression check",
                    context=None,
                    toolsets=None,
                    model=None,
                    max_iterations=10,
                    parent_agent=parent,
                )
            except NameError as exc:
                self.fail(
                    f"_build_child_agent raised NameError — "
                    f"_saved_tool_names leaked back into wrong scope: {exc}"
                )

    def test_saved_tool_names_set_on_child_before_run(self):
        """_run_single_child must set _delegate_saved_tool_names on the child
        from model_tools._last_resolved_tool_names before run_conversation."""
        import model_tools

        parent = _make_mock_parent(depth=0)
        expected_tools = ["read_file", "web_search", "execute_code"]
        model_tools._last_resolved_tool_names = list(expected_tools)

        captured = {}

        with patch("run_agent.AIAgent") as MockAgent:
            mock_child = MagicMock()

            def capture_and_return(user_message):
                captured["saved"] = list(mock_child._delegate_saved_tool_names)
                return {"final_response": "ok", "completed": True, "api_calls": 1}

            mock_child.run_conversation.side_effect = capture_and_return
            MockAgent.return_value = mock_child

            delegate_task(goal="capture test", parent_agent=parent)

        self.assertEqual(captured["saved"], expected_tools)


class TestDelegateObservability(unittest.TestCase):
    """Tests for enriched metadata returned by _run_single_child."""
@@ -106,6 +106,18 @@ class TestSchemaConversion:
        assert schema["parameters"]["type"] == "object"
        assert schema["parameters"]["properties"] == {}

    def test_object_schema_without_properties_gets_normalized(self):
        from tools.mcp_tool import _convert_mcp_schema

        mcp_tool = _make_mcp_tool(
            name="ask",
            description="Ask Crawl4AI",
            input_schema={"type": "object"},
        )
        schema = _convert_mcp_schema("crawl4ai", mcp_tool)

        assert schema["parameters"] == {"type": "object", "properties": {}}

    def test_tool_name_prefix_format(self):
        from tools.mcp_tool import _convert_mcp_schema
@@ -1893,6 +1905,33 @@ class TestSamplingCallbackText:
|
||||
messages = call_args.kwargs["messages"]
|
||||
assert messages[0] == {"role": "system", "content": "Be helpful"}
|
||||
|
||||
def test_server_tools_with_object_schema_are_normalized(self):
|
||||
"""Server-provided tools should gain empty properties for object schemas."""
|
||||
fake_client = MagicMock()
|
||||
fake_client.chat.completions.create.return_value = _make_llm_response()
|
||||
server_tool = SimpleNamespace(
|
||||
name="ask",
|
||||
description="Ask Crawl4AI",
|
||||
inputSchema={"type": "object"},
|
||||
)
|
||||
|
||||
with patch(
|
||||
"agent.auxiliary_client.call_llm",
|
||||
return_value=fake_client.chat.completions.create.return_value,
|
||||
) as mock_call:
|
||||
params = _make_sampling_params(tools=[server_tool])
|
||||
asyncio.run(self.handler(None, params))
|
||||
|
||||
tools = mock_call.call_args.kwargs["tools"]
|
||||
assert tools == [{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "ask",
|
||||
"description": "Ask Crawl4AI",
|
||||
"parameters": {"type": "object", "properties": {}},
|
||||
},
|
||||
}]
|
||||
|
||||
def test_length_stop_reason(self):
|
||||
"""finish_reason='length' maps to stopReason='maxTokens'."""
|
||||
fake_client = MagicMock()
|
||||
|
||||
+51
-2
@@ -106,14 +106,63 @@ def _get_extraction_model() -> Optional[str]:
|
||||
return os.getenv("AUXILIARY_WEB_EXTRACT_MODEL", "").strip() or None
|
||||
|
||||
|
||||
def _resolve_cdp_override(cdp_url: str) -> str:
|
||||
"""Normalize a user-supplied CDP endpoint into a concrete connectable URL.
|
||||
|
||||
Accepts:
|
||||
- full websocket endpoints: ws://host:port/devtools/browser/...
|
||||
- HTTP discovery endpoints: http://host:port or http://host:port/json/version
|
||||
- bare websocket host:port values like ws://host:port
|
||||
|
||||
For discovery-style endpoints we fetch /json/version and return the
|
||||
webSocketDebuggerUrl so downstream tools always receive a concrete browser
|
||||
websocket instead of an ambiguous host:port URL.
|
||||
"""
|
||||
raw = (cdp_url or "").strip()
|
||||
if not raw:
|
||||
return ""
|
||||
|
||||
lowered = raw.lower()
|
||||
if "/devtools/browser/" in lowered:
|
||||
return raw
|
||||
|
||||
discovery_url = raw
|
||||
if lowered.startswith("ws://") or lowered.startswith("wss://"):
|
||||
if raw.count(":") == 2 and raw.rstrip("/").rsplit(":", 1)[-1].isdigit() and "/" not in raw.split(":", 2)[-1]:
|
||||
discovery_url = ("http://" if lowered.startswith("ws://") else "https://") + raw.split("://", 1)[1]
|
||||
else:
|
||||
return raw
|
||||
|
||||
if discovery_url.lower().endswith("/json/version"):
|
||||
version_url = discovery_url
|
||||
else:
|
||||
version_url = discovery_url.rstrip("/") + "/json/version"
|
||||
|
||||
try:
|
||||
response = requests.get(version_url, timeout=10)
|
||||
response.raise_for_status()
|
||||
payload = response.json()
|
||||
except Exception as exc:
|
||||
logger.warning("Failed to resolve CDP endpoint %s via %s: %s", raw, version_url, exc)
|
||||
return raw
|
||||
|
||||
ws_url = str(payload.get("webSocketDebuggerUrl") or "").strip()
|
||||
if ws_url:
|
||||
logger.info("Resolved CDP endpoint %s -> %s", raw, ws_url)
|
||||
return ws_url
|
||||
|
||||
logger.warning("CDP discovery at %s did not return webSocketDebuggerUrl; using raw endpoint", version_url)
|
||||
return raw
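The branching in `_resolve_cdp_override` is easier to follow with the network call left out. The sketch below is an illustration only: `plan_cdp_resolution` is a hypothetical helper, not part of the patch. It mirrors the string checks above and reports either the endpoint to use directly or the `/json/version` URL that would be fetched.

```python
def plan_cdp_resolution(cdp_url):
    """Return (action, url) for a CDP override without touching the network.

    Hypothetical helper: it copies the string checks from the hunk above and
    stops before the HTTP request.
    """
    raw = (cdp_url or "").strip()
    if not raw:
        return ("empty", "")

    lowered = raw.lower()
    # A full browser websocket endpoint is already concrete.
    if "/devtools/browser/" in lowered:
        return ("direct", raw)

    discovery_url = raw
    if lowered.startswith(("ws://", "wss://")):
        # Bare ws://host:port values are turned into an HTTP discovery URL.
        is_bare = (
            raw.count(":") == 2
            and raw.rstrip("/").rsplit(":", 1)[-1].isdigit()
            and "/" not in raw.split(":", 2)[-1]
        )
        if not is_bare:
            return ("direct", raw)
        scheme = "http://" if lowered.startswith("ws://") else "https://"
        discovery_url = scheme + raw.split("://", 1)[1]

    if discovery_url.lower().endswith("/json/version"):
        return ("discover", discovery_url)
    return ("discover", discovery_url.rstrip("/") + "/json/version")
```

Under these rules, `ws://localhost:9222` is planned as a discovery request to `http://localhost:9222/json/version`, while an endpoint containing `/devtools/browser/` is returned unchanged.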
|
||||
|
||||
|
||||
def _get_cdp_override() -> str:
|
||||
"""Return a user-supplied CDP URL override, or empty string.
|
||||
"""Return a normalized user-supplied CDP URL override, or empty string.
|
||||
|
||||
When ``BROWSER_CDP_URL`` is set (e.g. via ``/browser connect``), we skip
|
||||
both Browserbase and the local headless launcher and connect directly to
|
||||
the supplied Chrome DevTools Protocol endpoint.
|
||||
"""
|
||||
return os.environ.get("BROWSER_CDP_URL", "").strip()
|
||||
return _resolve_cdp_override(os.environ.get("BROWSER_CDP_URL", ""))
|
||||
|
||||
|
||||
# ============================================================================
|
||||
|
||||
@@ -336,11 +336,9 @@ Jobs run in a fresh session with no current-chat context, so prompts must be sel
|
||||
If skill or skills are provided on create, the future cron run loads those skills in order, then follows the prompt as the task instruction.
|
||||
On update, passing skills=[] clears attached skills.
|
||||
|
||||
NOTE: The agent's final response is auto-delivered to the target — do NOT use
|
||||
send_message in the prompt for that same destination. Same-target send_message
|
||||
calls are skipped to avoid duplicate cron deliveries. Put the primary
|
||||
user-facing content in the final response, and use send_message only for
|
||||
additional or different targets.
|
||||
NOTE: The agent's final response is auto-delivered to the target. Put the primary
|
||||
user-facing content in the final response. Cron jobs run autonomously with no user
|
||||
present — they cannot ask questions or request clarification.
|
||||
|
||||
Important safety rule: cron-run sessions should not recursively schedule more cron jobs.""",
|
||||
"parameters": {
|
||||
@@ -372,7 +370,7 @@ Important safety rule: cron-run sessions should not recursively schedule more cr
|
||||
},
|
||||
"deliver": {
|
||||
"type": "string",
|
||||
"description": "Delivery target: origin, local, telegram, discord, signal, sms, or platform:chat_id"
|
||||
"description": "Delivery target: origin, local, telegram, discord, slack, whatsapp, signal, matrix, mattermost, homeassistant, dingtalk, email, sms, or platform:chat_id"
|
||||
},
|
||||
"model": {
|
||||
"type": "string",
|
||||
|
||||
+39
-17
@@ -201,6 +201,8 @@ def _build_child_agent(
|
||||
effective_base_url = override_base_url or parent_agent.base_url
|
||||
effective_api_key = override_api_key or parent_api_key
|
||||
effective_api_mode = override_api_mode or getattr(parent_agent, "api_mode", None)
|
||||
effective_acp_command = getattr(parent_agent, "acp_command", None)
|
||||
effective_acp_args = list(getattr(parent_agent, "acp_args", []) or [])
|
||||
|
||||
child = AIAgent(
|
||||
base_url=effective_base_url,
|
||||
@@ -208,6 +210,8 @@ def _build_child_agent(
|
||||
model=effective_model,
|
||||
provider=effective_provider,
|
||||
api_mode=effective_api_mode,
|
||||
acp_command=effective_acp_command,
|
||||
acp_args=effective_acp_args,
|
||||
max_iterations=max_iterations,
|
||||
max_tokens=getattr(parent_agent, "max_tokens", None),
|
||||
reasoning_config=getattr(parent_agent, "reasoning_config", None),
|
||||
@@ -228,7 +232,6 @@ def _build_child_agent(
|
||||
tool_progress_callback=child_progress_cb,
|
||||
iteration_budget=shared_budget,
|
||||
)
|
||||
|
||||
# Set delegation depth so children can't spawn grandchildren
|
||||
child._delegate_depth = getattr(parent_agent, '_delegate_depth', 0) + 1
|
||||
|
||||
@@ -259,12 +262,11 @@ def _run_single_child(
|
||||
# Get the progress callback from the child agent
|
||||
child_progress_cb = getattr(child, 'tool_progress_callback', None)
|
||||
|
||||
# Save the parent's resolved tool names before the child agent can
|
||||
# overwrite the process-global via get_tool_definitions().
|
||||
# This must be in _run_single_child (not _build_child_agent) so the
|
||||
# save/restore happens in the same scope as the try/finally.
|
||||
# Restore parent tool names using the value saved before child construction
|
||||
# mutated the global. This is the correct parent toolset, not the child's.
|
||||
import model_tools
|
||||
_saved_tool_names = list(model_tools._last_resolved_tool_names)
|
||||
_saved_tool_names = getattr(child, "_delegate_saved_tool_names",
|
||||
list(model_tools._last_resolved_tool_names))
|
||||
|
||||
try:
|
||||
result = child.run_conversation(user_message=goal)
|
||||
@@ -375,7 +377,11 @@ def _run_single_child(
|
||||
finally:
|
||||
# Restore the parent's tool names so the process-global is correct
|
||||
# for any subsequent execute_code calls or other consumers.
|
||||
model_tools._last_resolved_tool_names = _saved_tool_names
|
||||
import model_tools
|
||||
|
||||
saved_tool_names = getattr(child, "_delegate_saved_tool_names", None)
|
||||
if isinstance(saved_tool_names, list):
|
||||
model_tools._last_resolved_tool_names = list(saved_tool_names)
|
||||
|
||||
# Unregister child from interrupt propagation
|
||||
if hasattr(parent_agent, '_active_children'):
|
||||
@@ -457,18 +463,32 @@ def delegate_task(
|
||||
# Track goal labels for progress display (truncated for readability)
|
||||
task_labels = [t["goal"][:40] for t in task_list]
|
||||
|
||||
# Save parent tool names BEFORE any child construction mutates the global.
|
||||
# _build_child_agent() calls AIAgent() which calls get_tool_definitions(),
|
||||
# which overwrites model_tools._last_resolved_tool_names with child's toolset.
|
||||
import model_tools as _model_tools
|
||||
_parent_tool_names = list(_model_tools._last_resolved_tool_names)
|
||||
|
||||
# Build all child agents on the main thread (thread-safe construction)
|
||||
# Wrapped in try/finally so the global is always restored even if a
|
||||
# child build raises (otherwise _last_resolved_tool_names stays corrupted).
|
||||
children = []
|
||||
for i, t in enumerate(task_list):
|
||||
child = _build_child_agent(
|
||||
task_index=i, goal=t["goal"], context=t.get("context"),
|
||||
toolsets=t.get("toolsets") or toolsets, model=creds["model"],
|
||||
max_iterations=effective_max_iter, parent_agent=parent_agent,
|
||||
override_provider=creds["provider"], override_base_url=creds["base_url"],
|
||||
override_api_key=creds["api_key"],
|
||||
override_api_mode=creds["api_mode"],
|
||||
)
|
||||
children.append((i, t, child))
|
||||
try:
|
||||
for i, t in enumerate(task_list):
|
||||
child = _build_child_agent(
|
||||
task_index=i, goal=t["goal"], context=t.get("context"),
|
||||
toolsets=t.get("toolsets") or toolsets, model=creds["model"],
|
||||
max_iterations=effective_max_iter, parent_agent=parent_agent,
|
||||
override_provider=creds["provider"], override_base_url=creds["base_url"],
|
||||
override_api_key=creds["api_key"],
|
||||
override_api_mode=creds["api_mode"],
|
||||
)
|
||||
# Override with correct parent tool names (before child construction mutated global)
|
||||
child._delegate_saved_tool_names = _parent_tool_names
|
||||
children.append((i, t, child))
|
||||
finally:
|
||||
# Authoritative restore: reset global to parent's tool names after all children built
|
||||
_model_tools._last_resolved_tool_names = _parent_tool_names
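The save-and-restore pattern in this hunk can be shown in isolation. The sketch below is not the Hermes code: `model_tools` is replaced by a `SimpleNamespace` stand-in, and `build_child` and `build_children` are hypothetical names. It only demonstrates why the snapshot is taken before construction and restored in `finally`.

```python
from types import SimpleNamespace

# Stand-in for the model_tools module and its process-global (assumption).
model_tools = SimpleNamespace(_last_resolved_tool_names=["read_file", "web_search"])


def build_child(toolset):
    """Hypothetical constructor that overwrites the global, as AIAgent() is said to."""
    if not toolset:
        raise ValueError("empty toolset")
    model_tools._last_resolved_tool_names = list(toolset)
    return SimpleNamespace(toolset=list(toolset))


def build_children(toolsets):
    # Snapshot the parent's tool names before any child construction runs.
    parent_tool_names = list(model_tools._last_resolved_tool_names)
    children = []
    try:
        for toolset in toolsets:
            child = build_child(toolset)
            # Each child carries the parent's names, not its own.
            child._delegate_saved_tool_names = parent_tool_names
            children.append(child)
    finally:
        # Restore even if a build raised part-way through the loop.
        model_tools._last_resolved_tool_names = parent_tool_names
    return children
```

Because the restore sits in `finally`, the global ends up holding the parent's names whether the loop completes or a child build raises.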
|
||||
|
||||
if n_tasks == 1:
|
||||
# Single task -- run directly (no thread pool overhead)
|
||||
@@ -626,6 +646,8 @@ def _resolve_delegation_credentials(cfg: dict, parent_agent) -> dict:
|
||||
"base_url": runtime.get("base_url"),
|
||||
"api_key": api_key,
|
||||
"api_mode": runtime.get("api_mode"),
|
||||
"command": runtime.get("command"),
|
||||
"args": list(runtime.get("args") or []),
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -68,11 +68,13 @@ class DaytonaEnvironment(BaseEnvironment):
|
||||
resources = Resources(cpu=cpu, memory=memory_gib, disk=disk_gib)
|
||||
|
||||
labels = {"hermes_task_id": task_id}
|
||||
sandbox_name = f"hermes-{task_id}"
|
||||
|
||||
# Try to resume an existing stopped sandbox for this task
|
||||
# Try to resume an existing sandbox for this task
|
||||
if self._persistent:
|
||||
# 1. Try name-based lookup (new path)
|
||||
try:
|
||||
self._sandbox = self._daytona.find_one(labels=labels)
|
||||
self._sandbox = self._daytona.get(sandbox_name)
|
||||
self._sandbox.start()
|
||||
logger.info("Daytona: resumed sandbox %s for task %s",
|
||||
self._sandbox.id, task_id)
|
||||
@@ -83,11 +85,26 @@ class DaytonaEnvironment(BaseEnvironment):
|
||||
task_id, e)
|
||||
self._sandbox = None
|
||||
|
||||
# 2. Legacy fallback: find sandbox created before the naming migration
|
||||
if self._sandbox is None:
|
||||
try:
|
||||
page = self._daytona.list(labels=labels, page=1, limit=1)
|
||||
if page.items:
|
||||
self._sandbox = page.items[0]
|
||||
self._sandbox.start()
|
||||
logger.info("Daytona: resumed legacy sandbox %s for task %s",
|
||||
self._sandbox.id, task_id)
|
||||
except Exception as e:
|
||||
logger.debug("Daytona: no legacy sandbox found for task %s: %s",
|
||||
task_id, e)
|
||||
self._sandbox = None
|
||||
|
||||
# Create a fresh sandbox if we don't have one
|
||||
if self._sandbox is None:
|
||||
self._sandbox = self._daytona.create(
|
||||
CreateSandboxFromImageParams(
|
||||
image=image,
|
||||
name=sandbox_name,
|
||||
labels=labels,
|
||||
auto_stop_interval=0,
|
||||
resources=resources,
|
||||
|
||||
+15
-5
@@ -605,7 +605,9 @@ class SamplingHandler:
|
||||
"function": {
|
||||
"name": getattr(t, "name", ""),
|
||||
"description": getattr(t, "description", "") or "",
|
||||
"parameters": getattr(t, "inputSchema", {}) or {},
|
||||
"parameters": _normalize_mcp_input_schema(
|
||||
getattr(t, "inputSchema", None)
|
||||
),
|
||||
},
|
||||
}
|
||||
for t in server_tools
|
||||
@@ -1213,6 +1215,17 @@ def _make_check_fn(server_name: str):
|
||||
# Discovery & registration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _normalize_mcp_input_schema(schema: dict | None) -> dict:
|
||||
"""Normalize MCP input schemas for LLM tool-calling compatibility."""
|
||||
if not schema:
|
||||
return {"type": "object", "properties": {}}
|
||||
|
||||
if schema.get("type") == "object" and "properties" not in schema:
|
||||
return {**schema, "properties": {}}
|
||||
|
||||
return schema
|
||||
|
||||
|
||||
def _convert_mcp_schema(server_name: str, mcp_tool) -> dict:
|
||||
"""Convert an MCP tool listing to the Hermes registry schema format.
|
||||
|
||||
@@ -1231,10 +1244,7 @@ def _convert_mcp_schema(server_name: str, mcp_tool) -> dict:
|
||||
return {
|
||||
"name": prefixed_name,
|
||||
"description": mcp_tool.description or f"MCP tool {mcp_tool.name} from {server_name}",
|
||||
"parameters": mcp_tool.inputSchema if mcp_tool.inputSchema else {
|
||||
"type": "object",
|
||||
"properties": {},
|
||||
},
|
||||
"parameters": _normalize_mcp_input_schema(mcp_tool.inputSchema),
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -124,6 +124,10 @@ def _handle_send(args):
|
||||
"slack": Platform.SLACK,
|
||||
"whatsapp": Platform.WHATSAPP,
|
||||
"signal": Platform.SIGNAL,
|
||||
"matrix": Platform.MATRIX,
|
||||
"mattermost": Platform.MATTERMOST,
|
||||
"homeassistant": Platform.HOMEASSISTANT,
|
||||
"dingtalk": Platform.DINGTALK,
|
||||
"email": Platform.EMAIL,
|
||||
"sms": Platform.SMS,
|
||||
}
|
||||
|
||||
+2
-1
@@ -239,6 +239,7 @@ def _generate_openai_tts(text: str, output_path: str, tts_config: Dict[str, Any]
|
||||
oai_config = tts_config.get("openai", {})
|
||||
model = oai_config.get("model", DEFAULT_OPENAI_MODEL)
|
||||
voice = oai_config.get("voice", DEFAULT_OPENAI_VOICE)
|
||||
base_url = oai_config.get("base_url", "https://api.openai.com/v1")
|
||||
|
||||
# Determine response format from extension
|
||||
if output_path.endswith(".ogg"):
|
||||
@@ -247,7 +248,7 @@ def _generate_openai_tts(text: str, output_path: str, tts_config: Dict[str, Any]
|
||||
response_format = "mp3"
|
||||
|
||||
OpenAIClient = _import_openai_client()
|
||||
client = OpenAIClient(api_key=api_key, base_url="https://api.openai.com/v1")
|
||||
client = OpenAIClient(api_key=api_key, base_url=base_url)
|
||||
response = client.audio.speech.create(
|
||||
model=model,
|
||||
voice=voice,
|
||||
|
||||
@@ -305,14 +305,14 @@ For docs-only examples, the exact file set may differ. The point is to cover:
|
||||
Run tests with xdist disabled:
|
||||
|
||||
```bash
|
||||
source .venv/bin/activate
|
||||
source venv/bin/activate
|
||||
python -m pytest tests/test_runtime_provider_resolution.py tests/test_cli_provider_resolution.py tests/test_cli_model_command.py tests/test_setup_model_selection.py -n0 -q
|
||||
```
|
||||
|
||||
For deeper changes, run the full suite before pushing:
|
||||
|
||||
```bash
|
||||
source .venv/bin/activate
|
||||
source venv/bin/activate
|
||||
python -m pytest tests/ -n0 -q
|
||||
```
|
||||
|
||||
@@ -321,14 +321,14 @@ python -m pytest tests/ -n0 -q
|
||||
After tests, run a real smoke test.
|
||||
|
||||
```bash
|
||||
source .venv/bin/activate
|
||||
source venv/bin/activate
|
||||
python -m hermes_cli.main chat -q "Say hello" --provider your-provider --model your-model
|
||||
```
|
||||
|
||||
Also test the interactive flows if you changed menus:
|
||||
|
||||
```bash
|
||||
source .venv/bin/activate
|
||||
source venv/bin/activate
|
||||
python -m hermes_cli.main model
|
||||
python -m hermes_cli.main setup
|
||||
```
|
||||
|
||||
@@ -28,17 +28,19 @@ Primary files:
|
||||
|
||||
The cached system prompt is assembled in roughly this order:
|
||||
|
||||
1. default agent identity
|
||||
1. agent identity — `SOUL.md` from `HERMES_HOME` when available, otherwise falls back to `DEFAULT_AGENT_IDENTITY` in `prompt_builder.py`
|
||||
2. tool-aware behavior guidance
|
||||
3. Honcho static block (when active)
|
||||
4. optional system message
|
||||
5. frozen MEMORY snapshot
|
||||
6. frozen USER profile snapshot
|
||||
7. skills index
|
||||
8. context files (`AGENTS.md`, `SOUL.md`, `.cursorrules`, `.cursor/rules/*.mdc`)
|
||||
8. context files (`AGENTS.md`, `.cursorrules`, `.cursor/rules/*.mdc`) — SOUL.md is **not** included here when it was already loaded as the identity in step 1
|
||||
9. timestamp / optional session ID
|
||||
10. platform hint
|
||||
|
||||
When `skip_context_files` is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded `DEFAULT_AGENT_IDENTITY` is used instead.
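The identity-slot rule described above can be sketched as a small function. This is an illustration under assumptions, not the code in `prompt_builder.py`: `select_identity` is a hypothetical name and the default identity text is a placeholder.

```python
DEFAULT_AGENT_IDENTITY = "You are Hermes."  # placeholder text (assumption)


def select_identity(soul_text, skip_context_files):
    """Return (identity, soul_used) for slot 1 of the system prompt.

    soul_text is the already-loaded SOUL.md content, or None when the file
    is missing or could not be loaded.
    """
    if skip_context_files:
        # Subagent delegation: SOUL.md is not consulted at all.
        return DEFAULT_AGENT_IDENTITY, False
    text = (soul_text or "").strip()
    if text:
        # SOUL.md becomes the identity, so the context-file step must skip it.
        return text, True
    # Missing or empty SOUL.md: fall back to the built-in identity.
    return DEFAULT_AGENT_IDENTITY, False
```

The second return value corresponds to the `skip_soul=True` argument mentioned for `build_context_files_prompt`: it is true only when SOUL.md was actually used as the identity.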
|
||||
|
||||
## API-call-time-only layers
|
||||
|
||||
These are intentionally *not* persisted as part of the cached system prompt:
|
||||
@@ -59,10 +61,11 @@ Local memory and user profile data are injected as frozen snapshots at session s
|
||||
`agent/prompt_builder.py` scans and sanitizes:
|
||||
|
||||
- `AGENTS.md`
|
||||
- `SOUL.md`
|
||||
- `.cursorrules`
|
||||
- `.cursor/rules/*.mdc`
|
||||
|
||||
`SOUL.md` is loaded separately via `load_soul_md()` for the identity slot. When it loads successfully, `build_context_files_prompt(skip_soul=True)` prevents it from appearing twice.
|
||||
|
||||
Long files are truncated before injection.
|
||||
|
||||
## Skills index
|
||||
|
||||
@@ -51,11 +51,13 @@ hermes setup # Or configure everything at once
|
||||
| **MiniMax China** | China-region MiniMax endpoint | Set `MINIMAX_CN_API_KEY` |
|
||||
| **Alibaba Cloud** | Qwen models via DashScope | Set `DASHSCOPE_API_KEY` |
|
||||
| **Kilo Code** | KiloCode-hosted models | Set `KILOCODE_API_KEY` |
|
||||
| **OpenCode Zen** | Pay-as-you-go access to curated models | Set `OPENCODE_ZEN_API_KEY` |
|
||||
| **OpenCode Go** | $10/month subscription for open models | Set `OPENCODE_GO_API_KEY` |
|
||||
| **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
|
||||
| **Custom Endpoint** | VLLM, SGLang, or any OpenAI-compatible API | Set base URL + API key |
|
||||
| **Custom Endpoint** | VLLM, SGLang, Ollama, or any OpenAI-compatible API | Set base URL + API key |
|
||||
|
||||
:::tip
|
||||
You can switch providers at any time with `hermes model` — no code changes, no lock-in.
|
||||
You can switch providers at any time with `hermes model` — no code changes, no lock-in. When configuring a custom endpoint, Hermes will prompt for the context window size and auto-detect it when possible. See [Context Length Detection](../user-guide/configuration.md#context-length-detection) for details.
|
||||
:::
|
||||
|
||||
## 3. Start Chatting
|
||||
|
||||
@@ -6,9 +6,9 @@ description: "How to use SOUL.md to shape Hermes Agent's default voice, what bel
|
||||
|
||||
# Use SOUL.md with Hermes
|
||||
|
||||
`SOUL.md` is the easiest way to give Hermes a stable, default voice.
|
||||
`SOUL.md` is the **primary identity** for your Hermes instance. It's the first thing in the system prompt — it defines who the agent is, how it speaks, and what it avoids.
|
||||
|
||||
If you want Hermes to feel like the same assistant every time you talk to it — without repeating instructions in every session — this is the file to use.
|
||||
If you want Hermes to feel like the same assistant every time you talk to it — or if you want to replace the Hermes persona entirely with your own — this is the file to use.
|
||||
|
||||
## What SOUL.md is for
|
||||
|
||||
@@ -65,11 +65,11 @@ Important:
|
||||
|
||||
## How Hermes uses it
|
||||
|
||||
When Hermes starts a session, it reads `SOUL.md` from `HERMES_HOME`, scans it for prompt-injection patterns, truncates it if needed, and injects the content directly into the prompt.
|
||||
When Hermes starts a session, it reads `SOUL.md` from `HERMES_HOME`, scans it for prompt-injection patterns, truncates it if needed, and uses it as the **agent identity** — slot #1 in the system prompt. This means SOUL.md completely replaces the built-in default identity text.
|
||||
|
||||
No wrapper language is added around the file.
|
||||
If SOUL.md is missing, empty, or cannot be loaded, Hermes falls back to a built-in default identity.
|
||||
|
||||
So the content itself matters. Write the way you want Hermes to think and speak.
|
||||
No wrapper language is added around the file. The content itself matters — write the way you want your agent to think and speak.
|
||||
|
||||
## A good first edit
|
||||
|
||||
|
||||
@@ -66,7 +66,7 @@ Common options:
|
||||
| `-q`, `--query "..."` | One-shot, non-interactive prompt. |
|
||||
| `-m`, `--model <model>` | Override the model for this run. |
|
||||
| `-t`, `--toolsets <csv>` | Enable a comma-separated set of toolsets. |
|
||||
| `--provider <provider>` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`. |
|
||||
| `--provider <provider>` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `opencode-zen`, `opencode-go`, `ai-gateway`, `kilocode`, `alibaba`. |
|
||||
| `-v`, `--verbose` | Verbose output. |
|
||||
| `-Q`, `--quiet` | Programmatic mode: suppress banner/spinner/tool previews. |
|
||||
| `--resume <session>` / `--continue [name]` | Resume a session directly from `chat`. |
|
||||
|
||||
@@ -18,6 +18,13 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
|
||||
| `AI_GATEWAY_BASE_URL` | Override AI Gateway base URL (default: `https://ai-gateway.vercel.sh/v1`) |
|
||||
| `OPENAI_API_KEY` | API key for custom OpenAI-compatible endpoints (used with `OPENAI_BASE_URL`) |
|
||||
| `OPENAI_BASE_URL` | Base URL for custom endpoint (VLLM, SGLang, etc.) |
|
||||
| `COPILOT_GITHUB_TOKEN` | GitHub token for Copilot API — first priority (OAuth `gho_*` or fine-grained PAT `github_pat_*`; classic PATs `ghp_*` are **not supported**) |
|
||||
| `GH_TOKEN` | GitHub token — second priority for Copilot (also used by `gh` CLI) |
|
||||
| `GITHUB_TOKEN` | GitHub token — third priority for Copilot |
|
||||
| `HERMES_COPILOT_ACP_COMMAND` | Override Copilot ACP CLI binary path (default: `copilot`) |
|
||||
| `COPILOT_CLI_PATH` | Alias for `HERMES_COPILOT_ACP_COMMAND` |
|
||||
| `HERMES_COPILOT_ACP_ARGS` | Override Copilot ACP arguments (default: `--acp --stdio`) |
|
||||
| `COPILOT_ACP_BASE_URL` | Override Copilot ACP base URL |
|
||||
| `GLM_API_KEY` | z.ai / ZhipuAI GLM API key ([z.ai](https://z.ai)) |
|
||||
| `ZAI_API_KEY` | Alias for `GLM_API_KEY` |
|
||||
| `Z_AI_API_KEY` | Alias for `GLM_API_KEY` |
|
||||
@@ -34,6 +41,12 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
|
||||
| `ANTHROPIC_TOKEN` | Manual or legacy Anthropic OAuth/setup-token override |
|
||||
| `DASHSCOPE_API_KEY` | Alibaba Cloud DashScope API key for Qwen models ([modelstudio.console.alibabacloud.com](https://modelstudio.console.alibabacloud.com/)) |
|
||||
| `DASHSCOPE_BASE_URL` | Custom DashScope base URL (default: international endpoint) |
|
||||
| `DEEPSEEK_API_KEY` | DeepSeek API key for direct DeepSeek access ([platform.deepseek.com](https://platform.deepseek.com/api_keys)) |
|
||||
| `DEEPSEEK_BASE_URL` | Custom DeepSeek API base URL |
|
||||
| `OPENCODE_ZEN_API_KEY` | OpenCode Zen API key — pay-as-you-go access to curated models ([opencode.ai](https://opencode.ai/auth)) |
|
||||
| `OPENCODE_ZEN_BASE_URL` | Override OpenCode Zen base URL |
|
||||
| `OPENCODE_GO_API_KEY` | OpenCode Go API key — $10/month subscription for open models ([opencode.ai](https://opencode.ai/auth)) |
|
||||
| `OPENCODE_GO_BASE_URL` | Override OpenCode Go base URL |
|
||||
| `CLAUDE_CODE_OAUTH_TOKEN` | Explicit Claude Code token override if you export one manually |
|
||||
| `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
|
||||
| `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
|
||||
@@ -48,7 +61,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) |
|
||||
| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) |
|
||||
| `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
|
||||
| `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
|
||||
| `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |
|
||||
@@ -64,6 +77,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
|
||||
| `PARALLEL_API_KEY` | AI-native web search ([parallel.ai](https://parallel.ai/)) |
|
||||
| `FIRECRAWL_API_KEY` | Web scraping ([firecrawl.dev](https://firecrawl.dev/)) |
|
||||
| `FIRECRAWL_API_URL` | Custom Firecrawl API endpoint for self-hosted instances (optional) |
|
||||
| `TAVILY_API_KEY` | Tavily API key for AI-native web search, extract, and crawl ([app.tavily.com](https://app.tavily.com/home)) |
|
||||
| `BROWSERBASE_API_KEY` | Browser automation ([browserbase.com](https://browserbase.com/)) |
|
||||
| `BROWSERBASE_PROJECT_ID` | Browserbase project ID |
|
||||
| `BROWSER_USE_API_KEY` | Browser Use cloud browser API key ([browser-use.com](https://browser-use.com/)) |
|
||||
@@ -76,7 +90,9 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
|
||||
| `GROQ_BASE_URL` | Override the Groq OpenAI-compatible STT endpoint |
|
||||
| `STT_OPENAI_MODEL` | Override the OpenAI STT model (default: `whisper-1`) |
|
||||
| `STT_OPENAI_BASE_URL` | Override the OpenAI-compatible STT endpoint |
|
||||
| `GITHUB_TOKEN` | GitHub token for Skills Hub (higher API rate limits, skill publish) |
|
||||
| `HONCHO_API_KEY` | Cross-session user modeling ([honcho.dev](https://honcho.dev/)) |
|
||||
| `HONCHO_BASE_URL` | Base URL for self-hosted Honcho instances (default: Honcho cloud). No API key required for local instances |
|
||||
| `TINKER_API_KEY` | RL training ([tinker-console.thinkingmachines.ai](https://tinker-console.thinkingmachines.ai/)) |
|
||||
| `WANDB_API_KEY` | RL training metrics ([wandb.ai](https://wandb.ai/)) |
|
||||
| `DAYTONA_API_KEY` | Daytona cloud sandboxes ([daytona.io](https://daytona.io/)) |
|
||||
@@ -192,6 +208,9 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
|
||||
| `MATRIX_ENCRYPTION` | Enable end-to-end encryption (`true`/`false`, default: `false`) |
|
||||
| `HASS_TOKEN` | Home Assistant Long-Lived Access Token (enables HA platform + tools) |
|
||||
| `HASS_URL` | Home Assistant URL (default: `http://homeassistant.local:8123`) |
|
||||
| `WEBHOOK_ENABLED` | Enable the webhook platform adapter (`true`/`false`) |
|
||||
| `WEBHOOK_PORT` | HTTP server port for receiving webhooks (default: `8644`) |
|
||||
| `WEBHOOK_SECRET` | Global HMAC secret for webhook signature validation (used as fallback when routes don't specify their own) |
|
||||
| `API_SERVER_ENABLED` | Enable the OpenAI-compatible API server (`true`/`false`). Runs alongside other platforms. |
|
||||
| `API_SERVER_KEY` | Bearer token for API server authentication. If empty, all requests are allowed (local-only use). |
|
||||
| `API_SERVER_PORT` | Port for the API server (default: `8642`) |
|
||||
@@ -204,7 +223,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `HERMES_MAX_ITERATIONS` | Max tool-calling iterations per conversation (default: 60) |
|
||||
| `HERMES_MAX_ITERATIONS` | Max tool-calling iterations per conversation (default: 90) |
|
||||
| `HERMES_TOOL_PROGRESS` | Deprecated compatibility variable for tool progress display. Prefer `display.tool_progress` in `config.yaml`. |
|
||||
| `HERMES_TOOL_PROGRESS_MODE` | Deprecated compatibility variable for tool progress mode. Prefer `display.tool_progress` in `config.yaml`. |
|
||||
| `HERMES_HUMAN_DELAY_MODE` | Response pacing: `off`/`natural`/`custom` |
|
||||
@@ -214,6 +233,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
|
||||
| `HERMES_API_TIMEOUT` | LLM API call timeout in seconds (default: `900`) |
|
||||
| `HERMES_EXEC_ASK` | Enable execution approval prompts in gateway mode (`true`/`false`) |
|
||||
| `HERMES_BACKGROUND_NOTIFICATIONS` | Background process notification mode in gateway: `all` (default), `result`, `error`, `off` |
|
||||
| `HERMES_EPHEMERAL_SYSTEM_PROMPT` | Ephemeral system prompt injected at API-call time (never persisted to sessions) |
|
||||
|
||||
## Session Settings
|
||||
|
||||
|
||||
@@ -42,18 +42,25 @@ API calls go **only to the LLM provider you configure** (e.g., OpenRouter, your
|
||||
|
||||
### Can I use it offline / with local models?
|
||||
|
||||
Yes. Point Hermes at any local OpenAI-compatible server:
|
||||
Yes. Run `hermes model`, select **Custom endpoint**, and enter your server's URL:
|
||||
|
||||
```bash
|
||||
hermes config set OPENAI_BASE_URL http://localhost:11434/v1 # Ollama
|
||||
hermes config set OPENAI_API_KEY ollama # Any non-empty value
|
||||
hermes config set HERMES_MODEL llama3.1
|
||||
hermes model
|
||||
# Select: Custom endpoint (enter URL manually)
|
||||
# API base URL: http://localhost:11434/v1
|
||||
# API key: ollama
|
||||
# Model name: qwen3.5:27b
|
||||
# Context length: 32768 ← set this to match your server's actual context window
|
||||
```
|
||||
|
||||
You can also save the endpoint interactively with `hermes model`. Hermes persists that custom endpoint in `config.yaml`, and auxiliary tasks configured with provider `main` follow the same saved endpoint.
|
||||
Hermes persists the endpoint in `config.yaml` and prompts for the context window size so compression triggers at the right time. If you leave context length blank, Hermes auto-detects it from the server's `/models` endpoint or [models.dev](https://models.dev).
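One way to picture the precedence described here is the sketch below. It is an assumption-laden illustration, not Hermes's implementation: `resolve_context_length`, the detector callables, their ordering, and the fallback value are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def resolve_context_length(
    configured: Optional[int],
    detectors: Iterable[Callable[[], Optional[int]]],
    fallback: int = 8192,  # assumed last-resort value, not taken from Hermes
) -> int:
    """Prefer an explicit setting, then try each detector in order."""
    if configured:
        return configured
    for detect in detectors:
        try:
            value = detect()
        except Exception:
            # A failing detector (e.g. server unreachable) is skipped.
            continue
        if value:
            return value
    return fallback
```

With a value entered at the prompt, the detectors are never called; with the field left blank, the first detector that returns a number wins.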
|
||||
|
||||
This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See the [Configuration guide](../user-guide/configuration.md) for details.
|
||||
|
||||
:::tip Ollama users
|
||||
If you set a custom `num_ctx` in Ollama (e.g., `/set parameter num_ctx 16384` inside an `ollama run` session, or `PARAMETER num_ctx 16384` in a Modelfile), make sure to set the matching context length in Hermes, because Ollama's `/api/show` reports the model's *maximum* context, not the effective `num_ctx` you configured.
|
||||
:::
|
||||
|
||||
### How much does it cost?
|
||||
|
||||
Hermes Agent itself is **free and open-source** (MIT license). You pay only for the LLM API usage from your chosen provider. Local models are completely free to run.
|
||||
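The Ollama tip in the hunk above comes down to one rule: the context length entered in Hermes should be the window the server actually uses, not the model's advertised maximum. A minimal Python sketch of that rule follows. The function name and both arguments are hypothetical (nothing here is part of Hermes or Ollama), and it assumes the server never uses more than either value.

```python
def effective_context_length(model_max: int, num_ctx: int) -> int:
    """Return the context window a client should plan around.

    model_max: maximum context the model reports (hypothetical input).
    num_ctx:   context size the server was configured with (hypothetical input).
    """
    if model_max <= 0 or num_ctx <= 0:
        raise ValueError("context sizes must be positive")
    # Assumption: the usable window is bounded by both values.
    return min(model_max, num_ctx)


# A model that advertises 131072 tokens but is served with num_ctx=16384
# should be treated as a 16384-token model.
print(effective_context_length(131072, 16384))  # 16384
```

The value returned here is what would go into the `Context length` prompt shown in the `hermes model` walkthrough.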
@@ -200,7 +207,7 @@ hermes chat --model openrouter/meta-llama/llama-3.1-70b-instruct
 
 #### Context length exceeded
 
-**Cause:** The conversation has grown too long for the model's context window.
+**Cause:** The conversation has grown too long for the model's context window, or Hermes detected the wrong context length for your model.
 
 **Solution:**
 ```bash
@@ -214,6 +221,35 @@ hermes chat
 hermes chat --model openrouter/google/gemini-2.0-flash-001
 ```
 
+If this happens on the first long conversation, Hermes may have the wrong context length for your model. Check what it detected:
+
+```bash
+# Look at the status bar — it shows the detected context length
+/context
+```
+
+To fix context detection, set it explicitly:
+
+```yaml
+# In ~/.hermes/config.yaml
+model:
+  default: your-model-name
+  context_length: 131072 # your model's actual context window
+```
+
+Or for custom endpoints, add it per-model:
+
+```yaml
+custom_providers:
+  - name: "My Server"
+    base_url: "http://localhost:11434/v1"
+    models:
+      qwen3.5:27b:
+        context_length: 32768
+```
+
+See [Context Length Detection](../user-guide/configuration.md#context-length-detection) for how auto-detection works and all override options.
+
 ---
 
 ### Terminal Issues
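The hunk above adds two places to pin a context length: `model.context_length` and a per-model entry under `custom_providers`. A minimal Python sketch of how such overrides could be resolved against an already-parsed config is given below. The function name, the `detected` argument, and above all the precedence order are assumptions made for illustration; the diff does not state which override wins, and this is not Hermes's actual code.

```python
from typing import Any, Dict, Optional


def resolve_context_length(
    config: Dict[str, Any],
    model_name: str,
    detected: Optional[int] = None,
) -> Optional[int]:
    """Pick a context length for model_name from a parsed config dict.

    Assumed precedence (not confirmed by the docs):
      1. custom_providers[*].models[model_name].context_length
      2. model.context_length, when model.default names the same model
      3. the auto-detected value supplied by the caller
    """
    for provider in config.get("custom_providers") or []:
        entry = (provider.get("models") or {}).get(model_name) or {}
        if entry.get("context_length") is not None:
            return int(entry["context_length"])

    model_cfg = config.get("model") or {}
    if (
        model_cfg.get("default") == model_name
        and model_cfg.get("context_length") is not None
    ):
        return int(model_cfg["context_length"])

    return detected


# The two YAML fragments from the hunk, written as a plain dict.
example_config = {
    "model": {"default": "your-model-name", "context_length": 131072},
    "custom_providers": [
        {
            "name": "My Server",
            "base_url": "http://localhost:11434/v1",
            "models": {"qwen3.5:27b": {"context_length": 32768}},
        }
    ],
}

print(resolve_context_length(example_config, "qwen3.5:27b"))          # 32768
print(resolve_context_length(example_config, "your-model-name"))      # 131072
print(resolve_context_length(example_config, "other", detected=8192))  # 8192
```

Checking the most specific entry first keeps a per-model override from being shadowed by a global default, which is one reasonable reading of "add it per-model"; the linked Context Length Detection page is the authority on the real order.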
@@ -21,9 +21,8 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 
 | Command | Description |
 |---------|-------------|
-| `/new` | Start a new conversation (reset history) |
-| `/reset` | Reset conversation only (keep screen) |
-| `/clear` | Clear screen and reset conversation (fresh start) |
+| `/new` (alias: `/reset`) | Start a new session (fresh session ID + history) |
+| `/clear` | Clear screen and start a new session |
 | `/history` | Show conversation history |
 | `/save` | Save the current conversation |
 | `/retry` | Retry the last message (resend to agent) |
@@ -31,6 +30,8 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/title` | Set a title for the current session (usage: /title My Session Name) |
 | `/compress` | Manually compress conversation context (flush memories + summarize) |
 | `/rollback` | List or restore filesystem checkpoints (usage: /rollback [number]) |
+| `/stop` | Kill all running background processes |
+| `/statusbar` (alias: `/sb`) | Toggle the context/model status bar on or off |
 | `/background <prompt>` | Run a prompt in a separate background session. The agent processes your prompt independently — your current session stays free for other work. Results appear as a panel when the task finishes. See [CLI Background Sessions](/docs/user-guide/cli#background-sessions). |
 | `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. |
 
@@ -58,6 +59,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/skills` | Search, install, inspect, or manage skills from online registries |
 | `/cron` | Manage scheduled tasks (list, add/create, edit, pause, resume, run, remove) |
 | `/reload-mcp` | Reload MCP servers from config.yaml |
+| `/plugins` | List installed plugins and their status |
 
 ### Info
 
@@ -95,7 +97,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
 | `/new` | Start a new conversation. |
 | `/reset` | Reset conversation history. |
 | `/status` | Show session info. |
-| `/stop` | Interrupt the running agent without queuing a follow-up prompt. |
+| `/stop` | Kill all running background processes and interrupt the running agent. |
 | `/model [provider:model]` | Show or change the model, including provider switches. |
 | `/provider` | Show provider availability and auth status. |
 | `/personality [name]` | Set a personality overlay for the session. |
@@ -113,13 +115,15 @@ The messaging gateway supports the following built-in commands inside Telegram,
 | `/background <prompt>` | Run a prompt in a separate background session. Results are delivered back to the same chat when the task finishes. See [Messaging Background Sessions](/docs/user-guide/messaging/#background-sessions). |
 | `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. |
 | `/reload-mcp` | Reload MCP servers from config. |
+| `/approve` | Approve and execute a pending dangerous command (terminal commands flagged for review). |
+| `/deny` | Reject a pending dangerous command. |
 | `/update` | Update Hermes Agent to the latest version. |
 | `/help` | Show messaging help. |
 | `/<skill-name>` | Invoke any installed skill by name. |
 
 ## Notes
 
-- `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, and `/verbose` are **CLI-only** commands.
-- `/status`, `/stop`, `/sethome`, `/resume`, and `/update` are **messaging-only** commands.
+- `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, `/verbose`, `/statusbar`, and `/plugins` are **CLI-only** commands.
+- `/status`, `/sethome`, `/update`, `/approve`, and `/deny` are **messaging-only** commands.
 - `/background`, `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway.
 - `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.
@@ -141,6 +141,19 @@ This page documents the built-in Hermes tool registry as it exists in code. Avai
 |------|-------------|----------------------|
 | `todo` | Manage your task list for the current session. Use for complex tasks with 3+ steps or when the user provides multiple tasks. Call with no parameters to read the current list. Writing: - Provide 'todos' array to create/update items - merge=… | — |
 
+## `vision` toolset
+
+| Tool | Description | Requires environment |
+|------|-------------|----------------------|
+| `vision_analyze` | Analyze images using AI vision. Provides a comprehensive description and answers a specific question about the image content. | — |
+
+## `web` toolset
+
+| Tool | Description | Requires environment |
+|------|-------------|----------------------|
+| `web_search` | Search the web for information on any topic. Returns up to 5 relevant results with titles, URLs, and descriptions. | PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY |
+| `web_extract` | Extract content from web page URLs. Returns page content in markdown format. Also works with PDF URLs — pass the PDF link directly and it converts to markdown text. Pages under 5000 chars return full markdown; larger pages are LLM-summarized. | PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY |
+
 ## `tts` toolset
 
 | Tool | Description | Requires environment |
Some files were not shown because too many files have changed in this diff.