fix(flush_memories): always deduct headroom + resolve flush aux model + trim defence

Three fixes for flush_memories / compression context window overflow:

1. ALWAYS deduct headroom before comparing aux_context vs threshold. #15631 only deducted inside 'if aux_context < threshold', which never fires in the common same-model case (threshold = context × 0.50 means aux_context > threshold always). Now headroom is computed unconditionally and effective_limit = aux_context - headroom is compared against threshold.
2. Also resolve the flush_memories auxiliary model in the feasibility check. If the user configures a separate auxiliary.flush_memories provider, the flush model's smaller context was previously unchecked.
3. Defence-in-depth trimming in flush_memories() for CLI /new and gateway resets that bypass preflight compression entirely.
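
The first fix can be illustrated with a minimal sketch. It is not the code from this PR: the two function names, the direction of the comparison (a problem is flagged when the limit falls below the threshold), and all token counts are assumptions chosen for illustration. Only `aux_context`, `headroom`, `threshold`, and `effective_limit` come from the commit message.

```python
def old_check_flags_problem(aux_context: int, threshold: int) -> bool:
    # Assumed pre-fix shape: headroom only mattered once this comparison was
    # already true, so the raw context size alone decided the outcome.
    return aux_context < threshold


def new_check_flags_problem(aux_context: int, threshold: int, headroom: int) -> bool:
    # Post-fix shape: headroom is deducted unconditionally, then compared.
    effective_limit = aux_context - headroom
    return effective_limit < threshold


main_context = 200_000                  # illustrative main-model window
threshold = int(main_context * 0.50)    # 100_000, as in the commit message
headroom = 30_000                       # illustrative value, not from the PR
aux_context = 120_000                   # a smaller auxiliary model

missed_before = old_check_flags_problem(aux_context, threshold)
caught_now = new_check_flags_problem(aux_context, threshold, headroom)
print(missed_before, caught_now)
```

With these numbers the raw window (120,000) clears the threshold, but the usable window after headroom (90,000) does not, which is the case the unconditional deduction is meant to catch.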
fix(compression): pass provider to context length resolver in feasibility check
2026-04-25 19:53:54 +05:30 · 2026-04-25 07:09:47 -07:00 · 2026-04-25 06:59:24 -07:00 · 2026-04-25 06:55:35 -07:00 · 2026-04-25 06:41:58 -07:00 · 2026-04-25 06:11:22 -07:00
52 changed files with 4202 additions and 1068 deletions
@@ -1349,6 +1349,49 @@ def _is_auth_error(exc: Exception) -> bool:
    return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()


+def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
+    """Detect provider 400s for an unsupported request parameter.
+
+    Different OpenAI-compatible endpoints phrase the same class of error a few
+    ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
+    ``param`` field, ``X is not supported``, ``unknown parameter: X``,
+    ``unrecognized request argument: X``.  We match on both the parameter
+    name and a generic "unsupported/unknown/unrecognized parameter" marker so
+    call sites can reactively retry without the offending key instead of
+    surfacing a noisy auxiliary failure.
+
+    Generalizes the temperature-specific detector that originally shipped
+    with PR #15621 so the same retry strategy can cover ``max_tokens``,
+    ``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
+    for the generalization pattern.
+    """
+    param_lower = (param or "").lower()
+    if not param_lower:
+        return False
+    err_lower = str(exc).lower()
+    if param_lower not in err_lower:
+        return False
+    return any(marker in err_lower for marker in (
+        "unsupported parameter",
+        "unsupported_parameter",
+        "not supported",
+        "does not support",
+        "unknown parameter",
+        "unrecognized request argument",
+        "unrecognized parameter",
+        "invalid parameter",
+    ))
+
+
+def _is_unsupported_temperature_error(exc: Exception) -> bool:
+    """Back-compat wrapper: detect API errors where the model rejects ``temperature``.
+
+    Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
+    public symbol because existing tests and call sites import it by name.
+    """
+    return _is_unsupported_parameter_error(exc, "temperature")
+
+
 def _evict_cached_clients(provider: str) -> None:
    """Drop cached auxiliary clients for a provider so fresh creds are used."""
    normalized = _normalize_aux_provider(provider)
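
A hedged sketch of how the generalized detector in the hunk above might be used for a parameter other than `temperature`. The matching rule is copied from `_is_unsupported_parameter_error` so the example runs standalone; `call_with_param_retry` and `fake_create` are hypothetical names that do not appear in this PR, and the error text is an invented stand-in for a provider 400.

```python
def is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
    # Same two-part rule as the hunk: the parameter name AND a generic
    # "unsupported/unknown/unrecognized" marker must both appear.
    param_lower = (param or "").lower()
    if not param_lower:
        return False
    err_lower = str(exc).lower()
    if param_lower not in err_lower:
        return False
    markers = (
        "unsupported parameter", "unsupported_parameter", "not supported",
        "does not support", "unknown parameter",
        "unrecognized request argument", "unrecognized parameter",
        "invalid parameter",
    )
    return any(marker in err_lower for marker in markers)


def call_with_param_retry(create, kwargs: dict, param: str):
    """Hypothetical wrapper: call once, retry once without ``param`` if rejected."""
    try:
        return create(**kwargs)
    except Exception as err:
        if param in kwargs and is_unsupported_parameter_error(err, param):
            retry_kwargs = {k: v for k, v in kwargs.items() if k != param}
            return create(**retry_kwargs)
        raise


def fake_create(**kwargs):
    # Stand-in for a provider endpoint that rejects ``seed``.
    if "seed" in kwargs:
        raise ValueError("Error code: 400 - Unknown parameter: 'seed'")
    return {"ok": True, "sent": sorted(kwargs)}


result = call_with_param_retry(fake_create, {"model": "m", "seed": 7}, "seed")
print(result)
```

Requiring both the parameter name and a marker keeps unrelated failures, such as a rate limit that never mentions the parameter, out of the retry path.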
@@ -2952,13 +2995,45 @@ def call_llm(
    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])

-    # Handle max_tokens vs max_completion_tokens retry, then payment fallback.
+    # Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
+    # then payment fallback.
    try:
        return _validate_llm_response(
            client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
+        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
+            retry_kwargs = dict(kwargs)
+            retry_kwargs.pop("temperature", None)
+            logger.info(
+                "Auxiliary %s: provider rejected temperature; retrying once without it",
+                task or "call",
+            )
+            try:
+                return _validate_llm_response(
+                    client.chat.completions.create(**retry_kwargs), task)
+            except Exception as retry_err:
+                retry_err_str = str(retry_err)
+                # If retry still fails, fall through to the max_tokens /
+                # payment / auth chains below using the temperature-stripped
+                # kwargs.  Re-raise only if the retry hit something those
+                # chains won't handle.
+                if not (
+                    _is_payment_error(retry_err)
+                    or _is_connection_error(retry_err)
+                    or _is_auth_error(retry_err)
+                    or "max_tokens" in retry_err_str
+                    or "unsupported_parameter" in retry_err_str
+                ):
+                    raise
+                first_err = retry_err
+                kwargs = retry_kwargs
+
        err_str = str(first_err)
-        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
+        if max_tokens is not None and (
+            "max_tokens" in err_str
+            or "unsupported_parameter" in err_str
+            or _is_unsupported_parameter_error(first_err, "max_tokens")
+        ):
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
@@ -3221,8 +3296,35 @@ async def async_call_llm(
        return _validate_llm_response(
            await client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
+        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
+            retry_kwargs = dict(kwargs)
+            retry_kwargs.pop("temperature", None)
+            logger.info(
+                "Auxiliary %s (async): provider rejected temperature; retrying once without it",
+                task or "call",
+            )
+            try:
+                return _validate_llm_response(
+                    await client.chat.completions.create(**retry_kwargs), task)
+            except Exception as retry_err:
+                retry_err_str = str(retry_err)
+                if not (
+                    _is_payment_error(retry_err)
+                    or _is_connection_error(retry_err)
+                    or _is_auth_error(retry_err)
+                    or "max_tokens" in retry_err_str
+                    or "unsupported_parameter" in retry_err_str
+                ):
+                    raise
+                first_err = retry_err
+                kwargs = retry_kwargs
+
        err_str = str(first_err)
-        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
+        if max_tokens is not None and (
+            "max_tokens" in err_str
+            or "unsupported_parameter" in err_str
+            or _is_unsupported_parameter_error(first_err, "max_tokens")
+        ):
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
@@ -318,6 +318,13 @@ class ContextCompressor(ContextEngine):
            int(context_length * self.threshold_percent),
            MINIMUM_CONTEXT_LENGTH,
        )
+        # Recalculate token budgets for the new context length so the
+        # compressor stays calibrated after a model switch (e.g. 200K → 32K).
+        target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
+        self.tail_token_budget = target_tokens
+        self.max_summary_tokens = min(
+            int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
+        )

    def __init__(
        self,
@@ -796,6 +796,10 @@ delegation:
                                              # Raise to 2 to allow workers to spawn their own subagents.
                                              # Requires role="orchestrator" on intermediate agents.
  # orchestrator_enabled: true                # Kill switch for role="orchestrator" children (default: true).
+  # subagent_auto_approve: false              # When a subagent hits a dangerous-command approval prompt, auto-deny (default: false)
+                                              # or auto-approve "once" (true) instead of blocking on stdin.
+                                              # The parent TUI owns stdin, so blocking would deadlock; non-interactive resolution is required.
+                                              # Both choices emit a logger.warning audit line. Flip to true only for cron/batch pipelines.
  # inherit_mcp_toolsets: true                # When explicit child toolsets are narrowed, also keep the parent's MCP toolsets (default: true). Set false for strict intersection.
  # model: "google/gemini-3-flash-preview"    # Override model for subagents (empty = inherit parent)
  # provider: "openrouter"                    # Override provider for subagents (empty = inherit parent)
@@ -3176,7 +3176,14 @@ class HermesCLI:
        # the configured model (e.g. "qwen3.6-plus"), causing 400 errors.
        runtime_model = runtime.get("model")
        if runtime_model and isinstance(runtime_model, str):
-            self.model = runtime_model
+            # Only adopt the runtime model when the configured model is unset or is just the provider slug / display name
+            should_use_runtime_model = (
+                not self.model or  # No model configured yet
+                self.model == self.provider or  # Model is the provider slug
+                self.model == runtime.get("name")  # Model matches provider display name
+            )
+            if should_use_runtime_model:
+                self.model = runtime_model

        # If model is still empty (e.g. user ran `hermes auth add openai-codex`
        # without `hermes model`), fall back to the provider's first catalog
@@ -16,7 +16,7 @@ import uuid
 from datetime import datetime, timedelta
 from pathlib import Path
 from hermes_constants import get_hermes_home
-from typing import Optional, Dict, List, Any
+from typing import Optional, Dict, List, Any, Union

 logger = logging.getLogger(__name__)

@@ -417,6 +417,7 @@ def create_job(
    provider: Optional[str] = None,
    base_url: Optional[str] = None,
    script: Optional[str] = None,
+    context_from: Optional[Union[str, List[str]]] = None,
    enabled_toolsets: Optional[List[str]] = None,
    workdir: Optional[str] = None,
 ) -> Dict[str, Any]:
@@ -438,6 +439,9 @@ def create_job(
        script: Optional path to a Python script whose stdout is injected into the
                prompt each run.  The script runs before the agent turn, and its output
                is prepended as context.  Useful for data collection / change detection.
+        context_from: Optional job ID (or list of job IDs) whose most recent output
+                      is injected into the prompt as context before each run.
+                      Useful for chaining cron jobs: job A finds data, job B processes it.
        enabled_toolsets: Optional list of toolset names to restrict the agent to.
                          When set, only tools from these toolsets are loaded, reducing
                          token overhead. When omitted, all default tools are loaded.
@@ -481,6 +485,14 @@ def create_job(
    normalized_toolsets = normalized_toolsets or None
    normalized_workdir = _normalize_workdir(workdir)

+    # Normalize context_from: accept str or list of str, store as list or None
+    if isinstance(context_from, str):
+        context_from = [context_from.strip()] if context_from.strip() else None
+    elif isinstance(context_from, list):
+        context_from = [str(j).strip() for j in context_from if str(j).strip()] or None
+    else:
+        context_from = None
+
    label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
    job = {
        "id": job_id,
@@ -492,6 +504,7 @@ def create_job(
        "provider": normalized_provider,
        "base_url": normalized_base_url,
        "script": normalized_script,
+        "context_from": context_from,
        "schedule": parsed_schedule,
        "schedule_display": parsed_schedule.get("display", schedule),
        "repeat": {
@@ -671,6 +671,47 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
                f"{prompt}"
            )

+    # Inject output from referenced cron jobs as context.
+    context_from = job.get("context_from")
+    if context_from:
+        from cron.jobs import OUTPUT_DIR
+        if isinstance(context_from, str):
+            context_from = [context_from]
+        for source_job_id in context_from:
+            # Guard against path traversal — valid job IDs are 12-char hex strings
+            if not source_job_id or not all(c in "0123456789abcdef" for c in source_job_id):
+                logger.warning("context_from: skipping invalid job_id %r", source_job_id)
+                continue
+            try:
+                job_output_dir = OUTPUT_DIR / source_job_id
+                if not job_output_dir.exists():
+                    continue  # silent skip — no output yet
+                output_files = sorted(
+                    job_output_dir.glob("*.md"),
+                    key=lambda f: f.stat().st_mtime,
+                    reverse=True,
+                )
+                if not output_files:
+                    continue  # silent skip — no output yet
+                latest_output = output_files[0].read_text(encoding="utf-8").strip()
+                # Truncate to 8K characters to avoid prompt bloat
+                _MAX_CONTEXT_CHARS = 8000
+                if len(latest_output) > _MAX_CONTEXT_CHARS:
+                    latest_output = latest_output[:_MAX_CONTEXT_CHARS] + "\n\n[... output truncated ...]"
+                if latest_output:
+                    prompt = (
+                        f"## Output from job '{source_job_id}'\n"
+                        "The following is the most recent output from a preceding "
+                        "cron job. Use it as context for your analysis.\n\n"
+                        f"```\n{latest_output}\n```\n\n"
+                        f"{prompt}"
+                    )
+                else:
+                    continue  # silent skip — empty output
+            except (OSError, PermissionError) as e:
+                logger.warning("context_from: failed to read output for job %r: %s", source_job_id, e)
+                # skip this source after logging; do not pollute the prompt with error messages
+
    # Always prepend cron execution guidance so the agent knows how
    # delivery works and can suppress delivery when appropriate.
    cron_hint = (
@@ -2543,6 +2543,9 @@ class BasePlatformAdapter(ABC):
        user_id_alt: Optional[str] = None,
        chat_id_alt: Optional[str] = None,
        is_bot: bool = False,
+        guild_id: Optional[str] = None,
+        parent_chat_id: Optional[str] = None,
+        message_id: Optional[str] = None,
    ) -> SessionSource:
        """Helper to build a SessionSource for this platform."""
        # Normalize empty topic to None
@@ -2560,6 +2563,9 @@ class BasePlatformAdapter(ABC):
            user_id_alt=user_id_alt,
            chat_id_alt=chat_id_alt,
            is_bot=is_bot,
+            guild_id=str(guild_id) if guild_id else None,
+            parent_chat_id=str(parent_chat_id) if parent_chat_id else None,
+            message_id=str(message_id) if message_id else None,
        )
    
    @abstractmethod
@@ -3261,6 +3261,7 @@ class DiscordAdapter(BasePlatformAdapter):
            if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
                thread = await self._auto_create_thread(message)
                if thread:
+                    parent_channel_id = str(message.channel.id)
                    is_thread = True
                    thread_id = str(thread.id)
                    auto_threaded_channel = thread
@@ -3320,6 +3321,9 @@ class DiscordAdapter(BasePlatformAdapter):
            thread_id=thread_id,
            chat_topic=chat_topic,
            is_bot=getattr(message.author, "bot", False),
+            guild_id=str(message.guild.id) if message.guild else None,
+            parent_chat_id=parent_channel_id,
+            message_id=str(message.id),
        )

        # Build media URLs -- download image attachments to local cache so the
@@ -87,6 +87,9 @@ class SessionSource:
    user_id_alt: Optional[str] = None  # Platform-specific stable alt ID (Signal UUID, Feishu union_id)
    chat_id_alt: Optional[str] = None  # Signal group internal ID
    is_bot: bool = False  # True when the message author is a bot/webhook (Discord)
+    guild_id: Optional[str] = None  # Discord guild / Slack workspace / Matrix server scope
+    parent_chat_id: Optional[str] = None  # Parent channel when chat_id refers to a thread
+    message_id: Optional[str] = None  # ID of the triggering message (for pin/reply/react)
    
    @property
    def description(self) -> str:
@@ -124,8 +127,14 @@ class SessionSource:
            d["user_id_alt"] = self.user_id_alt
        if self.chat_id_alt:
            d["chat_id_alt"] = self.chat_id_alt
+        if self.guild_id:
+            d["guild_id"] = self.guild_id
+        if self.parent_chat_id:
+            d["parent_chat_id"] = self.parent_chat_id
+        if self.message_id:
+            d["message_id"] = self.message_id
        return d
-    
+
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
        return cls(
@@ -139,6 +148,9 @@ class SessionSource:
            chat_topic=data.get("chat_topic"),
            user_id_alt=data.get("user_id_alt"),
            chat_id_alt=data.get("chat_id_alt"),
+            guild_id=data.get("guild_id"),
+            parent_chat_id=data.get("parent_chat_id"),
+            message_id=data.get("message_id"),
        )
    

@@ -190,6 +202,31 @@ that requires raw IDs).  Discord is excluded because mentions use ``<@user_id>``
 and the LLM needs the real ID to tag users."""


+def _discord_tools_loaded() -> bool:
+    """True iff the agent will actually have Discord tools this session.
+
+    Two conditions must hold:
+      1. The `discord` or `discord_admin` toolset is enabled for the
+         Discord platform via `hermes tools` (opt-in, default OFF).
+      2. `DISCORD_BOT_TOKEN` is set — the tool's `check_fn` gates on it
+         at registry time, so the toolset being enabled in config is not
+         enough if the token isn't configured.
+
+    Returns False (safe default — keeps the stale-API disclaimer) on any
+    error so a bad config can't silently promise tools the agent lacks.
+    """
+    if not (os.environ.get("DISCORD_BOT_TOKEN") or "").strip():
+        return False
+    try:
+        from hermes_cli.config import load_config
+        from hermes_cli.tools_config import _get_platform_tools
+        cfg = load_config()
+        enabled = _get_platform_tools(cfg, "discord", include_default_mcp_servers=False)
+        return "discord" in enabled or "discord_admin" in enabled
+    except Exception:
+        return False
+
+
 def build_session_context_prompt(
    context: SessionContext,
    *,
@@ -277,14 +314,33 @@ def build_session_context_prompt(
            "that you can only read messages sent directly to you and respond."
        )
    elif context.source.platform == Platform.DISCORD:
-        lines.append("")
-        lines.append(
-            "**Platform notes:** You are running inside Discord. "
-            "You do NOT have access to Discord-specific APIs — you cannot search "
-            "channel history, pin messages, manage roles, or list server members. "
-            "Do not promise to perform these actions. If the user asks, explain "
-            "that you can only read messages sent directly to you and respond."
-        )
+        # Inject the Discord IDs block only when the agent actually has
+        # Discord tools loaded this session — i.e. the user opted into
+        # `discord` / `discord_admin` via `hermes tools` AND the bot
+        # token is configured.  Otherwise keep the stale-API disclaimer
+        # honest so we never promise tools the agent lacks.
+        if _discord_tools_loaded():
+            src = context.source
+            id_lines = ["", "**Discord IDs (for the `discord` / `discord_admin` tools):**"]
+            if src.guild_id:
+                id_lines.append(f"  - Guild: `{src.guild_id}`")
+            if src.thread_id and src.parent_chat_id:
+                id_lines.append(f"  - Parent channel: `{src.parent_chat_id}`")
+                id_lines.append(f"  - Thread: `{src.thread_id}` (use as `channel_id` for fetch_messages etc.)")
+            else:
+                id_lines.append(f"  - Channel: `{src.chat_id}`")
+            if src.message_id:
+                id_lines.append(f"  - Triggering message: `{src.message_id}`")
+            lines.extend(id_lines)
+        else:
+            lines.append("")
+            lines.append(
+                "**Platform notes:** You are running inside Discord. "
+                "You do NOT have access to Discord-specific APIs — you cannot search "
+                "channel history, pin messages, manage roles, or list server members. "
+                "Do not promise to perform these actions. If the user asks, explain "
+                "that you can only read messages sent directly to you and respond."
+            )
    elif context.source.platform == Platform.BLUEBUBBLES:
        lines.append("")
        lines.append(
@@ -783,6 +783,15 @@ DEFAULT_CONFIG = {
        # warning log if out of range.
        "max_spawn_depth": 1,        # depth cap (1 = flat [default], 2 = orchestrator→leaf, 3 = three-level)
        "orchestrator_enabled": True,  # kill switch for role="orchestrator"
+        # When a subagent hits a dangerous-command approval prompt, the parent's
+        # prompt_toolkit TUI owns stdin — a thread-local input() call from the
+        # subagent worker would deadlock the parent UI. To avoid the deadlock,
+        # subagent threads ALWAYS resolve approvals non-interactively:
+        #   false (default) → auto-deny with a logger.warning audit line (safe)
+        #   true             → auto-approve "once" with a logger.warning audit line
+        # Flip to true only if you trust delegated work to run dangerous cmds
+        # without human review (cron pipelines, batch automation, etc.).
+        "subagent_auto_approve": False,
    },

    # Ephemeral prefill messages file — JSON list of {role, content} dicts
@@ -839,7 +848,7 @@ DEFAULT_CONFIG = {
        "auto_thread": True,           # Auto-create threads on @mention in channels (like Slack)
        "reactions": True,             # Add 👀/✅/❌ reactions to messages during processing
        "channel_prompts": {},         # Per-channel ephemeral system prompts (forum parents apply to child threads)
-        # discord_server tool: restrict which actions the agent may call.
+        # discord / discord_admin tools: restrict which actions the agent may call.
        # Default (empty) = all actions allowed (subject to bot privileged intents).
        # Accepts comma-separated string ("list_guilds,list_channels,fetch_messages")
        # or YAML list. Unknown names are dropped with a warning at load time.
@@ -6046,6 +6046,31 @@ def _cmd_update_impl(args, gateway_mode: bool):
            )
            import signal as _signal

+            def _wait_for_service_active(
+                scope_cmd_: list, svc_name_: str, timeout: float = 10.0,
+            ) -> bool:
+                """Poll ``systemctl is-active`` until the unit reports active.
+
+                systemd's Stopped -> Started transition after a graceful exit
+                (or a hard restart) is not instantaneous; a one-shot check
+                races that window and falsely reports the unit as down.
+                Poll every 0.5s up to ``timeout`` seconds before giving up.
+                """
+                deadline = _time.monotonic() + max(timeout, 0.5)
+                while True:
+                    try:
+                        _verify = subprocess.run(
+                            scope_cmd_ + ["is-active", svc_name_],
+                            capture_output=True, text=True, timeout=5,
+                        )
+                        if _verify.stdout.strip() == "active":
+                            return True
+                    except (FileNotFoundError, subprocess.TimeoutExpired):
+                        pass
+                    if _time.monotonic() >= deadline:
+                        return False
+                    _time.sleep(0.5)
+
            # Drain budget for graceful SIGUSR1 restarts.  The gateway drains
            # for up to ``agent.restart_drain_timeout`` (default 60s) before
            # exiting with code 75; we wait slightly longer so the drain
@@ -6152,14 +6177,14 @@ def _cmd_update_impl(args, gateway_mode: bool):

                            if _graceful_ok:
                                # Gateway exited 75; systemd should relaunch
-                                # via Restart=on-failure.  Verify the new
-                                # process came up.
-                                _time.sleep(3)
-                                verify = subprocess.run(
-                                    scope_cmd + ["is-active", svc_name],
-                                    capture_output=True, text=True, timeout=5,
-                                )
-                                if verify.stdout.strip() == "active":
+                                # via Restart=on-failure.  Poll is-active for
+                                # up to ~10s because the unit's Stopped ->
+                                # Started transition can take a few seconds
+                                # after the old PID exits, and a one-shot
+                                # check races that window.
+                                if _wait_for_service_active(
+                                    scope_cmd, svc_name, timeout=10.0,
+                                ):
                                    restarted_services.append(svc_name)
                                    continue
                                # Process exited but wasn't respawned (older
@@ -6185,14 +6210,9 @@ def _cmd_update_impl(args, gateway_mode: bool):
                                # Verify the service actually survived the
                                # restart.  systemctl restart returns 0 even
                                # if the new process crashes immediately.
-                                _time.sleep(3)
-                                verify = subprocess.run(
-                                    scope_cmd + ["is-active", svc_name],
-                                    capture_output=True,
-                                    text=True,
-                                    timeout=5,
-                                )
-                                if verify.stdout.strip() == "active":
+                                if _wait_for_service_active(
+                                    scope_cmd, svc_name, timeout=10.0,
+                                ):
                                    restarted_services.append(svc_name)
                                else:
                                    # Retry once — transient startup failures
@@ -6207,14 +6227,9 @@ def _cmd_update_impl(args, gateway_mode: bool):
                                        text=True,
                                        timeout=15,
                                    )
-                                    _time.sleep(3)
-                                    verify2 = subprocess.run(
-                                        scope_cmd + ["is-active", svc_name],
-                                        capture_output=True,
-                                        text=True,
-                                        timeout=5,
-                                    )
-                                    if verify2.stdout.strip() == "active":
+                                    if _wait_for_service_active(
+                                        scope_cmd, svc_name, timeout=10.0,
+                                    ):
                                        restarted_services.append(svc_name)
                                        print(f"  ✓ {svc_name} recovered on retry")
                                    else:
@@ -68,25 +68,58 @@ CONFIGURABLE_TOOLSETS = [
    ("rl",              "🧪 RL Training",               "Tinker-Atropos training tools"),
    ("homeassistant",    "🏠 Home Assistant",           "smart home device control"),
    ("spotify",          "🎵 Spotify",                  "playback, search, playlists, library"),
+    ("discord",         "💬 Discord (read/participate)", "fetch messages, search members, create thread"),
+    ("discord_admin",   "🛡️  Discord Server Admin",    "list channels/roles, pin, assign roles"),
 ]

 # Toolsets that are OFF by default for new installs.
 # They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
 # but the setup checklist won't pre-select them for first-time users.
-_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify"}
+_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin"}
+
+# Platform-scoped toolsets: only appear in the `hermes tools` checklist for
+# these platforms, and only resolve/save for these platforms.  A toolset
+# absent from this map is available on every platform (current behaviour).
+#
+# Use this for tools whose APIs only make sense on one platform (Discord
+# server admin, Slack workspace admin, etc.).  Keeps every other platform's
+# checklist from filling up with irrelevant toggles.
+_TOOLSET_PLATFORM_RESTRICTIONS: Dict[str, Set[str]] = {
+    "discord": {"discord"},
+    "discord_admin": {"discord"},
+}
+
+
+def _toolset_allowed_for_platform(ts_key: str, platform: str) -> bool:
+    """Return True if ``ts_key`` is configurable on ``platform``.
+
+    Toolsets without a restriction entry are allowed everywhere (the default).
+    """
+    allowed = _TOOLSET_PLATFORM_RESTRICTIONS.get(ts_key)
+    return allowed is None or platform in allowed


 def _get_effective_configurable_toolsets():
    """Return CONFIGURABLE_TOOLSETS + any plugin-provided toolsets.

    Plugin toolsets are appended at the end so they appear after the
-    built-in toolsets in the TUI checklist.
+    built-in toolsets in the TUI checklist. A plugin whose toolset key
+    already appears in ``CONFIGURABLE_TOOLSETS`` is skipped — bundled
+    plugins (e.g. ``plugins/spotify``) share their toolset key with the
+    built-in entry, and we want the built-in label/description to win.
+    Without the dedupe, ``hermes tools`` → "reconfigure existing" would
+    list the same toolset twice.
    """
    result = list(CONFIGURABLE_TOOLSETS)
+    seen = {ts_key for ts_key, _, _ in result}
    try:
        from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
        discover_plugins()  # idempotent — ensures plugins are loaded
-        result.extend(get_plugin_toolsets())
+        for entry in get_plugin_toolsets():
+            if entry[0] in seen:
+                continue
+            seen.add(entry[0])
+            result.append(entry)
    except Exception:
        pass
    return result
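
The platform-scoping rule introduced above can be exercised in isolation. This is a sketch under stated assumptions: the restriction map and the predicate mirror `_TOOLSET_PLATFORM_RESTRICTIONS` and `_toolset_allowed_for_platform` from the hunk, while `visible_toolsets` and the `"telegram"` platform key are hypothetical, used only to show the filtering effect.

```python
from typing import Dict, List, Set

# Mirrors the map in the hunk: a key absent from the map is unrestricted.
TOOLSET_PLATFORM_RESTRICTIONS: Dict[str, Set[str]] = {
    "discord": {"discord"},
    "discord_admin": {"discord"},
}


def toolset_allowed_for_platform(ts_key: str, platform: str) -> bool:
    allowed = TOOLSET_PLATFORM_RESTRICTIONS.get(ts_key)
    return allowed is None or platform in allowed


def visible_toolsets(keys: List[str], platform: str) -> List[str]:
    """Hypothetical helper: the toolset keys a platform's checklist would show."""
    return [k for k in keys if toolset_allowed_for_platform(k, platform)]


keys = ["spotify", "discord", "discord_admin"]
on_discord = visible_toolsets(keys, "discord")
elsewhere = visible_toolsets(keys, "telegram")
print(on_discord, elsewhere)
```

Treating a missing entry as "allowed everywhere" means existing toolsets need no changes; only platform-specific ones have to be listed in the map.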
@@ -591,7 +624,7 @@ def _get_platform_tools(
    include_default_mcp_servers: bool = True,
 ) -> Set[str]:
    """Resolve which individual toolset names are enabled for a platform."""
-    from toolsets import resolve_toolset
+    from toolsets import resolve_toolset, TOOLSETS

    platform_toolsets = config.get("platform_toolsets") or {}
    toolset_names = platform_toolsets.get(platform)
@@ -605,6 +638,8 @@ def _get_platform_tools(
    toolset_names = [str(ts) for ts in toolset_names]

    configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
+    plugin_ts_keys = _get_plugin_toolset_keys()
+    platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}

    # If the saved list contains any configurable keys directly, the user
    # has explicitly configured this platform — use direct membership.
@@ -614,7 +649,10 @@ def _get_platform_tools(
    has_explicit_config = any(ts in configurable_keys for ts in toolset_names)

    if has_explicit_config:
-        enabled_toolsets = {ts for ts in toolset_names if ts in configurable_keys}
+        enabled_toolsets = {
+            ts for ts in toolset_names
+            if ts in configurable_keys and _toolset_allowed_for_platform(ts, platform)
+        }
    else:
        # No explicit config — fall back to resolving composite toolset names
        # (e.g. "hermes-cli") to individual tool names and reverse-mapping.
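The explicit-config filter above combines two checks: the key must be configurable, and it must be allowed on the platform. A rough standalone sketch follows, assuming a restriction table shaped like `_TOOLSET_PLATFORM_RESTRICTIONS`; only the `discord_admin` entry is visible in this diff, and the names `RESTRICTIONS`, `allowed`, and `filter_enabled` are illustrative.

```python
from typing import Dict, Iterable, Set

# Assumed table; the diff only shows the discord_admin entry.
RESTRICTIONS: Dict[str, Set[str]] = {"discord_admin": {"discord"}}


def allowed(ts_key: str, platform: str) -> bool:
    """Toolsets without a restriction entry are allowed everywhere."""
    permitted = RESTRICTIONS.get(ts_key)
    return permitted is None or platform in permitted


def filter_enabled(names: Iterable[str], configurable: Set[str], platform: str) -> Set[str]:
    """Keep names that are configurable and permitted on this platform."""
    return {ts for ts in names if ts in configurable and allowed(ts, platform)}


saved = ["web", "discord_admin", "my-mcp-server"]
configurable = {"web", "discord_admin"}
on_telegram = filter_enabled(saved, configurable, "telegram")
on_discord = filter_enabled(saved, configurable, "discord")
```

Non-configurable names such as the hypothetical `my-mcp-server` fall out of this set; the real function handles them in a separate passthrough step.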
@@ -624,14 +662,52 @@ def _get_platform_tools(

        enabled_toolsets = set()
        for ts_key, _, _ in CONFIGURABLE_TOOLSETS:
+            if not _toolset_allowed_for_platform(ts_key, platform):
+                continue
            ts_tools = set(resolve_toolset(ts_key))
            if ts_tools and ts_tools.issubset(all_tool_names):
                enabled_toolsets.add(ts_key)
+
        default_off = set(_DEFAULT_OFF_TOOLSETS)
-        if platform in default_off:
+        # Legacy safety: if the platform's own name matches a default-off
+        # toolset (e.g. `homeassistant` platform + `homeassistant` toolset),
+        # keep that toolset enabled on first install.  Skip this exemption for
+        # platform-restricted toolsets — those are always opt-in even on
+        # their own platform (e.g. `discord` + `discord` should stay OFF).
+        if platform in default_off and platform not in _TOOLSET_PLATFORM_RESTRICTIONS:
            default_off.remove(platform)
        enabled_toolsets -= default_off

+    # Recover non-configurable platform toolsets (e.g. discord, feishu_doc,
+    # feishu_drive).  These are part of the platform's default composite but
+    # absent from CONFIGURABLE_TOOLSETS, so they can't appear in the TUI
+    # checklist or in a user-saved config.  Must run in BOTH branches —
+    # otherwise saving via `hermes tools` (which flips has_explicit_config
+    # to True) silently drops them.
+    platform_tool_universe = set(resolve_toolset(PLATFORMS[platform]["default_toolset"]))
+    configurable_tool_universe = set()
+    for ck in configurable_keys:
+        configurable_tool_universe.update(resolve_toolset(ck))
+    claimed = set()
+    for ts_key in enabled_toolsets:
+        claimed.update(resolve_toolset(ts_key))
+    skip = configurable_keys | plugin_ts_keys | platform_default_keys
+    skip |= {k for k in TOOLSETS if k.startswith("hermes-")}
+    skip |= set(_DEFAULT_OFF_TOOLSETS) - {platform}
+    for ts_key, ts_def in TOOLSETS.items():
+        if ts_key in skip:
+            continue
+        if ts_def.get("includes"):
+            continue
+        ts_tools = set(resolve_toolset(ts_key))
+        if not ts_tools or not ts_tools.issubset(platform_tool_universe):
+            continue
+        if ts_tools.issubset(configurable_tool_universe):
+            continue
+        if not ts_tools.issubset(claimed):
+            enabled_toolsets.add(ts_key)
+            claimed.update(ts_tools)
+
    # Plugin toolsets: enabled by default unless explicitly disabled, or
    # unless the toolset is in _DEFAULT_OFF_TOOLSETS (e.g. spotify —
    # shipped as a bundled plugin but user must opt in via `hermes tools`
@@ -639,7 +715,6 @@ def _get_platform_tools(
    # A plugin toolset is "known" for a platform once `hermes tools`
    # has been saved for that platform (tracked via known_plugin_toolsets).
    # Unknown plugins default to enabled; known-but-absent = disabled.
-    plugin_ts_keys = _get_plugin_toolset_keys()
    if plugin_ts_keys:
        known_map = config.get("known_plugin_toolsets", {})
        known_for_platform = set(known_map.get(platform, []))
@@ -657,7 +732,6 @@ def _get_platform_tools(

    # Preserve any explicit non-configurable toolset entries (for example,
    # custom toolsets or MCP server names saved in platform_toolsets).
-    platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
    explicit_passthrough = {
        ts
        for ts in toolset_names
@@ -703,6 +777,14 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
    """
    config.setdefault("platform_toolsets", {})

+    # Drop platform-scoped toolsets that don't apply here.  Prevents the
+    # "Configure all platforms" checklist (or a hand-edited config.yaml)
+    # from turning on, say, the `discord` toolset for Telegram.
+    enabled_toolset_keys = {
+        ts for ts in enabled_toolset_keys
+        if _toolset_allowed_for_platform(ts, platform)
+    }
+
    # Get the set of all configurable toolset keys (built-in + plugin)
    configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
    plugin_keys = _get_plugin_toolset_keys()
@@ -717,6 +799,7 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
    existing_toolsets = config.get("platform_toolsets", {}).get(platform, [])
    if not isinstance(existing_toolsets, list):
        existing_toolsets = []
+    existing_toolsets = [str(ts) for ts in existing_toolsets]

    # Preserve any entries that are NOT configurable toolsets and NOT platform
    # defaults (i.e. only MCP server names should be preserved)
@@ -724,6 +807,11 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
        entry for entry in existing_toolsets
        if entry not in configurable_keys and entry not in platform_default_keys
    }
+    # Opening `hermes tools` is the user's opt-in to reconfigure tools, so treat
+    # saving from the picker as consent to clear the "no_mcp" sentinel. The
+    # picker has no checkbox for no_mcp, so without this users who once set it
+    # by hand could never re-enable MCP servers through the UI.
+    preserved_entries.discard("no_mcp")

    # Merge preserved entries with new enabled toolsets
    config["platform_toolsets"][platform] = sorted(enabled_toolset_keys | preserved_entries)
@@ -831,7 +919,7 @@ def _estimate_tool_tokens() -> Dict[str, int]:
    return _tool_token_cache


-def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
+def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform: str = "cli") -> Set[str]:
    """Multi-select checklist of toolsets. Returns set of selected toolset keys."""
    from hermes_cli.curses_ui import curses_checklist
    from toolsets import resolve_toolset
@@ -839,7 +927,12 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str
    # Pre-compute per-tool token counts (cached after first call).
    tool_tokens = _estimate_tool_tokens()

-    effective = _get_effective_configurable_toolsets()
+    effective_all = _get_effective_configurable_toolsets()
+    # Drop platform-scoped toolsets that don't apply to this platform.
+    effective = [
+        (key, label, desc) for (key, label, desc) in effective_all
+        if _toolset_allowed_for_platform(key, platform)
+    ]

    labels = []
    for ts_key, ts_label, ts_desc in effective:
@@ -1753,7 +1846,7 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
            checklist_preselected = current_enabled - _DEFAULT_OFF_TOOLSETS

            # Show checklist
-            new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected)
+            new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected, pkey)

            added = new_enabled - current_enabled
            removed = current_enabled - new_enabled
@@ -2109,7 +2202,11 @@ def _apply_mcp_change(config: dict, targets: List[str], action: str) -> Set[str]

 def _print_tools_list(enabled_toolsets: set, mcp_servers: dict, platform: str = "cli"):
    """Print a summary of enabled/disabled toolsets and MCP tool filters."""
-    effective = _get_effective_configurable_toolsets()
+    effective_all = _get_effective_configurable_toolsets()
+    effective = [
+        (key, label, desc) for (key, label, desc) in effective_all
+        if _toolset_allowed_for_platform(key, platform)
+    ]
    builtin_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}

    print(f"Built-in toolsets ({platform}):")
@@ -2175,6 +2272,20 @@ def tools_disable_enable_command(args):
            _print_error(f"Unknown toolset '{name}'")
        toolset_targets = [t for t in toolset_targets if t in valid_toolsets]

+    # Reject platform-scoped toolsets on platforms that don't allow them.
+    restricted_targets = [
+        t for t in toolset_targets
+        if not _toolset_allowed_for_platform(t, platform)
+    ]
+    if restricted_targets:
+        for name in restricted_targets:
+            allowed = sorted(_TOOLSET_PLATFORM_RESTRICTIONS.get(name) or set())
+            _print_error(
+                f"Toolset '{name}' is not available on platform '{platform}' "
+                f"(only: {', '.join(allowed)})"
+            )
+        toolset_targets = [t for t in toolset_targets if t not in restricted_targets]
+
    if toolset_targets:
        _apply_toolset_change(config, platform, toolset_targets, action)

@@ -53,7 +53,7 @@ try:
    from fastapi.middleware.cors import CORSMiddleware
    from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
    from fastapi.staticfiles import StaticFiles
-    from pydantic import BaseModel, field_validator
+    from pydantic import BaseModel
 except ImportError:
    raise SystemExit(
        "Web UI requires fastapi and uvicorn.\n"
@@ -425,20 +425,6 @@ class EnvVarUpdate(BaseModel):
    key: str
    value: str

-    @field_validator("key")
-    @classmethod
-    def key_must_be_nonempty(cls, v: str) -> str:
-        if not v.strip():
-            raise ValueError("key must not be empty")
-        return v
-
-    @field_validator("value")
-    @classmethod
-    def value_must_be_nonempty(cls, v: str) -> str:
-        if not v.strip():
-            raise ValueError("value must not be empty; use DELETE /api/env to remove a key")
-        return v
-

 class EnvVarDelete(BaseModel):
    key: str
@@ -288,30 +288,34 @@ def get_tool_definitions(
                filtered_tools[i] = {"type": "function", "function": dynamic_schema}
                break

-    # Rebuild discord_server schema based on the bot's privileged intents
-    # (detected from GET /applications/@me) and the user's action allowlist
-    # in config.  Hides actions the bot's intents don't support so the
-    # model never attempts them, and annotates fetch_messages when the
+    # Rebuild discord / discord_admin schemas based on the bot's privileged
+    # intents (detected from GET /applications/@me) and the user's action
+    # allowlist in config.  Hides actions the bot's intents don't support so
+    # the model never attempts them, and annotates fetch_messages when the
    # MESSAGE_CONTENT intent is missing.
-    if "discord_server" in available_tool_names:
-        try:
-            from tools.discord_tool import get_dynamic_schema
-            dynamic = get_dynamic_schema()
-        except Exception:  # pragma: no cover — defensive, fall back to static
-            dynamic = None
-        if dynamic is None:
-            # Tool filtered out entirely (empty allowlist or detection disabled
-            # the only remaining actions).  Drop it from the schema list.
-            filtered_tools = [
-                t for t in filtered_tools
-                if t.get("function", {}).get("name") != "discord_server"
-            ]
-            available_tool_names.discard("discord_server")
-        else:
-            for i, td in enumerate(filtered_tools):
-                if td.get("function", {}).get("name") == "discord_server":
-                    filtered_tools[i] = {"type": "function", "function": dynamic}
-                    break
+    _discord_schema_fns = {
+        "discord": "get_dynamic_schema_core",
+        "discord_admin": "get_dynamic_schema_admin",
+    }
+    for discord_tool_name in _discord_schema_fns:
+        if discord_tool_name in available_tool_names:
+            try:
+                from tools import discord_tool as _dt
+                schema_fn = getattr(_dt, _discord_schema_fns[discord_tool_name])
+                dynamic = schema_fn()
+            except Exception:
+                dynamic = None
+            if dynamic is None:
+                filtered_tools = [
+                    t for t in filtered_tools
+                    if t.get("function", {}).get("name") != discord_tool_name
+                ]
+                available_tool_names.discard(discord_tool_name)
+            else:
+                for i, td in enumerate(filtered_tools):
+                    if td.get("function", {}).get("name") == discord_tool_name:
+                        filtered_tools[i] = {"type": "function", "function": dynamic}
+                        break

    # Strip web tool cross-references from browser_navigate description when
    # web_search / web_extract are not available.  The static schema says
@@ -91,4 +91,29 @@

  // Register this plugin — the dashboard picks it up automatically.
  window.__HERMES_PLUGINS__.register("example", ExamplePage);
+
+  // ─────────────────────────────────────────────────────────────────────
+  // Page-scoped slot demo: inject a small banner at the top of /sessions.
+  //
+  // Built-in pages expose named slots (<page>:top, <page>:bottom) that
+  // plugins can populate without overriding the whole route. The
+  // manifest lists the slots we use in its `slots` array so the shell
+  // knows to render <PluginSlot name="sessions:top" /> there.
+  // ─────────────────────────────────────────────────────────────────────
+  function SessionsTopBanner() {
+    return React.createElement(Card, {
+      className: "border-dashed",
+    },
+      React.createElement(CardContent, { className: "flex items-center gap-3 py-2" },
+        React.createElement(Badge, { variant: "outline" }, "Example"),
+        React.createElement("span", {
+          className: "text-xs text-muted-foreground",
+        }, "This banner was injected into the Sessions page by the example plugin via the ",
+          React.createElement("code", { className: "font-courier" }, "sessions:top"),
+          " slot."),
+      ),
+    );
+  }
+
+  window.__HERMES_PLUGINS__.registerSlot("example", "sessions:top", SessionsTopBanner);
 })();
@@ -8,6 +8,7 @@
    "path": "/example",
    "position": "after:skills"
  },
+  "slots": ["sessions:top"],
  "entry": "dist/index.js",
  "api": "plugin_api.py"
 }
@@ -2399,8 +2399,37 @@ class AIAgent:
                base_url=aux_base_url,
                api_key=aux_api_key,
                config_context_length=getattr(self, "_aux_compression_context_length_config", None),
+                provider=getattr(self, "provider", ""),
            )

+            # Also resolve the flush_memories auxiliary model — it may differ
+            # from the compression model when the user configures separate
+            # auxiliary.flush_memories.provider/model, or when the fallback
+            # chain lands on a different provider.  flush_memories runs with
+            # the FULL pre-compression conversation, so its model's context
+            # must also be respected.
+            try:
+                flush_client, flush_model = get_text_auxiliary_client(
+                    "flush_memories",
+                    main_runtime=self._current_main_runtime(),
+                )
+                if flush_client and flush_model:
+                    _flush_ctx = get_model_context_length(
+                        flush_model,
+                        base_url=str(getattr(flush_client, "base_url", "") or ""),
+                        api_key=str(getattr(flush_client, "api_key", "") or ""),
+                        provider=getattr(self, "provider", ""),
+                    )
+                    if _flush_ctx and _flush_ctx < aux_context:
+                        logger.info(
+                            "flush_memories model %s context (%d) < compression "
+                            "model %s context (%d) — using the smaller value",
+                            flush_model, _flush_ctx, aux_model, aux_context,
+                        )
+                        aux_context = _flush_ctx
+            except Exception:
+                pass  # Non-fatal — fall through with compression model's context
+
            # Hard floor: the auxiliary compression model must have at least
            # MINIMUM_CONTEXT_LENGTH (64K) tokens of context.  The main model
            # is already required to meet this floor (checked earlier in
@@ -2420,13 +2449,25 @@ class AIAgent:
                )

            threshold = self.context_compressor.threshold_tokens
-            if aux_context < threshold:
-                # Auto-correct: lower the live session threshold so
-                # compression actually works this session.  The hard floor
-                # above guarantees aux_context >= MINIMUM_CONTEXT_LENGTH,
-                # so the new threshold is always >= 64K.
+
+            # Headroom: the threshold budgets RAW MESSAGES only, but the
+            # request that auxiliary callers (compression summariser and
+            # flush_memories) actually send also includes the system prompt
+            # and every tool schema.  Ensure threshold + headroom <= aux_context
+            # or the first compression/flush request will overflow.
+            #
+            # This applies even when aux_context > threshold (the common
+            # same-model case after a155b4a1) — e.g. 128K context, 85%
+            # threshold = 108K, 20K overhead → 108K + 20K = 128K exactly
+            # at the limit, and any token-estimate variance causes a 400.
+            from agent.model_metadata import estimate_request_tokens_rough
+            tool_overhead = estimate_request_tokens_rough([], tools=self.tools)
+            headroom = tool_overhead + 12_000
+            effective_limit = max(aux_context - headroom, MINIMUM_CONTEXT_LENGTH)
+
+            if effective_limit < threshold:
                old_threshold = threshold
-                new_threshold = aux_context
+                new_threshold = effective_limit
                self.context_compressor.threshold_tokens = new_threshold
                # Keep threshold_percent in sync so future main-model
                # context_length changes (update_model) re-derive from a
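The headroom arithmetic in this hunk can be checked with a small sketch. It assumes `MINIMUM_CONTEXT_LENGTH` is 64,000 (the diff only says "64K") and takes the tool overhead as a plain integer rather than calling `estimate_request_tokens_rough`; `adjusted_threshold` is a hypothetical helper name.

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # assumption: the diff only says "64K"
FIXED_MARGIN = 12_000            # constant added to the tool overhead in the diff


def adjusted_threshold(aux_context: int, threshold: int, tool_overhead: int) -> int:
    """Return the threshold after unconditionally deducting headroom."""
    headroom = tool_overhead + FIXED_MARGIN
    effective_limit = max(aux_context - headroom, MINIMUM_CONTEXT_LENGTH)
    # Only lower the threshold; never raise it above the configured value.
    return min(threshold, effective_limit)


# The comment's example: 128K context, 108K threshold, 20K tool overhead.
tight = adjusted_threshold(128_000, 108_000, 20_000)
# A roomy same-model case: 200K context at a 50% threshold.
roomy = adjusted_threshold(200_000, 100_000, 20_000)
```

In the tight case the threshold drops to 96,000, because 128,000 minus 32,000 of headroom is below the configured 108,000. In the roomy case it is left alone.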
@@ -7975,6 +8016,67 @@ class AIAgent:
                messages.pop()  # remove flush msg
                return

+            # ── Defence-in-depth: trim messages to fit auxiliary context ──
+            #
+            # _check_compression_model_feasibility already lowers the
+            # compression threshold so conversations *triggered by preflight
+            # compression* should fit.  But flush_memories is also called
+            # from CLI /new and gateway session resets — paths that bypass
+            # the preflight check entirely.  Trim here as a safety net.
+            try:
+                from agent.auxiliary_client import get_text_auxiliary_client
+                from agent.model_metadata import (
+                    get_model_context_length,
+                    estimate_messages_tokens_rough,
+                )
+                _fc, _fm = get_text_auxiliary_client(
+                    "flush_memories",
+                    main_runtime=self._current_main_runtime(),
+                )
+                _fctx = 0
+                if _fc and _fm:
+                    _fctx = get_model_context_length(
+                        _fm,
+                        base_url=str(getattr(_fc, "base_url", "") or ""),
+                        api_key=str(getattr(_fc, "api_key", "") or ""),
+                        provider=getattr(self, "provider", ""),
+                    )
+                if not _fctx:
+                    _fctx = getattr(
+                        getattr(self, "context_compressor", None),
+                        "context_length", 0,
+                    )
+                if _fctx:
+                    _budget = _fctx - 5120 - 500  # reserve 5120 output tokens + ~500 for the tool schema
+                    if _budget > 0:
+                        _est = estimate_messages_tokens_rough(api_messages)
+                        if _est > _budget:
+                            _sys = []
+                            _conv = api_messages
+                            if api_messages and api_messages[0].get("role") == "system":
+                                _sys = [api_messages[0]]
+                                _conv = api_messages[1:]
+                            _rem = _budget - estimate_messages_tokens_rough(_sys)
+                            _kept: list = []
+                            _acc = 0
+                            for _m in reversed(_conv):
+                                _mt = estimate_messages_tokens_rough([_m])
+                                if _acc + _mt > _rem:
+                                    break
+                                _kept.append(_m)
+                                _acc += _mt
+                            _kept.reverse()
+                            if len(_kept) < 3 and len(_conv) >= 3:
+                                _kept = _conv[-3:]
+                            api_messages = _sys + _kept
+                            logger.info(
+                                "flush_memories: trimmed %d→%d msgs to fit "
+                                "%d-token aux context",
+                                len(_sys) + len(_conv), len(api_messages), _fctx,
+                            )
+            except Exception as _te:
+                logger.debug("flush_memories: context trim failed: %s", _te)
+
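The trimming loop above keeps the system message, then walks the conversation from newest to oldest until the budget is spent, with a floor of the last three messages. Here is a self-contained sketch of that idea. The `rough_tokens` estimator (about four characters per token) is a stand-in assumption, not the project's `estimate_messages_tokens_rough`, and `trim_to_budget` is a hypothetical name.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def rough_tokens(messages: List[Message]) -> int:
    """Stand-in estimator: roughly four characters per token, plus one."""
    return sum(len(m.get("content", "")) // 4 + 1 for m in messages)


def trim_to_budget(
    messages: List[Message],
    budget: int,
    estimate: Callable[[List[Message]], int] = rough_tokens,
) -> List[Message]:
    """Keep the system message plus the newest messages that fit the budget."""
    if estimate(messages) <= budget:
        return messages
    system: List[Message] = []
    conv = messages
    if messages and messages[0].get("role") == "system":
        system, conv = [messages[0]], messages[1:]
    remaining = budget - estimate(system)
    kept: List[Message] = []
    used = 0
    for msg in reversed(conv):  # newest first
        cost = estimate([msg])
        if used + cost > remaining:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    if len(kept) < 3 and len(conv) >= 3:
        kept = conv[-3:]  # floor: always keep the last three messages
    return system + kept


history = [{"role": "system", "content": "s" * 40}] + [
    {"role": "user", "content": f"{i:02d}" + "x" * 38} for i in range(10)
]
trimmed = trim_to_budget(history, budget=50)
```

Note that the three-message floor can exceed the budget when individual messages are very large, which matches the behaviour of the hunk above; the trim is a safety net, not a guarantee.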
            # Use auxiliary client for the flush call when available --
            # it's cheaper and avoids Codex Responses API incompatibility.
            from agent.auxiliary_client import (
@@ -8010,17 +8112,20 @@ class AIAgent:
                response = None

            if not _aux_available and self.api_mode == "codex_responses":
-                # No auxiliary client -- use the Codex Responses path directly
+                # No auxiliary client -- use the Codex Responses path directly.
+                # The Responses API does not accept `temperature` on any
+                # supported backend (chatgpt.com/backend-api/codex rejects it
+                # outright; api.openai.com + gpt-5/o-series reasoning models
+                # and Copilot Responses reject it on reasoning models). The
+                # transport intentionally never sets it — strip any leftover
+                # here so the flush fallback matches the main-loop behavior.
                codex_kwargs = self._build_api_kwargs(api_messages)
                _ct_flush = self._get_transport()
                if _ct_flush is not None:
                    codex_kwargs["tools"] = _ct_flush.convert_tools([memory_tool_def])
                elif not codex_kwargs.get("tools"):
                    codex_kwargs["tools"] = [memory_tool_def]
-                if _flush_temperature is not None:
-                    codex_kwargs["temperature"] = _flush_temperature
-                else:
-                    codex_kwargs.pop("temperature", None)
+                codex_kwargs.pop("temperature", None)
                if "max_output_tokens" in codex_kwargs:
                    codex_kwargs["max_output_tokens"] = 5120
                response = self._run_codex_stream(codex_kwargs)
@@ -29,10 +29,25 @@ BOLD='\033[1m'
 REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
 REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
 HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
-INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
+# INSTALL_DIR is resolved AFTER arg parsing and OS detection so we can pick an
+# FHS-style layout for root installs.  Track whether the user gave us an
+# explicit directory — if so we never override it.
+if [ -n "${HERMES_INSTALL_DIR:-}" ]; then
+    INSTALL_DIR="$HERMES_INSTALL_DIR"
+    INSTALL_DIR_EXPLICIT=true
+else
+    INSTALL_DIR=""
+    INSTALL_DIR_EXPLICIT=false
+fi
 PYTHON_VERSION="3.11"
 NODE_VERSION="22"

+# FHS-style root install layout (set by resolve_install_layout when applicable):
+#   code at /usr/local/lib/hermes-agent, command at /usr/local/bin/hermes,
+#   data still at /root/.hermes (HERMES_HOME).  Matches Claude Code / Codex CLI
+#   and keeps Docker bind-mounted /root/ volumes lean.
+ROOT_FHS_LAYOUT=false
+
 # Options
 USE_VENV=true
 RUN_SETUP=true
@@ -64,6 +79,7 @@ while [[ $# -gt 0 ]]; do
            ;;
        --dir)
            INSTALL_DIR="$2"
+            INSTALL_DIR_EXPLICIT=true
            shift 2
            ;;
        --hermes-home)
@@ -79,9 +95,20 @@ while [[ $# -gt 0 ]]; do
            echo "  --no-venv      Don't create virtual environment"
            echo "  --skip-setup   Skip interactive setup wizard"
            echo "  --branch NAME  Git branch to install (default: main)"
-            echo "  --dir PATH     Installation directory (default: ~/.hermes/hermes-agent)"
+            echo "  --dir PATH     Installation directory"
+            echo "                   default (non-root):  ~/.hermes/hermes-agent"
+            echo "                   default (root, Linux): /usr/local/lib/hermes-agent"
            echo "  --hermes-home PATH  Data directory (default: ~/.hermes, or \$HERMES_HOME)"
            echo "  -h, --help     Show this help"
+            echo ""
+            echo "Notes:"
+            echo "  When running as root on Linux, Hermes installs the code under"
+            echo "  /usr/local/lib/hermes-agent and links the command into"
+            echo "  /usr/local/bin/hermes (FHS layout — matches Claude Code / Codex CLI)."
+            echo "  Data, config, sessions, and logs still live in \$HERMES_HOME"
+            echo "  (default /root/.hermes).  This keeps Docker bind-mounted volumes"
+            echo "  small and ensures the command is on PATH for all shells."
+            echo "  Existing installs at \$HERMES_HOME/hermes-agent are preserved in-place."
            exit 0
            ;;
        *)
@@ -163,9 +190,60 @@ is_termux() {
    [ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
 }

+# Decide where the repo checkout + venv live, and where the `hermes` command
+# symlink goes.  Called after detect_os so $OS/$DISTRO are known.
+#
+# Defaults:
+#   - Non-root, any OS:       INSTALL_DIR = $HERMES_HOME/hermes-agent
+#                             command link in $HOME/.local/bin
+#   - Termux (any uid):       INSTALL_DIR = $HERMES_HOME/hermes-agent
+#                             command link in $PREFIX/bin (already on PATH)
+#   - Root on Linux (new):    INSTALL_DIR = /usr/local/lib/hermes-agent
+#                             command link in /usr/local/bin
+#                             (unless a legacy install already exists at
+#                              $HERMES_HOME/hermes-agent — then preserve it)
+#
+# Always a no-op when the user sets --dir or $HERMES_INSTALL_DIR.
+resolve_install_layout() {
+    if [ "$INSTALL_DIR_EXPLICIT" = true ]; then
+        log_info "Install directory: $INSTALL_DIR (explicit)"
+        return 0
+    fi
+
+    # Termux: package manager manages /data/data/..., keep code in HERMES_HOME.
+    if is_termux; then
+        INSTALL_DIR="$HERMES_HOME/hermes-agent"
+        return 0
+    fi
+
+    # Root on Linux: prefer FHS layout unless a legacy install already exists.
+    # macOS root installs keep the legacy layout because /usr/local/ on macOS
+    # is Homebrew territory and we don't want to fight that.
+    if [ "$OS" = "linux" ] && [ "$(id -u)" -eq 0 ]; then
+        if [ -d "$HERMES_HOME/hermes-agent/.git" ]; then
+            INSTALL_DIR="$HERMES_HOME/hermes-agent"
+            log_info "Existing install detected at $INSTALL_DIR — keeping legacy layout"
+            log_info "  (new root installs use /usr/local/lib/hermes-agent)"
+            return 0
+        fi
+        INSTALL_DIR="/usr/local/lib/hermes-agent"
+        ROOT_FHS_LAYOUT=true
+        log_info "Root install on Linux — using FHS layout"
+        log_info "  Code:    $INSTALL_DIR"
+        log_info "  Command: /usr/local/bin/hermes"
+        log_info "  Data:    $HERMES_HOME (unchanged)"
+        return 0
+    fi
+
+    # Default: non-root, non-Termux → legacy user-scoped layout.
+    INSTALL_DIR="$HERMES_HOME/hermes-agent"
+}
+
 get_command_link_dir() {
    if is_termux && [ -n "${PREFIX:-}" ]; then
        echo "$PREFIX/bin"
+    elif [ "$ROOT_FHS_LAYOUT" = true ]; then
+        echo "/usr/local/bin"
    else
        echo "$HOME/.local/bin"
    fi
@@ -174,6 +252,8 @@ get_command_link_dir() {
 get_command_link_display_dir() {
    if is_termux && [ -n "${PREFIX:-}" ]; then
        echo '$PREFIX/bin'
+    elif [ "$ROOT_FHS_LAYOUT" = true ]; then
+        echo '/usr/local/bin'
    else
        echo '~/.local/bin'
    fi
@@ -975,6 +1055,14 @@ setup_path() {
        return 0
    fi

+    # FHS layout: /usr/local/bin is on PATH for every standard shell, so no shell config needs editing.
+    if [ "$ROOT_FHS_LAYOUT" = true ]; then
+        export PATH="$command_link_dir:$PATH"
+        log_info "/usr/local/bin is already on PATH for all shells"
+        log_success "hermes command ready"
+        return 0
+    fi
+
    # Check if ~/.local/bin is on PATH; if not, add it to shell config.
    # Detect the user's actual login shell (not the shell running this script,
    # which is always bash when piped from curl).
@@ -1339,12 +1427,12 @@ print_success() {
    echo ""

    # Show file locations
-    echo -e "${CYAN}${BOLD}📁 Your files (all in ~/.hermes/):${NC}"
+    echo -e "${CYAN}${BOLD}📁 Your files:${NC}"
    echo ""
-    echo -e "   ${YELLOW}Config:${NC}    ~/.hermes/config.yaml"
-    echo -e "   ${YELLOW}API Keys:${NC}  ~/.hermes/.env"
-    echo -e "   ${YELLOW}Data:${NC}      ~/.hermes/cron/, sessions/, logs/"
-    echo -e "   ${YELLOW}Code:${NC}      ~/.hermes/hermes-agent/"
+    echo -e "   ${YELLOW}Config:${NC}    $HERMES_HOME/config.yaml"
+    echo -e "   ${YELLOW}API Keys:${NC}  $HERMES_HOME/.env"
+    echo -e "   ${YELLOW}Data:${NC}      $HERMES_HOME/cron/, sessions/, logs/"
+    echo -e "   ${YELLOW}Code:${NC}      $INSTALL_DIR"
    echo ""

    echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
@@ -1364,6 +1452,9 @@ print_success() {
    if [ "$DISTRO" = "termux" ]; then
        echo -e "${YELLOW}⚡ 'hermes' was linked into $(get_command_link_display_dir), which is already on PATH in Termux.${NC}"
        echo ""
+    elif [ "$ROOT_FHS_LAYOUT" = true ]; then
+        echo -e "${YELLOW}⚡ 'hermes' was linked into /usr/local/bin and is ready to use — no shell reload needed.${NC}"
+        echo ""
    else
        echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
        echo ""
@@ -1415,6 +1506,7 @@ main() {
    print_banner

    detect_os
+    resolve_install_layout
    install_uv
    check_python
    check_git
@@ -92,6 +92,7 @@ AUTHOR_MAP = {
    "104278804+Sertug17@users.noreply.github.com": "Sertug17",
    "112503481+caentzminger@users.noreply.github.com": "caentzminger",
    "258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
+    "xydarcher@uestc.edu.cn": "Readon",
    "sir_even@icloud.com": "sirEven",
    "36056348+sirEven@users.noreply.github.com": "sirEven",
    "70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
@@ -504,6 +505,7 @@ AUTHOR_MAP = {
    "screenmachine@gmail.com": "teknium1",
    "chenzeshi@live.com": "chen1749144759",
    "mor.aleksandr@yahoo.com": "MorAlekss",
+    "ash@users.noreply.github.com": "ash",
 }


@@ -847,6 +847,32 @@ class TestTokenBudgetTailProtection:
        assert isinstance(pruned, int)


+class TestUpdateModelBudgets:
+    """Regression: update_model() must recalculate token budgets."""
+
+    def test_tail_budget_recalculated(self):
+        """tail_token_budget must change after switching to a different context length."""
+        from unittest.mock import patch
+        with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
+            comp = ContextCompressor("model-a", threshold_percent=0.50, quiet_mode=True)
+        old_tail = comp.tail_token_budget
+        old_max_summary = comp.max_summary_tokens
+
+        comp.update_model("model-b", context_length=32_000)
+        assert comp.tail_token_budget != old_tail, "tail_token_budget should change"
+        assert comp.tail_token_budget < old_tail, "smaller context → smaller budget"
+        assert comp.max_summary_tokens != old_max_summary, "max_summary_tokens should change"
+
+    def test_budgets_proportional(self):
+        """Budgets should be proportional to context_length after update."""
+        from unittest.mock import patch
+        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
+            comp = ContextCompressor("model-a", threshold_percent=0.50, quiet_mode=True)
+        comp.update_model("model-b", context_length=10_000)
+        assert comp.tail_token_budget == int(comp.threshold_tokens * comp.summary_target_ratio)
+        assert comp.max_summary_tokens == min(int(10_000 * 0.05), 4000)
+
+
 class TestTruncateToolCallArgsJson:
    """Regression tests for #11762.

@@ -0,0 +1,201 @@
+"""Regression tests for the generic unsupported-parameter detector in
+``agent.auxiliary_client``.
+
+The original temperature-specific detector (PR #15621) was generalized so the
+same reactive-retry strategy covers any provider that rejects an arbitrary
+request parameter — ``max_tokens``, ``seed``, ``top_p``, future quirks — not
+just ``temperature``. Credit @nicholasrae (PR #15416) for the generalization
+pattern.
+
+These tests lock in:
+  * ``_is_unsupported_parameter_error(exc, param)`` across common phrasings
+  * the back-compat wrapper ``_is_unsupported_temperature_error`` still works
+  * the max_tokens retry branch no longer pops a key that was never set
+    (``max_tokens is None`` gate)
+  * the max_tokens retry branch matches via the generic helper on top of the
+    legacy ``"max_tokens"`` / ``"unsupported_parameter"`` substring checks
+"""
+
+from unittest.mock import patch, MagicMock, AsyncMock
+
+import pytest
+
+from agent.auxiliary_client import (
+    call_llm,
+    async_call_llm,
+    _is_unsupported_parameter_error,
+    _is_unsupported_temperature_error,
+)
+
+
+class TestIsUnsupportedParameterError:
+    """The generic detector must match real provider phrasings for any param."""
+
+    @pytest.mark.parametrize("param,message", [
+        # temperature phrasings (regression coverage via the generic API)
+        ("temperature", "HTTP 400: Unsupported parameter: temperature"),
+        ("temperature", "Error code: 400 - {'error': {'code': 'unsupported_parameter', 'param': 'temperature'}}"),
+        ("temperature", "this model does not support temperature"),
+        # max_tokens phrasings
+        ("max_tokens", "HTTP 400: Unsupported parameter: max_tokens"),
+        ("max_tokens", "Unknown parameter: max_tokens — use max_completion_tokens"),
+        ("max_tokens", "Invalid parameter: max_tokens is not supported"),
+        # arbitrary future params
+        ("seed", "HTTP 400: unrecognized parameter: seed"),
+        ("top_p", "Error: top_p is not supported for this model"),
+    ])
+    def test_matches_real_provider_messages(self, param, message):
+        assert _is_unsupported_parameter_error(RuntimeError(message), param) is True
+
+    @pytest.mark.parametrize("param,message", [
+        # Param not mentioned at all
+        ("temperature", "HTTP 400: max_tokens is too large"),
+        # Param mentioned but not flagged as unsupported
+        ("temperature", "temperature must be between 0 and 2"),
+        # Totally unrelated 400
+        ("max_tokens", "Rate limit exceeded"),
+        # Connection-level errors
+        ("temperature", "Connection reset by peer"),
+    ])
+    def test_does_not_match_unrelated_errors(self, param, message):
+        assert _is_unsupported_parameter_error(RuntimeError(message), param) is False
+
+    def test_empty_param_returns_false(self):
+        assert _is_unsupported_parameter_error(
+            RuntimeError("HTTP 400: Unsupported parameter: temperature"), ""
+        ) is False
+
+    def test_temperature_wrapper_delegates_to_generic(self):
+        """Back-compat: ``_is_unsupported_temperature_error`` still routes through."""
+        msg = "HTTP 400: Unsupported parameter: temperature"
+        assert _is_unsupported_temperature_error(RuntimeError(msg)) is True
+        # An unrelated error must still return False
+        assert _is_unsupported_temperature_error(
+            RuntimeError("max_tokens is too large")) is False
+
+
+def _dummy_response():
+    """Sentinel — real code calls ``_validate_llm_response`` which we patch out."""
+    return {"ok": True}
+
+
+class TestMaxTokensRetryHardening:
+    """The max_tokens retry branch now (a) gates on ``max_tokens is not None``
+    and (b) also matches the generic phrasings via the helper.
+    """
+
+    def test_sync_max_tokens_retry_skipped_when_max_tokens_is_none(self):
+        """No max_tokens kwarg → must not pop/retry even if the error mentions it.
+
+        Before the hardening, ``kwargs.pop("max_tokens", None)`` was safe but
+        ``kwargs["max_completion_tokens"] = max_tokens`` would set a None
+        value and hit the provider again. The gate skips the whole branch.
+        """
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        err = RuntimeError("HTTP 400: Unsupported parameter: max_tokens")
+        client.chat.completions.create.side_effect = err
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+        ):
+            with pytest.raises(RuntimeError):
+                call_llm(
+                    task="session_search",
+                    messages=[{"role": "user", "content": "hi"}],
+                    temperature=0.3,
+                    # max_tokens omitted on purpose
+                )
+
+        # Only the initial attempt — no retry because the gate blocked it
+        assert client.chat.completions.create.call_count == 1
+
+    def test_sync_max_tokens_retry_matches_generic_phrasing(self):
+        """A 400 saying "Unknown parameter: max_tokens" (not the legacy
+        substring ``"max_tokens"`` bare + no ``unsupported_parameter`` token)
+        now triggers the retry via the generic helper.
+        """
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        err = RuntimeError("Unknown parameter: max_tokens")
+        response = _dummy_response()
+        client.chat.completions.create.side_effect = [err, response]
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+        ):
+            result = call_llm(
+                task="session_search",
+                messages=[{"role": "user", "content": "hi"}],
+                temperature=0.3,
+                max_tokens=512,
+            )
+
+        assert result is response
+        assert client.chat.completions.create.call_count == 2
+        second_call = client.chat.completions.create.call_args_list[1]
+        assert "max_tokens" not in second_call.kwargs
+        assert second_call.kwargs["max_completion_tokens"] == 512
+
+    @pytest.mark.asyncio
+    async def test_async_max_tokens_retry_skipped_when_max_tokens_is_none(self):
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        err = RuntimeError("HTTP 400: Unsupported parameter: max_tokens")
+        client.chat.completions.create = AsyncMock(side_effect=err)
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+        ):
+            with pytest.raises(RuntimeError):
+                await async_call_llm(
+                    task="session_search",
+                    messages=[{"role": "user", "content": "hi"}],
+                    temperature=0.3,
+                )
+
+        assert client.chat.completions.create.call_count == 1
+
+    @pytest.mark.asyncio
+    async def test_async_max_tokens_retry_matches_generic_phrasing(self):
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        err = RuntimeError("Unknown parameter: max_tokens")
+        response = _dummy_response()
+        client.chat.completions.create = AsyncMock(side_effect=[err, response])
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+        ):
+            result = await async_call_llm(
+                task="session_search",
+                messages=[{"role": "user", "content": "hi"}],
+                temperature=0.3,
+                max_tokens=512,
+            )
+
+        assert result is response
+        assert client.chat.completions.create.await_count == 2
+        second_call = client.chat.completions.create.call_args_list[1]
+        assert "max_tokens" not in second_call.kwargs
+        assert second_call.kwargs["max_completion_tokens"] == 512
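The parametrized cases above pin down what the detector must accept and reject. A minimal sketch of a matcher that satisfies them follows. It assumes plain substring matching on the lowercased exception text. The name `looks_like_unsupported_parameter` and the marker list are hypothetical, and this is not the implementation in `agent.auxiliary_client`.

```python
# Hypothetical marker list, chosen only to cover the phrasings in the tests above.
_UNSUPPORTED_MARKERS = (
    "unsupported parameter",
    "unsupported_parameter",
    "unknown parameter",
    "unrecognized parameter",
    "unrecognized request argument",
    "is not supported",
    "does not support",
)


def looks_like_unsupported_parameter(exc: Exception, param: str) -> bool:
    """Return True when the error text names `param` and flags it as unsupported."""
    if not param:
        # An empty name would be a substring of every message, so reject it early.
        return False
    text = str(exc).lower()
    if param.lower() not in text:
        return False
    return any(marker in text for marker in _UNSUPPORTED_MARKERS)
```

Requiring both the parameter name and a marker is what keeps value errors such as "temperature must be between 0 and 2" from triggering a retry. Substring matching can still misfire when a message names the parameter and separately says something else "is not supported".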
@@ -0,0 +1,237 @@
+"""Regression tests for the universal "unsupported temperature" retry in
+``agent.auxiliary_client``.
+
+Auxiliary callers (``flush_memories``, context compression, session search,
+web extract summarisation, etc.) hardcode ``temperature=0.3`` for historical
+reasons. Several provider/model combinations reject ``temperature`` with a
+400:
+
+  * OpenAI Responses (gpt-5/o-series reasoning models)
+  * Copilot Responses (reasoning models)
+  * OpenRouter reasoning models (gpt-5.5, some anthropic via OAI-compat)
+  * Anthropic Opus 4.7+ via OpenAI-compat endpoints
+  * Kimi/Moonshot (server-managed)
+
+``_fixed_temperature_for_model`` catches Kimi up front, and
+``build_chat_completion_kwargs`` drops temperature for Anthropic Opus 4.7+,
+but the same backend can accept ``temperature`` for some models and reject
+it for others (for example gpt-5.4 accepts but gpt-5.5 rejects on the same
+endpoint). An allow/deny-list is not maintainable across providers.
+
+The universal fix is reactive: when a call returns an
+``Unsupported parameter: temperature`` 400, retry once without temperature.
+These tests lock in that behaviour for both sync and async paths.
+"""
+
+from unittest.mock import patch, MagicMock, AsyncMock
+
+import pytest
+
+from agent.auxiliary_client import (
+    call_llm,
+    async_call_llm,
+    _is_unsupported_temperature_error,
+)
+
+
+class TestIsUnsupportedTemperatureError:
+    """The detector must match the phrasings providers actually return."""
+
+    @pytest.mark.parametrize("message", [
+        # OpenAI / Codex Responses
+        "HTTP 400: Unsupported parameter: temperature",
+        "Error code: 400 - {'error': {'message': \"Unsupported parameter: 'temperature'\"}}",
+        # Copilot / OpenAI error-code form
+        "Error code: 400 - {'error': {'code': 'unsupported_parameter', 'param': 'temperature'}}",
+        # OpenRouter-style
+        "Provider returned error: temperature is not supported for this model",
+        "this model does not support temperature",
+        # Anthropic-style via OAI-compat
+        "temperature: unknown parameter",
+        # Some gateways
+        "unrecognized request argument supplied: temperature",
+    ])
+    def test_matches_real_provider_messages(self, message):
+        assert _is_unsupported_temperature_error(RuntimeError(message)) is True
+
+    @pytest.mark.parametrize("message", [
+        # Unrelated 400s must NOT trigger a silent-retry
+        "HTTP 400: Invalid value: 'tool'. Supported values are: 'assistant'...",
+        "max_tokens is too large for this model",
+        "Rate limit exceeded",
+        "Connection reset by peer",
+        # Temperature value error is a different class of problem
+        "temperature must be between 0 and 2",
+    ])
+    def test_does_not_match_unrelated_errors(self, message):
+        assert _is_unsupported_temperature_error(RuntimeError(message)) is False
+
+
+def _dummy_response():
+    # The real code calls _validate_llm_response which inspects
+    # response.choices[0].message.  The tests here patch that out, so
+    # any sentinel object is fine.
+    return {"ok": True}
+
+
+class TestCallLlmUnsupportedTemperatureRetry:
+    """``call_llm`` retries once without temperature and returns on success."""
+
+    def _setup(self, first_exc):
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        client.chat.completions.create.side_effect = [first_exc, _dummy_response()]
+        return client
+
+    @pytest.mark.parametrize("error_message", [
+        "HTTP 400: Unsupported parameter: temperature",
+        "Error code: 400 - {'error': {'code': 'unsupported_parameter', 'param': 'temperature'}}",
+        "Provider error: this model does not support temperature",
+    ])
+    def test_retries_once_without_temperature(self, error_message):
+        client = self._setup(RuntimeError(error_message))
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+        ):
+            result = call_llm(
+                task="flush_memories",
+                messages=[{"role": "user", "content": "remember this"}],
+                temperature=0.3,
+                max_tokens=500,
+            )
+
+        assert result == {"ok": True}
+        assert client.chat.completions.create.call_count == 2
+        first_kwargs = client.chat.completions.create.call_args_list[0].kwargs
+        retry_kwargs = client.chat.completions.create.call_args_list[1].kwargs
+        assert first_kwargs["temperature"] == 0.3
+        assert "temperature" not in retry_kwargs
+        # other kwargs preserved
+        assert retry_kwargs["max_tokens"] == 500
+
+    def test_non_temperature_400_does_not_retry_as_temperature(self):
+        """Unrelated 400s (e.g. bad tool role) must not silently drop temp."""
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        non_temp_err = RuntimeError(
+            "HTTP 400: Invalid value: 'tool'. Supported values are: 'assistant'..."
+        )
+        client.chat.completions.create.side_effect = non_temp_err
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+            patch("agent.auxiliary_client._try_payment_fallback",
+                  return_value=None),
+        ):
+            with pytest.raises(RuntimeError, match="Invalid value"):
+                call_llm(
+                    task="flush_memories",
+                    messages=[{"role": "user", "content": "x"}],
+                    temperature=0.3,
+                    max_tokens=500,
+                )
+        # Should NOT have retried (non-temperature 400 doesn't match)
+        assert client.chat.completions.create.call_count == 1
+
+    def test_no_retry_when_temperature_not_in_kwargs(self):
+        """If caller didn't send temperature, don't invent a temperature-retry."""
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        # Provider complains about temperature even though we didn't send it.
+        # (Pathological but possible with misleading error text.)  The guard
+        # ``"temperature" in kwargs`` must prevent an unnecessary retry.
+        err = RuntimeError("HTTP 400: Unsupported parameter: temperature")
+        client.chat.completions.create.side_effect = err
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+            patch("agent.auxiliary_client._try_payment_fallback",
+                  return_value=None),
+        ):
+            with pytest.raises(RuntimeError):
+                call_llm(
+                    task="flush_memories",
+                    messages=[{"role": "user", "content": "x"}],
+                    temperature=None,  # explicit: no temperature sent
+                    max_tokens=500,
+                )
+        assert client.chat.completions.create.call_count == 1
+
+
+class TestAsyncCallLlmUnsupportedTemperatureRetry:
+    """``async_call_llm`` mirror of the sync retry semantics."""
+
+    @pytest.mark.asyncio
+    async def test_async_retries_once_without_temperature(self):
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        client.chat.completions.create = AsyncMock(side_effect=[
+            RuntimeError("HTTP 400: Unsupported parameter: temperature"),
+            _dummy_response(),
+        ])
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+        ):
+            result = await async_call_llm(
+                task="session_search",
+                messages=[{"role": "user", "content": "query"}],
+                temperature=0.3,
+                max_tokens=500,
+            )
+
+        assert result == {"ok": True}
+        assert client.chat.completions.create.await_count == 2
+        first_kwargs = client.chat.completions.create.call_args_list[0].kwargs
+        retry_kwargs = client.chat.completions.create.call_args_list[1].kwargs
+        assert first_kwargs["temperature"] == 0.3
+        assert "temperature" not in retry_kwargs
+        assert retry_kwargs["max_tokens"] == 500
+
+    @pytest.mark.asyncio
+    async def test_async_non_temperature_400_does_not_retry(self):
+        client = MagicMock()
+        client.base_url = "https://api.openai.com/v1"
+        client.chat.completions.create = AsyncMock(
+            side_effect=RuntimeError("HTTP 400: Invalid value: 'tool'"),
+        )
+
+        with (
+            patch("agent.auxiliary_client._resolve_task_provider_model",
+                  return_value=("openai-codex", "gpt-5.5", None, None, None)),
+            patch("agent.auxiliary_client._get_cached_client",
+                  return_value=(client, "gpt-5.5")),
+            patch("agent.auxiliary_client._validate_llm_response",
+                  side_effect=lambda resp, _task: resp),
+            patch("agent.auxiliary_client._try_payment_fallback",
+                  return_value=None),
+        ):
+            with pytest.raises(RuntimeError, match="Invalid value"):
+                await async_call_llm(
+                    task="session_search",
+                    messages=[{"role": "user", "content": "x"}],
+                    temperature=0.3,
+                    max_tokens=500,
+                )
+        assert client.chat.completions.create.await_count == 1
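The retry behaviour these tests lock in can be sketched as a small wrapper. This is an illustration under stated assumptions, not the code in `call_llm`: `call_with_reactive_retry`, `_flags_unsupported`, and the `create` callable are hypothetical names, and the sketch checks `is not None` where the real guard is described as `"temperature" in kwargs`.

```python
from typing import Any, Callable

# Hypothetical markers; the real detector may recognise other phrasings.
_MARKERS = ("unsupported parameter", "unsupported_parameter", "unknown parameter",
            "unrecognized", "is not supported", "does not support")


def _flags_unsupported(exc: Exception, param: str) -> bool:
    text = str(exc).lower()
    return param in text and any(m in text for m in _MARKERS)


def call_with_reactive_retry(create: Callable[..., Any], **kwargs: Any) -> Any:
    """Call `create`; on an unsupported-parameter error, retry once without the key."""
    try:
        return create(**kwargs)
    except Exception as exc:
        # Only retry when we actually sent a temperature.
        if kwargs.get("temperature") is not None and _flags_unsupported(exc, "temperature"):
            retry = {k: v for k, v in kwargs.items() if k != "temperature"}
            return create(**retry)
        # Gate on max_tokens being set so a None value is never forwarded.
        max_tokens = kwargs.get("max_tokens")
        if max_tokens is not None and _flags_unsupported(exc, "max_tokens"):
            retry = {k: v for k, v in kwargs.items() if k != "max_tokens"}
            retry["max_completion_tokens"] = max_tokens
            return create(**retry)
        raise
```

The wrapper retries at most once per call, and any error that does not match is re-raised unchanged, so unrelated 400s still surface to the caller.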
@@ -0,0 +1,390 @@
+"""Tests for cron job context_from feature (issue #5439 Option C)."""
+
+import sys
+from pathlib import Path
+
+import pytest
+
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+
+@pytest.fixture
+def cron_env(tmp_path, monkeypatch):
+    """Isolated cron environment with temp HERMES_HOME."""
+    hermes_home = tmp_path / ".hermes"
+    hermes_home.mkdir()
+    (hermes_home / "cron").mkdir()
+    (hermes_home / "cron" / "output").mkdir()
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+
+    import cron.jobs as jobs_mod
+    monkeypatch.setattr(jobs_mod, "HERMES_DIR", hermes_home)
+    monkeypatch.setattr(jobs_mod, "CRON_DIR", hermes_home / "cron")
+    monkeypatch.setattr(jobs_mod, "JOBS_FILE", hermes_home / "cron" / "jobs.json")
+    monkeypatch.setattr(jobs_mod, "OUTPUT_DIR", hermes_home / "cron" / "output")
+
+    return hermes_home
+
+
+class TestJobContextFromField:
+    """Test that context_from is stored and retrieved correctly."""
+
+    def test_create_job_with_context_from_string(self, cron_env):
+        from cron.jobs import create_job, get_job
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_b = create_job(
+            prompt="Summarize findings",
+            schedule="every 2h",
+            context_from=job_a["id"],
+        )
+
+        assert job_b["context_from"] == [job_a["id"]]
+        loaded = get_job(job_b["id"])
+        assert loaded["context_from"] == [job_a["id"]]
+
+    def test_create_job_with_context_from_list(self, cron_env):
+        from cron.jobs import create_job, get_job
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_b = create_job(prompt="Find weather", schedule="every 1h")
+        job_c = create_job(
+            prompt="Summarize everything",
+            schedule="every 2h",
+            context_from=[job_a["id"], job_b["id"]],
+        )
+
+        assert job_c["context_from"] == [job_a["id"], job_b["id"]]
+
+    def test_create_job_without_context_from(self, cron_env):
+        from cron.jobs import create_job
+
+        job = create_job(prompt="Hello", schedule="every 1h")
+        assert job.get("context_from") is None
+
+    def test_context_from_empty_string_normalized_to_none(self, cron_env):
+        from cron.jobs import create_job
+
+        job = create_job(prompt="Hello", schedule="every 1h", context_from="")
+        assert job.get("context_from") is None
+
+    def test_context_from_empty_list_normalized_to_none(self, cron_env):
+        from cron.jobs import create_job
+
+        job = create_job(prompt="Hello", schedule="every 1h", context_from=[])
+        assert job.get("context_from") is None
+
+
+class TestBuildJobPromptContextFrom:
+    """Test that _build_job_prompt() injects context from referenced jobs."""
+
+    def test_injects_latest_output(self, cron_env):
+        from cron.jobs import create_job, OUTPUT_DIR
+        from cron.scheduler import _build_job_prompt
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+
+        # Write output for job_a
+        output_dir = OUTPUT_DIR / job_a["id"]
+        output_dir.mkdir(parents=True, exist_ok=True)
+        (output_dir / "2026-04-22_10-00-00.md").write_text(
+            "Today's top story: AI is everywhere.", encoding="utf-8"
+        )
+
+        job_b = create_job(
+            prompt="Summarize the news",
+            schedule="every 2h",
+            context_from=job_a["id"],
+        )
+
+        prompt = _build_job_prompt(job_b)
+        assert "Today's top story: AI is everywhere." in prompt
+        assert f"Output from job '{job_a['id']}'" in prompt
+
+    def test_uses_most_recent_output(self, cron_env):
+        from cron.jobs import create_job, OUTPUT_DIR
+        from cron.scheduler import _build_job_prompt
+        import time
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        output_dir = OUTPUT_DIR / job_a["id"]
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        old_file = output_dir / "2026-04-22_08-00-00.md"
+        old_file.write_text("Old output", encoding="utf-8")
+        time.sleep(0.01)
+        new_file = output_dir / "2026-04-22_10-00-00.md"
+        new_file.write_text("New output", encoding="utf-8")
+
+        job_b = create_job(
+            prompt="Summarize", schedule="every 2h", context_from=job_a["id"]
+        )
+        prompt = _build_job_prompt(job_b)
+        assert "New output" in prompt
+        assert "Old output" not in prompt
+
+    def test_graceful_when_no_output_yet(self, cron_env):
+        from cron.jobs import create_job
+        from cron.scheduler import _build_job_prompt
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_b = create_job(
+            prompt="Summarize", schedule="every 2h", context_from=job_a["id"]
+        )
+
+        # job_a never ran — output dir does not exist
+        # expect silent skip: no placeholder injected, base prompt intact
+        prompt = _build_job_prompt(job_b)
+        assert "no output" not in prompt.lower()
+        assert "not found" not in prompt.lower()
+        assert "Summarize" in prompt
+
+    def test_injects_multiple_context_jobs(self, cron_env):
+        from cron.jobs import create_job, OUTPUT_DIR
+        from cron.scheduler import _build_job_prompt
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_b = create_job(prompt="Find weather", schedule="every 1h")
+
+        for job, content in [(job_a, "News: AI boom"), (job_b, "Weather: Sunny")]:
+            out_dir = OUTPUT_DIR / job["id"]
+            out_dir.mkdir(parents=True, exist_ok=True)
+            (out_dir / "2026-04-22_10-00-00.md").write_text(content, encoding="utf-8")
+
+        job_c = create_job(
+            prompt="Daily briefing",
+            schedule="every 2h",
+            context_from=[job_a["id"], job_b["id"]],
+        )
+        prompt = _build_job_prompt(job_c)
+        assert "News: AI boom" in prompt
+        assert "Weather: Sunny" in prompt
+
+    def test_context_injected_before_prompt(self, cron_env):
+        """Context should appear before the job's own prompt."""
+        from cron.jobs import create_job, OUTPUT_DIR
+        from cron.scheduler import _build_job_prompt
+
+        job_a = create_job(prompt="Find data", schedule="every 1h")
+        out_dir = OUTPUT_DIR / job_a["id"]
+        out_dir.mkdir(parents=True, exist_ok=True)
+        (out_dir / "2026-04-22_10-00-00.md").write_text("Context data", encoding="utf-8")
+
+        job_b = create_job(
+            prompt="Process the data above",
+            schedule="every 2h",
+            context_from=job_a["id"],
+        )
+        prompt = _build_job_prompt(job_b)
+        context_pos = prompt.find("Context data")
+        prompt_pos = prompt.find("Process the data above")
+        assert context_pos < prompt_pos
+
+    def test_output_truncated_at_8k_chars(self, cron_env):
+        """Output longer than 8000 chars should be truncated."""
+        from cron.jobs import create_job, OUTPUT_DIR
+        from cron.scheduler import _build_job_prompt
+
+        job_a = create_job(prompt="Find data", schedule="every 1h")
+        out_dir = OUTPUT_DIR / job_a["id"]
+        out_dir.mkdir(parents=True, exist_ok=True)
+        big_output = "x" * 10000
+        (out_dir / "2026-04-22_10-00-00.md").write_text(big_output, encoding="utf-8")
+
+        job_b = create_job(
+            prompt="Process", schedule="every 2h", context_from=job_a["id"]
+        )
+        prompt = _build_job_prompt(job_b)
+        assert "truncated" in prompt
+        assert "x" * 10000 not in prompt
+
+    def test_graceful_when_file_deleted_between_listing_and_reading(self, cron_env):
+        """Job should not crash if output file is deleted mid-read."""
+        from cron.jobs import create_job, OUTPUT_DIR
+        from cron.scheduler import _build_job_prompt
+        from unittest.mock import patch
+
+        job_a = create_job(prompt="Find data", schedule="every 1h")
+        out_dir = OUTPUT_DIR / job_a["id"]
+        out_dir.mkdir(parents=True, exist_ok=True)
+        (out_dir / "2026-04-22_10-00-00.md").write_text("Some output", encoding="utf-8")
+
+        job_b = create_job(
+            prompt="Process", schedule="every 2h", context_from=job_a["id"]
+        )
+
+        # Simulate file deleted between glob() and read_text()
+        original_read = Path.read_text
+        def mock_read_text(self, *args, **kwargs):
+            if self.suffix == ".md":
+                raise FileNotFoundError("file deleted mid-read")
+            return original_read(self, *args, **kwargs)
+
+        with patch.object(Path, "read_text", mock_read_text):
+            prompt = _build_job_prompt(job_b)
+
+        # Job should not crash, prompt should still contain the base prompt
+        assert "Process" in prompt
+
+    def test_graceful_when_permission_error(self, cron_env):
+        """Job should not crash if output directory is not readable."""
+        from cron.jobs import create_job, OUTPUT_DIR
+        from cron.scheduler import _build_job_prompt
+        from unittest.mock import patch
+
+        job_a = create_job(prompt="Find data", schedule="every 1h")
+        out_dir = OUTPUT_DIR / job_a["id"]
+        out_dir.mkdir(parents=True, exist_ok=True)
+        (out_dir / "2026-04-22_10-00-00.md").write_text("Some output", encoding="utf-8")
+
+        job_b = create_job(
+            prompt="Process", schedule="every 2h", context_from=job_a["id"]
+        )
+
+        # Simulate permission error on read
+        original_read = Path.read_text
+        def mock_read_text(self, *args, **kwargs):
+            if self.suffix == ".md":
+                raise PermissionError("permission denied")
+            return original_read(self, *args, **kwargs)
+
+        with patch.object(Path, "read_text", mock_read_text):
+            prompt = _build_job_prompt(job_b)
+
+        # Job should not crash, prompt should still contain the base prompt
+        assert "Process" in prompt
+
+    def test_invalid_job_id_skipped(self, cron_env):
+        """context_from with path traversal job_id should be skipped."""
+        from cron.jobs import create_job
+        from cron.scheduler import _build_job_prompt
+
+        job = create_job(prompt="Process", schedule="every 2h")
+        # Manually inject invalid context_from (simulating tampered jobs.json)
+        job["context_from"] = ["../../../etc/passwd"]
+        prompt = _build_job_prompt(job)
+        # Should not crash and should not inject anything malicious
+        assert "Process" in prompt
+        assert "etc/passwd" not in prompt
+
+
+
+class TestUpdateContextFrom:
+    """Verify the cronjob tool's `update` action wires context_from through.
+
+    Without this, the create-path stores the field but users can never modify
+    or clear it via the tool (schema promises "pass an empty array to clear").
+    """
+
+    def test_update_adds_context_from_to_existing_job(self, cron_env):
+        from cron.jobs import create_job, get_job
+        from tools.cronjob_tools import cronjob
+        import json
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_b = create_job(prompt="Summarize", schedule="every 2h")
+        assert job_b.get("context_from") is None
+
+        result = json.loads(cronjob(
+            action="update",
+            job_id=job_b["id"],
+            context_from=job_a["id"],
+        ))
+        assert result["success"] is True
+
+        reloaded = get_job(job_b["id"])
+        assert reloaded["context_from"] == [job_a["id"]]
+
+    def test_update_changes_context_from_reference(self, cron_env):
+        from cron.jobs import create_job, get_job
+        from tools.cronjob_tools import cronjob
+        import json
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_a2 = create_job(prompt="Find weather", schedule="every 1h")
+        job_b = create_job(
+            prompt="Summarize", schedule="every 2h", context_from=job_a["id"],
+        )
+        assert job_b["context_from"] == [job_a["id"]]
+
+        result = json.loads(cronjob(
+            action="update",
+            job_id=job_b["id"],
+            context_from=[job_a2["id"]],
+        ))
+        assert result["success"] is True
+        assert get_job(job_b["id"])["context_from"] == [job_a2["id"]]
+
+    def test_update_clears_context_from_with_empty_list(self, cron_env):
+        from cron.jobs import create_job, get_job
+        from tools.cronjob_tools import cronjob
+        import json
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_b = create_job(
+            prompt="Summarize", schedule="every 2h", context_from=job_a["id"],
+        )
+        assert get_job(job_b["id"])["context_from"] == [job_a["id"]]
+
+        result = json.loads(cronjob(
+            action="update",
+            job_id=job_b["id"],
+            context_from=[],
+        ))
+        assert result["success"] is True
+        assert get_job(job_b["id"])["context_from"] is None
+
+    def test_update_clears_context_from_with_empty_string(self, cron_env):
+        from cron.jobs import create_job, get_job
+        from tools.cronjob_tools import cronjob
+        import json
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_b = create_job(
+            prompt="Summarize", schedule="every 2h", context_from=job_a["id"],
+        )
+
+        result = json.loads(cronjob(
+            action="update",
+            job_id=job_b["id"],
+            context_from="",
+        ))
+        assert result["success"] is True
+        assert get_job(job_b["id"])["context_from"] is None
+
+    def test_update_rejects_unknown_job_reference(self, cron_env):
+        from cron.jobs import create_job
+        from tools.cronjob_tools import cronjob
+        import json
+
+        job_b = create_job(prompt="Summarize", schedule="every 2h")
+
+        result = json.loads(cronjob(
+            action="update",
+            job_id=job_b["id"],
+            context_from=["deadbeef0000"],
+        ))
+        assert result["success"] is False
+        assert "not found" in result["error"]
+
+    def test_update_preserves_context_from_when_not_passed(self, cron_env):
+        """Updating other fields must not clobber context_from."""
+        from cron.jobs import create_job, get_job
+        from tools.cronjob_tools import cronjob
+        import json
+
+        job_a = create_job(prompt="Find news", schedule="every 1h")
+        job_b = create_job(
+            prompt="Summarize", schedule="every 2h", context_from=job_a["id"],
+        )
+
+        # Update an unrelated field
+        result = json.loads(cronjob(
+            action="update",
+            job_id=job_b["id"],
+            prompt="Summarize v2",
+        ))
+        assert result["success"] is True
+        reloaded = get_job(job_b["id"])
+        assert reloaded["prompt"] == "Summarize v2"
+        assert reloaded["context_from"] == [job_a["id"]]
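Reviewer sketch (not part of the commit): the `TestUpdateContextFrom` cases above imply a normalization rule for `context_from`: a bare string becomes a one-element list, and an empty list or empty string clears the field to `None`. The helper below is a hypothetical illustration of that rule only. The name `normalize_context_from` is an assumption, and the real `cronjob` tool must also tell "argument not passed" apart from "cleared" and reject unknown job ids, which this sketch does not attempt.

```python
from typing import List, Optional, Union


def normalize_context_from(
    value: Union[str, List[str], None],
) -> Optional[List[str]]:
    """Hypothetical helper mirroring the behaviour the tests assert.

    - "abc"      -> ["abc"]   (bare string wrapped in a list)
    - ["a", "b"] -> ["a", "b"]
    - [] or ""   -> None      (field cleared)
    """
    if value is None:
        return None
    if isinstance(value, str):
        value = [value]
    # Drop empty or whitespace-only entries so "" and [""] both clear the field.
    cleaned = [item.strip() for item in value if isinstance(item, str) and item.strip()]
    return cleaned or None
```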
@@ -601,3 +601,189 @@ class TestImagegenModelPicker:
            _configure_imagegen_model("fal", config)
        assert isinstance(config["image_gen"], dict)
        assert config["image_gen"]["model"] == "fal-ai/flux-2/klein/9b"
+
+
+def test_save_platform_tools_normalizes_numeric_entries():
+    """YAML may parse bare numeric toolset names as int. They should be
+    normalized to str so they survive the save round-trip.
+    """
+    config = {
+        "platform_toolsets": {
+            "cli": ["web", "terminal", 12306, "custom-mcp"]
+        }
+    }
+
+    with patch("hermes_cli.tools_config.save_config"):
+        _save_platform_tools(config, "cli", {"web", "browser"})
+
+    saved = config["platform_toolsets"]["cli"]
+    assert "12306" in saved
+    assert 12306 not in saved
+
+
+def test_save_platform_tools_clears_no_mcp_sentinel():
+    """`hermes tools` has no UI for no_mcp, so saving from the picker clears
+    the sentinel unconditionally — otherwise a user who once set no_mcp by
+    hand could never re-enable MCP servers through the UI.
+    """
+    config = {
+        "platform_toolsets": {
+            "cli": ["web", "terminal", "no_mcp"]
+        }
+    }
+
+    with patch("hermes_cli.tools_config.save_config"):
+        _save_platform_tools(config, "cli", {"web", "browser"})
+
+    saved = config["platform_toolsets"]["cli"]
+    assert "no_mcp" not in saved
+
+
+def test_save_platform_tools_preserves_mcp_server_names():
+    """Non-sentinel passthrough entries (MCP server names) must still survive
+    the save — we only clear `no_mcp`, not every non-configurable entry.
+    """
+    config = {
+        "platform_toolsets": {
+            "cli": ["web", "terminal", "custom-mcp", "another-mcp"]
+        }
+    }
+
+    with patch("hermes_cli.tools_config.save_config"):
+        _save_platform_tools(config, "cli", {"web", "browser"})
+
+    saved = config["platform_toolsets"]["cli"]
+    assert "custom-mcp" in saved
+    assert "another-mcp" in saved
+
+
+def test_get_platform_tools_recovers_non_configurable_toolsets_from_composite():
+    """Non-configurable toolsets whose tools are in the composite but not in
+    CONFIGURABLE_TOOLSETS should still appear in the result.
+    """
+    from toolsets import TOOLSETS
+    from hermes_cli.tools_config import PLATFORMS
+    from unittest.mock import patch as mock_patch
+
+    fake_toolsets = dict(TOOLSETS)
+    fake_toolsets["_test_platform_tool"] = {
+        "description": "test",
+        "tools": ["_test_special_tool"],
+        "includes": [],
+    }
+    fake_toolsets["hermes-_test_platform"] = {
+        "description": "test composite",
+        "tools": ["web_search", "web_extract", "terminal", "process", "_test_special_tool"],
+        "includes": [],
+    }
+
+    test_platforms = {
+        "_test_platform": {"label": "Test", "default_toolset": "hermes-_test_platform"},
+    }
+
+    with mock_patch("hermes_cli.tools_config.PLATFORMS", {**PLATFORMS, **test_platforms}):
+        with mock_patch("toolsets.TOOLSETS", fake_toolsets):
+            enabled = _get_platform_tools({}, "_test_platform")
+
+    assert "_test_platform_tool" in enabled
+    assert "web" in enabled
+    assert "terminal" in enabled
+
+
+def test_get_platform_tools_second_pass_skips_fully_claimed_toolsets():
+    """Toolsets whose tools are fully covered by configurable keys should NOT
+    be added by the second pass (prevents 'search', 'hermes-acp' noise).
+    """
+    enabled = _get_platform_tools({}, "cli")
+
+    assert "search" not in enabled
+
+
+def test_get_platform_tools_discord_both_off_by_default():
+    """Both `discord` and `discord_admin` are opt-in via `hermes tools`,
+    even on the Discord platform itself.  Users shouldn't auto-inherit 19
+    extra tools just because DISCORD_BOT_TOKEN is set."""
+    enabled = _get_platform_tools({}, "discord")
+    assert "discord" not in enabled
+    assert "discord_admin" not in enabled
+
+
+def test_discord_toolsets_in_configurable_toolsets():
+    keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
+    assert "discord" in keys
+    assert "discord_admin" in keys
+
+
+def test_discord_toolsets_in_default_off():
+    assert "discord" in _DEFAULT_OFF_TOOLSETS
+    assert "discord_admin" in _DEFAULT_OFF_TOOLSETS
+
+
+def test_discord_toolsets_not_available_on_other_platforms():
+    """Platform-scoping: discord / discord_admin should not appear on CLI,
+    Telegram, etc. — not even as an opt-in."""
+    from hermes_cli.tools_config import _toolset_allowed_for_platform
+    for plat in ["cli", "telegram", "slack", "whatsapp", "signal"]:
+        assert not _toolset_allowed_for_platform("discord", plat), (
+            f"`discord` toolset leaked onto {plat}"
+        )
+        assert not _toolset_allowed_for_platform("discord_admin", plat), (
+            f"`discord_admin` toolset leaked onto {plat}"
+        )
+    assert _toolset_allowed_for_platform("discord", "discord")
+    assert _toolset_allowed_for_platform("discord_admin", "discord")
+
+
+def test_discord_toolsets_user_enabled_are_honored():
+    """When the user opts in via `hermes tools`, the toolset appears."""
+    config = {"platform_toolsets": {"discord": ["web", "terminal", "discord"]}}
+    enabled = _get_platform_tools(config, "discord")
+    assert "discord" in enabled
+    assert "discord_admin" not in enabled
+
+
+def test_save_platform_tools_strips_restricted_toolsets():
+    """A hand-edited config or an all-platforms checklist that selects the
+    Discord toolsets for Telegram must have them stripped at save time."""
+    from hermes_cli.tools_config import _save_platform_tools
+    config = {}
+    _save_platform_tools(config, "telegram", {"web", "terminal", "discord", "discord_admin"})
+    saved = config["platform_toolsets"]["telegram"]
+    assert "discord" not in saved
+    assert "discord_admin" not in saved
+    assert "web" in saved
+    assert "terminal" in saved
+
+
+def test_get_platform_tools_feishu_includes_doc_and_drive():
+    enabled = _get_platform_tools({}, "feishu")
+    assert "feishu_doc" in enabled
+    assert "feishu_drive" in enabled
+
+
+def test_get_platform_tools_feishu_tools_not_on_other_platforms():
+    for plat in ["cli", "telegram", "discord"]:
+        enabled = _get_platform_tools({}, plat)
+        assert "feishu_doc" not in enabled, f"feishu_doc leaked onto {plat}"
+        assert "feishu_drive" not in enabled, f"feishu_drive leaked onto {plat}"
+
+
+def test_get_effective_configurable_toolsets_dedupes_bundled_plugins():
+    """Bundled plugins (plugins/spotify) share their toolset key with the
+    built-in CONFIGURABLE_TOOLSETS entry. The effective list must not list
+    them twice — otherwise `hermes tools` → "reconfigure existing" shows
+    the same toolset in two consecutive rows.
+    """
+    from hermes_cli.tools_config import _get_effective_configurable_toolsets
+
+    all_ts = _get_effective_configurable_toolsets()
+    keys = [ts_key for ts_key, _, _ in all_ts]
+    assert len(keys) == len(set(keys)), (
+        f"duplicate toolset keys in effective list: "
+        f"{[k for k in keys if keys.count(k) > 1]}"
+    )
+    # Spotify specifically — the bug that motivated the dedupe.
+    spotify_rows = [t for t in all_ts if t[0] == "spotify"]
+    assert len(spotify_rows) == 1, spotify_rows
+    # Built-in label wins over the plugin label.
+    assert spotify_rows[0][1] == "🎵 Spotify"
@@ -1678,6 +1678,45 @@ class TestDashboardPluginManifestExtensions:
        entry = next(p for p in plugins if p["name"] == "mixed-slots")
        assert entry["slots"] == ["sidebar", "header-right"]

+    def test_page_scoped_slots_preserved(self, tmp_path, monkeypatch):
+        """Page-scoped slot names (e.g. ``sessions:top``) round-trip through
+        the manifest loader untouched.  The backend has no allowlist — the
+        frontend ``<PluginSlot name="...">`` placements decide what actually
+        renders — but the loader must not mangle colons in slot names."""
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        self._write_plugin(tmp_path, "page-slots", {
+            "name": "page-slots",
+            "label": "Page Slots",
+            "tab": {"path": "/page-slots", "hidden": True},
+            "slots": [
+                "sessions:top",
+                "analytics:bottom",
+                "logs:top",
+                "skills:bottom",
+                "config:top",
+                "env:bottom",
+                "docs:top",
+                "cron:bottom",
+                "chat:top",
+            ],
+            "entry": "dist/index.js",
+        })
+        from hermes_cli import web_server
+        web_server._dashboard_plugins_cache = None
+        plugins = web_server._get_dashboard_plugins(force_rescan=True)
+        entry = next(p for p in plugins if p["name"] == "page-slots")
+        assert entry["slots"] == [
+            "sessions:top",
+            "analytics:bottom",
+            "logs:top",
+            "skills:bottom",
+            "config:top",
+            "env:bottom",
+            "docs:top",
+            "cron:bottom",
+            "chat:top",
+        ]
+

 # ---------------------------------------------------------------------------
 # /api/pty WebSocket — terminal bridge for the dashboard "Chat" tab.
@@ -1925,34 +1964,3 @@ class TestPtyWebSocket:
            ):
                pass
        assert exc.value.code == 4400
-
-
-class TestEnvVarUpdateValidation:
-    """PUT /api/env must reject empty values to prevent .env key destruction."""
-
-    def test_rejects_empty_value(self):
-        from hermes_cli.web_server import EnvVarUpdate
-        import pydantic
-
-        with pytest.raises(pydantic.ValidationError):
-            EnvVarUpdate(key="SOME_KEY", value="")
-
-    def test_rejects_whitespace_only_value(self):
-        from hermes_cli.web_server import EnvVarUpdate
-        import pydantic
-
-        with pytest.raises(pydantic.ValidationError):
-            EnvVarUpdate(key="SOME_KEY", value="   ")
-
-    def test_accepts_nonempty_value(self):
-        from hermes_cli.web_server import EnvVarUpdate
-
-        update = EnvVarUpdate(key="SOME_KEY", value="sk-abc123")
-        assert update.value == "sk-abc123"
-
-    def test_rejects_empty_key(self):
-        from hermes_cli.web_server import EnvVarUpdate
-        import pydantic
-
-        with pytest.raises(pydantic.ValidationError):
-            EnvVarUpdate(key="", value="some-value")
@@ -41,6 +41,9 @@ def _make_agent(
    agent.tool_progress_callback = None
    agent._compression_warning = None
    agent._aux_compression_context_length_config = None
+    # Tools feed into the headroom calculation in _check_compression_model_feasibility.
+    # Tests that want to assert specific threshold values can override this.
+    agent.tools = []

    compressor = MagicMock(spec=ContextCompressor)
    compressor.context_length = main_context
@@ -82,8 +85,9 @@ def test_auto_corrects_threshold_when_aux_context_below_threshold(mock_get_clien
    assert "threshold:" in messages[0]
    # Warning stored for gateway replay
    assert agent._compression_warning is not None
-    # Threshold on the live compressor was actually lowered
-    assert agent.context_compressor.threshold_tokens == 80_000
+    # Threshold on the live compressor was actually lowered, accounting for
+    # the request-overhead headroom (empty tools list → ~12K headroom only).
+    assert agent.context_compressor.threshold_tokens == 68_000


@patch("agent.model_metadata.get_model_context_length", return_value=32_768)
@@ -147,15 +151,14 @@ def test_feasibility_check_passes_live_main_runtime():
        agent._emit_status = lambda msg: None
        agent._check_compression_model_feasibility()

-    mock_get_client.assert_called_once_with(
-        "compression",
-        main_runtime={
-            "model": "gpt-5.4",
-            "provider": "openai-codex",
+    # Called for both compression + flush_memories; verify compression call present
+    assert any(
+        c == (("compression",), {"main_runtime": {
+            "model": "gpt-5.4", "provider": "openai-codex",
            "base_url": "https://chatgpt.com/backend-api/codex",
-            "api_key": "codex-token",
-            "api_mode": "codex_responses",
-        },
+            "api_key": "codex-token", "api_mode": "codex_responses",
+        }})
+        for c in mock_get_client.call_args_list
    )


@@ -175,11 +178,12 @@ def test_feasibility_check_passes_config_context_length(mock_get_client, mock_ct
    agent._emit_status = lambda msg: None
    agent._check_compression_model_feasibility()

-    mock_ctx_len.assert_called_once_with(
-        "custom/big-model",
-        base_url="http://custom-endpoint:8080/v1",
-        api_key="sk-custom",
-        config_context_length=1_000_000,
+    # First call is the compression model
+    assert mock_ctx_len.call_args_list[0] == (
+        ("custom/big-model",),
+        {"base_url": "http://custom-endpoint:8080/v1",
+         "api_key": "sk-custom", "config_context_length": 1_000_000,
+         "provider": "openrouter"},
    )


@@ -197,11 +201,11 @@ def test_feasibility_check_ignores_invalid_context_length(mock_get_client, mock_
    agent._emit_status = lambda msg: None
    agent._check_compression_model_feasibility()

-    mock_ctx_len.assert_called_once_with(
-        "custom/model",
-        base_url="http://custom:8080/v1",
-        api_key="sk-test",
-        config_context_length=None,
+    assert mock_ctx_len.call_args_list[0] == (
+        ("custom/model",),
+        {"base_url": "http://custom:8080/v1",
+         "api_key": "sk-test", "config_context_length": None,
+         "provider": "openrouter"},
    )


@@ -249,12 +253,10 @@ def test_init_feasibility_check_uses_aux_context_override_from_config():
        )

    assert agent._aux_compression_context_length_config == 1_000_000
-    mock_ctx_len.assert_called_once_with(
-        "custom/big-model",
-        base_url="http://custom-endpoint:8080/v1",
-        api_key="sk-custom",
-        config_context_length=1_000_000,
-    )
+    c0 = mock_ctx_len.call_args_list[0]
+    assert c0.args == ("custom/big-model",)
+    assert c0.kwargs["base_url"] == "http://custom-endpoint:8080/v1"
+    assert c0.kwargs["config_context_length"] == 1_000_000


@patch("agent.auxiliary_client.get_text_auxiliary_client")
@@ -304,8 +306,10 @@ def test_exception_does_not_crash(mock_get_client):

@patch("agent.model_metadata.get_model_context_length", return_value=100_000)
@patch("agent.auxiliary_client.get_text_auxiliary_client")
-def test_exact_threshold_boundary_no_warning(mock_get_client, mock_ctx_len):
-    """No warning when aux context exactly equals the threshold."""
+def test_exact_threshold_boundary_triggers_headroom_correction(mock_get_client, mock_ctx_len):
+    """When aux context exactly equals the threshold, headroom deduction
+    still fires — flush_memories adds system prompt + tool schema on top
+    of the conversation messages, so threshold must be lowered."""
    agent = _make_agent(main_context=200_000, threshold_percent=0.50)
    mock_client = MagicMock()
    mock_client.base_url = "https://openrouter.ai/api/v1"
@@ -317,7 +321,10 @@ def test_exact_threshold_boundary_no_warning(mock_get_client, mock_ctx_len):

    agent._check_compression_model_feasibility()

-    assert len(messages) == 0
+    # 100K - headroom < 100K → auto-corrects
+    assert len(messages) == 1
+    assert "Auto-lowered" in messages[0]
+    assert agent.context_compressor.threshold_tokens < 100_000


@patch("agent.model_metadata.get_model_context_length", return_value=99_999)
@@ -339,7 +346,93 @@ def test_just_below_threshold_auto_corrects(mock_get_client, mock_ctx_len):
    assert len(messages) == 1
    assert "small-model" in messages[0]
    assert "Auto-lowered" in messages[0]
-    assert agent.context_compressor.threshold_tokens == 99_999
+    assert agent.context_compressor.threshold_tokens == 87_999
+
+
+# ── Headroom for system prompt + tool schemas ────────────────────────
+
+
+@patch("agent.model_metadata.get_model_context_length", return_value=128_000)
+@patch("agent.auxiliary_client.get_text_auxiliary_client")
+def test_auto_lowered_threshold_reserves_headroom_for_tools_and_system(mock_get_client, mock_ctx_len):
+    """When aux context binds the threshold, new_threshold must leave room
+    for the system prompt and tool schemas that auxiliary callers
+    (compression summariser, flush_memories) prepend to the message list.
+
+    Without headroom, a full-budget message window + ~25K system/tool
+    overhead overflows the aux model with HTTP 400.  Regression guard for
+    the flush_memories-on-busy-toolset overflow path.
+    """
+    # Main context 200K, threshold 70% = 140K.  Aux pins at 128K (below
+    # threshold → triggers auto-correct).
+    agent = _make_agent(main_context=200_000, threshold_percent=0.70)
+
+    # Build a realistic tool schema load.
+    agent.tools = [
+        {
+            "type": "function",
+            "function": {
+                "name": f"tool_{i}",
+                "description": "x" * 200,
+                "parameters": {"type": "object", "properties": {"arg": {"type": "string", "description": "y" * 120}}},
+            },
+        }
+        for i in range(50)
+    ]
+
+    mock_client = MagicMock()
+    mock_client.base_url = "https://openrouter.ai/api/v1"
+    mock_client.api_key = "sk-aux"
+    mock_get_client.return_value = (mock_client, "model-with-128k")
+
+    agent._emit_status = lambda msg: None
+    agent._check_compression_model_feasibility()
+
+    new_threshold = agent.context_compressor.threshold_tokens
+
+    # Must have strictly reserved headroom: new_threshold < aux_context.
+    assert new_threshold < 128_000, (
+        f"threshold {new_threshold} did not reserve headroom below aux=128,000 "
+        f"— system prompt + tools would overflow the aux model"
+    )
+    # Must respect the 64K hard floor.
+    from agent.model_metadata import MINIMUM_CONTEXT_LENGTH
+    assert new_threshold >= MINIMUM_CONTEXT_LENGTH
+
+
+@patch("agent.model_metadata.get_model_context_length", return_value=80_000)
+@patch("agent.auxiliary_client.get_text_auxiliary_client")
+def test_headroom_floors_at_minimum_context(mock_get_client, mock_ctx_len):
+    """If headroom subtraction would push below 64K floor, clamp to 64K
+    rather than refusing the session — the aux is still workable for a
+    smaller message window.
+    """
+    # Aux at 80K, with enough tools to push headroom > 16K → naive subtract
+    # would land at < 64K.  The max(..., MINIMUM_CONTEXT_LENGTH) clamp must
+    # keep the session running.
+    agent = _make_agent(main_context=200_000, threshold_percent=0.50)
+    agent.tools = [
+        {
+            "type": "function",
+            "function": {
+                "name": f"tool_{i}",
+                "description": "z" * 2_000,  # fat descriptions
+                "parameters": {},
+            },
+        }
+        for i in range(30)
+    ]
+
+    mock_client = MagicMock()
+    mock_client.base_url = "https://openrouter.ai/api/v1"
+    mock_client.api_key = "sk-aux"
+    mock_get_client.return_value = (mock_client, "small-aux-model")
+
+    agent._emit_status = lambda msg: None
+    agent._check_compression_model_feasibility()
+
+    from agent.model_metadata import MINIMUM_CONTEXT_LENGTH
+    assert agent.context_compressor.threshold_tokens == MINIMUM_CONTEXT_LENGTH


 # ── Two-phase: __init__ + run_conversation replay ───────────────────
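Reviewer sketch (not part of the commit): the arithmetic these feasibility tests pin down can be summarised as follows. The function name, its signature, and the `64_000` floor value are assumptions for illustration; the commit only states that `effective_limit = aux_context - headroom` is compared against the threshold, that the smaller of the compression and flush contexts is used, and that the result is clamped to `MINIMUM_CONTEXT_LENGTH`. The real headroom is derived from the system prompt and tool schemas; here it is passed in as a plain number.

```python
from typing import Optional

# Assumed value for the "64K hard floor" the tests mention; the real constant
# lives in agent.model_metadata and may differ.
MINIMUM_CONTEXT_LENGTH = 64_000


def corrected_threshold(
    threshold: int,
    compression_ctx: int,
    flush_ctx: Optional[int] = None,
    headroom: int = 12_000,
) -> int:
    """Hypothetical model of the always-deduct-headroom check."""
    # The smaller auxiliary context (compression vs flush_memories) binds.
    aux_context = compression_ctx if flush_ctx is None else min(compression_ctx, flush_ctx)
    # Headroom is deducted unconditionally, before any comparison.
    effective_limit = aux_context - headroom
    if effective_limit >= threshold:
        return threshold  # aux model has room; leave the threshold alone
    # Otherwise lower the threshold, but never below the hard floor.
    return max(effective_limit, MINIMUM_CONTEXT_LENGTH)
```

With the default 12K headroom this reproduces the values asserted above: an 80,000-token aux model yields 68,000 and a 99,999-token one yields 87,999.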
@@ -327,3 +327,72 @@ class TestFlushMemoriesCodexFallback:
        mock_stream.assert_called_once()
        mock_memory.assert_called_once()
        assert mock_memory.call_args.kwargs["content"] == "Codex flush test"
+
+    @pytest.mark.parametrize(
+        "provider,base_url",
+        [
+            # chatgpt.com/backend-api/codex — rejects temperature unconditionally
+            ("openai-codex", "https://chatgpt.com/backend-api/codex"),
+            # Native OpenAI Responses — rejects temperature on gpt-5/o-series reasoning models
+            ("openai", "https://api.openai.com/v1"),
+            # Copilot Responses — rejects temperature on reasoning models
+            ("copilot", "https://api.githubcopilot.com"),
+        ],
+    )
+    def test_codex_fallback_never_sends_temperature(self, monkeypatch, provider, base_url):
+        """Regression for the ``⚠ Auxiliary memory flush failed: HTTP 400:
+        Unsupported parameter: temperature`` error.
+
+        The codex_responses fallback must strip temperature before calling
+        _run_codex_stream — the Responses API does not accept it on any
+        supported backend, matching the transport's behavior."""
+        agent = _make_agent(monkeypatch, api_mode="codex_responses", provider=provider)
+        agent.base_url = base_url
+
+        codex_response = SimpleNamespace(
+            output=[
+                SimpleNamespace(
+                    type="function_call",
+                    call_id="call_1",
+                    name="memory",
+                    arguments=json.dumps({
+                        "action": "add",
+                        "target": "notes",
+                        "content": "no-temp test",
+                    }),
+                ),
+            ],
+            usage=SimpleNamespace(input_tokens=50, output_tokens=10, total_tokens=60),
+            status="completed",
+            model="gpt-5.5",
+        )
+
+        with patch("agent.auxiliary_client.call_llm", side_effect=RuntimeError("no provider")), \
+             patch.object(agent, "_run_codex_stream", return_value=codex_response) as mock_stream, \
+             patch.object(agent, "_build_api_kwargs") as mock_build, \
+             patch("tools.memory_tool.memory_tool", return_value="Saved."):
+            # Simulate a transport that (correctly) never includes temperature,
+            # but also verify we strip any stray temperature the fallback used
+            # to inject before the fix.
+            mock_build.return_value = {
+                "model": "gpt-5.5",
+                "instructions": "test",
+                "input": [],
+                "tools": [],
+                "max_output_tokens": 4096,
+                # Intentionally poison the dict to prove we pop it:
+                "temperature": 0.3,
+            }
+            messages = [
+                {"role": "user", "content": "Hello"},
+                {"role": "assistant", "content": "Hi"},
+                {"role": "user", "content": "Save this"},
+            ]
+            agent.flush_memories(messages)
+
+        mock_stream.assert_called_once()
+        sent_kwargs = mock_stream.call_args.args[0]
+        assert "temperature" not in sent_kwargs, (
+            f"codex_responses fallback must strip temperature before calling "
+            f"_run_codex_stream, got: {sent_kwargs.get('temperature')!r}"
+        )
@@ -0,0 +1,219 @@
+"""Tests for flush_memories context-overflow prevention.
+
+1. _check_compression_model_feasibility now also resolves the
+   flush_memories auxiliary model and uses min(compression, flush) as the
+   effective aux context.
+2. Headroom is always deducted before comparing aux_context vs threshold
+   (not only when aux_context < threshold).
+3. flush_memories() trims oversized conversations before the LLM call as
+   defence-in-depth for paths that bypass preflight compression.
+"""
+
+import sys
+import types
+from types import SimpleNamespace
+from unittest.mock import patch, MagicMock
+
+sys.modules.setdefault("fire", types.SimpleNamespace(Fire=lambda *a, **k: None))
+sys.modules.setdefault("firecrawl", types.SimpleNamespace(Firecrawl=object))
+sys.modules.setdefault("fal_client", types.SimpleNamespace())
+
+import run_agent
+
+
+# ── Helpers ──────────────────────────────────────────────────────────────
+
+
+class _FakeOpenAI:
+    def __init__(self, **kw):
+        self.api_key = kw.get("api_key", "test")
+        self.base_url = kw.get("base_url", "http://test")
+
+    def close(self):
+        pass
+
+
+def _make_agent(monkeypatch, **kw):
+    monkeypatch.setattr(run_agent, "get_tool_definitions", lambda **k: [
+        {"type": "function", "function": {
+            "name": "memory", "description": "m",
+            "parameters": {"type": "object", "properties": {
+                "action": {"type": "string"},
+                "target": {"type": "string"},
+                "content": {"type": "string"},
+            }},
+        }},
+    ])
+    monkeypatch.setattr(run_agent, "check_toolset_requirements", lambda: {})
+    monkeypatch.setattr(run_agent, "OpenAI", _FakeOpenAI)
+    agent = run_agent.AIAgent(
+        api_key="test-key", base_url="https://test.example.com/v1",
+        provider=kw.get("provider", "openrouter"),
+        api_mode=kw.get("api_mode", "chat_completions"),
+        max_iterations=4, quiet_mode=True,
+        skip_context_files=True, skip_memory=True,
+    )
+    agent._memory_store = MagicMock()
+    agent._memory_flush_min_turns = 1
+    agent._user_turn_count = 5
+    return agent
+
+
+def _make_msgs(n, chars=400):
+    return [{"role": "user" if i % 2 == 0 else "assistant",
+             "content": f"M{i}: " + "x" * max(0, chars - 6)}
+            for i in range(n)]
+
+
+def _noop_response():
+    return SimpleNamespace(
+        choices=[SimpleNamespace(
+            finish_reason="stop",
+            message=SimpleNamespace(content="Nothing.", tool_calls=None),
+        )],
+        usage=SimpleNamespace(prompt_tokens=50, completion_tokens=10, total_tokens=60),
+    )
+
+
+# ── Feasibility: flush model + always-deduct headroom ────────────────────
+
+
+class TestFeasibilityFixes:
+
+    def test_smaller_flush_model_lowers_effective_context(self, monkeypatch):
+        """flush_memories model with smaller context drives the threshold."""
+        agent = _make_agent(monkeypatch)
+        agent.context_compressor.context_length = 200_000
+        agent.context_compressor.threshold_tokens = 100_000
+
+        fc = SimpleNamespace(base_url="http://test", api_key="k")
+
+        def _aux(task, **kw):
+            if task == "compression":
+                return fc, "big-model"
+            return fc, "small-flush-model"
+
+        def _ctx(model, **kw):
+            return 200_000 if model == "big-model" else 80_000
+
+        with patch("agent.auxiliary_client.get_text_auxiliary_client", side_effect=_aux), \
+             patch("agent.model_metadata.get_model_context_length", side_effect=_ctx):
+            agent._check_compression_model_feasibility()
+
+        assert agent.context_compressor.threshold_tokens < 100_000
+
+    def test_same_model_overhead_still_triggers_correction(self, monkeypatch):
+        """The primary bug: aux == main model, aux_context > threshold, but
+        threshold + overhead > aux_context.  Headroom must fire even when
+        aux_context >= threshold."""
+        agent = _make_agent(monkeypatch)
+        agent.context_compressor.context_length = 128_000
+        agent.context_compressor.threshold_tokens = 120_000
+
+        fc = SimpleNamespace(base_url="http://test", api_key="k")
+
+        with patch("agent.auxiliary_client.get_text_auxiliary_client",
+                    return_value=(fc, "same-model")), \
+             patch("agent.model_metadata.get_model_context_length",
+                    return_value=128_000):
+            agent._check_compression_model_feasibility()
+
+        # 128K - headroom (~12.1K) ≈ 115.9K < 120K → threshold lowered
+        assert agent.context_compressor.threshold_tokens < 120_000
+
+    def test_flush_resolution_failure_is_non_fatal(self, monkeypatch):
+        """If flush model resolution raises, check proceeds with compression model."""
+        agent = _make_agent(monkeypatch)
+        agent.context_compressor.context_length = 200_000
+        agent.context_compressor.threshold_tokens = 100_000
+
+        fc = SimpleNamespace(base_url="http://test", api_key="k")
+        n = [0]
+
+        def _aux(task, **kw):
+            n[0] += 1
+            if task == "flush_memories":
+                raise RuntimeError("boom")
+            return fc, "model"
+
+        with patch("agent.auxiliary_client.get_text_auxiliary_client", side_effect=_aux), \
+             patch("agent.model_metadata.get_model_context_length", return_value=200_000):
+            agent._check_compression_model_feasibility()
+
+        assert n[0] == 2  # both tasks attempted
+
+
+# ── flush_memories trimming ──────────────────────────────────────────────
+
+
+class TestFlushMemoriesTrimming:
+
+    def test_oversized_conversation_trimmed(self, monkeypatch):
+        agent = _make_agent(monkeypatch)
+        agent._cached_system_prompt = "System."
+        messages = _make_msgs(200, chars=500)
+
+        fc = SimpleNamespace(base_url="http://test", api_key="k")
+        with patch("agent.auxiliary_client.get_text_auxiliary_client",
+                    return_value=(fc, "small")), \
+             patch("agent.model_metadata.get_model_context_length",
+                    return_value=8_000), \
+             patch("agent.auxiliary_client.call_llm",
+                    return_value=_noop_response()) as mock:
+            agent.flush_memories(messages)
+
+        sent = mock.call_args.kwargs.get("messages", [])
+        assert len(sent) < 100
+
+    def test_small_conversation_untouched(self, monkeypatch):
+        agent = _make_agent(monkeypatch)
+        agent._cached_system_prompt = "System."
+        messages = [
+            {"role": "user", "content": "Hi"},
+            {"role": "assistant", "content": "Hey"},
+            {"role": "user", "content": "Save"},
+        ]
+
+        fc = SimpleNamespace(base_url="http://test", api_key="k")
+        with patch("agent.auxiliary_client.get_text_auxiliary_client",
+                    return_value=(fc, "big")), \
+             patch("agent.model_metadata.get_model_context_length",
+                    return_value=200_000), \
+             patch("agent.auxiliary_client.call_llm",
+                    return_value=_noop_response()) as mock:
+            agent.flush_memories(messages)
+
+        sent = mock.call_args.kwargs.get("messages", [])
+        assert len(sent) == 5  # sys + 3 conv + flush
+
+    def test_trim_failure_does_not_block_flush(self, monkeypatch):
+        agent = _make_agent(monkeypatch)
+        messages = _make_msgs(10, chars=100)
+
+        with patch("agent.auxiliary_client.get_text_auxiliary_client",
+                    side_effect=RuntimeError("no provider")), \
+             patch("agent.auxiliary_client.call_llm",
+                    return_value=_noop_response()) as mock:
+            agent.flush_memories(messages)
+            assert mock.called
+
+    def test_sentinel_cleaned_after_trim(self, monkeypatch):
+        agent = _make_agent(monkeypatch)
+        messages = [
+            {"role": "user", "content": "Hi"},
+            {"role": "assistant", "content": "Hey"},
+            {"role": "user", "content": "Save"},
+        ]
+        n = len(messages)
+
+        fc = SimpleNamespace(base_url="http://test", api_key="k")
+        with patch("agent.auxiliary_client.get_text_auxiliary_client",
+                    return_value=(fc, "m")), \
+             patch("agent.model_metadata.get_model_context_length",
+                    return_value=128_000), \
+             patch("agent.auxiliary_client.call_llm",
+                    return_value=_noop_response()):
+            agent.flush_memories(messages)
+
+        assert len(messages) == n  # caller's list keeps its original length
+        assert not any(m.get("_flush_sentinel") for m in messages)
@@ -200,8 +200,8 @@ class TestToolsetConsistency:
    def test_hermes_platforms_share_core_tools(self):
        """All hermes-* platform toolsets share the same core tools.

-        Platform-specific additions (e.g. ``discord_server`` on
-        hermes-discord, gated on DISCORD_BOT_TOKEN) are allowed on top —
+        Platform-specific additions (e.g. ``discord`` / ``discord_admin``
+        on hermes-discord, gated on DISCORD_BOT_TOKEN) are allowed on top —
        the invariant is that the core set is identical across platforms.
        """
        platforms = ["hermes-cli", "hermes-telegram", "hermes-discord", "hermes-whatsapp", "hermes-slack", "hermes-signal", "hermes-homeassistant"]
@@ -2128,5 +2128,103 @@ class TestOrchestratorEndToEnd(unittest.TestCase):
        self.assertFalse(built_agents[2]["is_orchestrator_prompt"])


+class TestSubagentApprovalCallback(unittest.TestCase):
+    """Subagent worker threads must have a non-interactive approval callback
+    installed so dangerous-command prompts don't fall back to input() and
+    deadlock the parent's prompt_toolkit TUI.
+
+    Governed by delegation.subagent_auto_approve:
+      false (default) → _subagent_auto_deny
+      true            → _subagent_auto_approve
+    """
+
+    def test_auto_deny_returns_deny(self):
+        from tools.delegate_tool import _subagent_auto_deny
+        self.assertEqual(
+            _subagent_auto_deny("rm -rf /tmp/x", "dangerous"),
+            "deny",
+        )
+
+    def test_auto_approve_returns_once(self):
+        from tools.delegate_tool import _subagent_auto_approve
+        self.assertEqual(
+            _subagent_auto_approve("rm -rf /tmp/x", "dangerous"),
+            "once",
+        )
+
+    @patch("tools.delegate_tool._load_config", return_value={})
+    def test_getter_defaults_to_deny(self, _mock_cfg):
+        from tools.delegate_tool import (
+            _get_subagent_approval_callback,
+            _subagent_auto_deny,
+        )
+        self.assertIs(_get_subagent_approval_callback(), _subagent_auto_deny)
+
+    @patch(
+        "tools.delegate_tool._load_config",
+        return_value={"subagent_auto_approve": False},
+    )
+    def test_getter_explicit_false_is_deny(self, _mock_cfg):
+        from tools.delegate_tool import (
+            _get_subagent_approval_callback,
+            _subagent_auto_deny,
+        )
+        self.assertIs(_get_subagent_approval_callback(), _subagent_auto_deny)
+
+    @patch(
+        "tools.delegate_tool._load_config",
+        return_value={"subagent_auto_approve": True},
+    )
+    def test_getter_true_is_approve(self, _mock_cfg):
+        from tools.delegate_tool import (
+            _get_subagent_approval_callback,
+            _subagent_auto_approve,
+        )
+        self.assertIs(_get_subagent_approval_callback(), _subagent_auto_approve)
+
+    @patch(
+        "tools.delegate_tool._load_config",
+        return_value={"subagent_auto_approve": "yes"},
+    )
+    def test_getter_truthy_string_is_approve(self, _mock_cfg):
+        """is_truthy_value accepts 'yes'/'1'/'true' as truthy."""
+        from tools.delegate_tool import (
+            _get_subagent_approval_callback,
+            _subagent_auto_approve,
+        )
+        self.assertIs(_get_subagent_approval_callback(), _subagent_auto_approve)
+
+    def test_executor_initializer_installs_callback_in_worker(self):
+        """The initializer sets the callback on the worker thread's TLS,
+        not the parent's — verifies the fix actually scopes to workers.
+        """
+        from concurrent.futures import ThreadPoolExecutor
+        from tools.terminal_tool import (
+            set_approval_callback as _set_cb,
+            _get_approval_callback,
+        )
+        from tools.delegate_tool import _subagent_auto_deny
+
+        # Parent thread has no callback.
+        _set_cb(None)
+        self.assertIsNone(_get_approval_callback())
+
+        seen = []
+
+        def worker():
+            seen.append(_get_approval_callback())
+
+        with ThreadPoolExecutor(
+            max_workers=1,
+            initializer=_set_cb,
+            initargs=(_subagent_auto_deny,),
+        ) as executor:
+            executor.submit(worker).result()
+
+        self.assertEqual(seen, [_subagent_auto_deny])
+        # Parent's callback slot is still empty (TLS isolates threads).
+        self.assertIsNone(_get_approval_callback())
+
+
 if __name__ == "__main__":
    unittest.main()
@@ -11,6 +11,8 @@ import pytest
 from tools.discord_tool import (
    DiscordAPIError,
    _ACTIONS,
+    _ADMIN_ACTIONS,
+    _CORE_ACTIONS,
    _available_actions,
    _build_schema,
    _channel_type_name,
@@ -21,8 +23,11 @@ from tools.discord_tool import (
    _load_allowed_actions_config,
    _reset_capability_cache,
    check_discord_tool_requirements,
-    discord_server,
+    discord_admin_handler,
+    discord_core,
    get_dynamic_schema,
+    get_dynamic_schema_admin,
+    get_dynamic_schema_core,
 )


@@ -147,32 +152,32 @@ class TestDiscordRequest:
 class TestDiscordServerValidation:
    def test_no_token(self, monkeypatch):
        monkeypatch.delenv("DISCORD_BOT_TOKEN", raising=False)
-        result = json.loads(discord_server(action="list_guilds"))
+        result = json.loads(discord_admin_handler(action="list_guilds"))
        assert "error" in result
        assert "DISCORD_BOT_TOKEN" in result["error"]

    def test_unknown_action(self, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
-        result = json.loads(discord_server(action="bad_action"))
+        result = json.loads(discord_core(action="bad_action"))
        assert "error" in result
        assert "Unknown action" in result["error"]
        assert "available_actions" in result

    def test_missing_required_guild_id(self, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
-        result = json.loads(discord_server(action="list_channels"))
+        result = json.loads(discord_admin_handler(action="list_channels"))
        assert "error" in result
        assert "guild_id" in result["error"]

    def test_missing_required_channel_id(self, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
-        result = json.loads(discord_server(action="fetch_messages"))
+        result = json.loads(discord_core(action="fetch_messages"))
        assert "error" in result
        assert "channel_id" in result["error"]

    def test_missing_multiple_params(self, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
-        result = json.loads(discord_server(action="add_role"))
+        result = json.loads(discord_admin_handler(action="add_role"))
        assert "error" in result
        assert "guild_id" in result["error"]
        assert "user_id" in result["error"]
@@ -191,7 +196,7 @@ class TestListGuilds:
            {"id": "111", "name": "Test Server", "icon": "abc", "owner": True, "permissions": "123"},
            {"id": "222", "name": "Other Server", "icon": None, "owner": False, "permissions": "456"},
        ]
-        result = json.loads(discord_server(action="list_guilds"))
+        result = json.loads(discord_admin_handler(action="list_guilds"))
        assert result["count"] == 2
        assert result["guilds"][0]["name"] == "Test Server"
        assert result["guilds"][1]["id"] == "222"
@@ -219,7 +224,7 @@ class TestServerInfo:
            "premium_subscription_count": 5,
            "verification_level": 1,
        }
-        result = json.loads(discord_server(action="server_info", guild_id="111"))
+        result = json.loads(discord_admin_handler(action="server_info", guild_id="111"))
        assert result["name"] == "My Server"
        assert result["member_count"] == 42
        assert result["online_count"] == 10
@@ -242,7 +247,7 @@ class TestListChannels:
            {"id": "12", "name": "voice", "type": 2, "position": 1, "parent_id": "10", "topic": None, "nsfw": False},
            {"id": "13", "name": "no-category", "type": 0, "position": 0, "parent_id": None, "topic": None, "nsfw": False},
        ]
-        result = json.loads(discord_server(action="list_channels", guild_id="111"))
+        result = json.loads(discord_admin_handler(action="list_channels", guild_id="111"))
        assert result["total_channels"] == 3  # excludes the category itself
        groups = result["channel_groups"]
        # Uncategorized first
@@ -257,7 +262,7 @@ class TestListChannels:
    def test_empty_guild(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = []
-        result = json.loads(discord_server(action="list_channels", guild_id="111"))
+        result = json.loads(discord_admin_handler(action="list_channels", guild_id="111"))
        assert result["total_channels"] == 0


@@ -274,7 +279,7 @@ class TestChannelInfo:
            "topic": "Welcome!", "nsfw": False, "position": 0,
            "parent_id": "10", "rate_limit_per_user": 0, "last_message_id": "999",
        }
-        result = json.loads(discord_server(action="channel_info", channel_id="11"))
+        result = json.loads(discord_admin_handler(action="channel_info", channel_id="11"))
        assert result["name"] == "general"
        assert result["type"] == "text"
        assert result["guild_id"] == "111"
@@ -293,7 +298,7 @@ class TestListRoles:
            {"id": "2", "name": "Admin", "position": 2, "color": 16711680, "mentionable": True, "managed": False, "hoist": True},
            {"id": "3", "name": "Mod", "position": 1, "color": 255, "mentionable": True, "managed": False, "hoist": True},
        ]
-        result = json.loads(discord_server(action="list_roles", guild_id="111"))
+        result = json.loads(discord_admin_handler(action="list_roles", guild_id="111"))
        assert result["count"] == 3
        # Should be sorted by position descending
        assert result["roles"][0]["name"] == "Admin"
@@ -317,7 +322,7 @@ class TestMemberInfo:
            "joined_at": "2024-01-01T00:00:00Z",
            "premium_since": None,
        }
-        result = json.loads(discord_server(action="member_info", guild_id="111", user_id="42"))
+        result = json.loads(discord_admin_handler(action="member_info", guild_id="111", user_id="42"))
        assert result["username"] == "testuser"
        assert result["nickname"] == "Testy"
        assert result["roles"] == ["2", "3"]
@@ -334,7 +339,7 @@ class TestSearchMembers:
        mock_req.return_value = [
            {"user": {"id": "42", "username": "testuser", "global_name": "Test", "bot": False}, "nick": None, "roles": []},
        ]
-        result = json.loads(discord_server(action="search_members", guild_id="111", query="test"))
+        result = json.loads(discord_core(action="search_members", guild_id="111", query="test"))
        assert result["count"] == 1
        assert result["members"][0]["username"] == "testuser"
        mock_req.assert_called_once_with(
@@ -346,7 +351,7 @@ class TestSearchMembers:
    def test_search_members_limit_capped(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = []
-        discord_server(action="search_members", guild_id="111", query="x", limit=200)
+        discord_core(action="search_members", guild_id="111", query="x", limit=200)
        call_params = mock_req.call_args[1]["params"]
        assert call_params["limit"] == "100"  # Capped at 100

@@ -370,7 +375,7 @@ class TestFetchMessages:
                "pinned": False,
            },
        ]
-        result = json.loads(discord_server(action="fetch_messages", channel_id="11"))
+        result = json.loads(discord_core(action="fetch_messages", channel_id="11"))
        assert result["count"] == 1
        assert result["messages"][0]["content"] == "Hello world"
        assert result["messages"][0]["author"]["username"] == "user1"
@@ -379,7 +384,7 @@ class TestFetchMessages:
    def test_fetch_messages_with_pagination(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = []
-        discord_server(action="fetch_messages", channel_id="11", before="999", limit=10)
+        discord_core(action="fetch_messages", channel_id="11", before="999", limit=10)
        call_params = mock_req.call_args[1]["params"]
        assert call_params["before"] == "999"
        assert call_params["limit"] == "10"
@@ -396,7 +401,7 @@ class TestListPins:
        mock_req.return_value = [
            {"id": "500", "content": "Important announcement", "author": {"username": "admin"}, "timestamp": "2024-01-01T00:00:00Z"},
        ]
-        result = json.loads(discord_server(action="list_pins", channel_id="11"))
+        result = json.loads(discord_admin_handler(action="list_pins", channel_id="11"))
        assert result["count"] == 1
        assert result["pinned_messages"][0]["content"] == "Important announcement"

@@ -410,7 +415,7 @@ class TestPinUnpin:
    def test_pin_message(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = None  # 204
-        result = json.loads(discord_server(action="pin_message", channel_id="11", message_id="500"))
+        result = json.loads(discord_admin_handler(action="pin_message", channel_id="11", message_id="500"))
        assert result["success"] is True
        mock_req.assert_called_once_with("PUT", "/channels/11/pins/500", "test-token")

@@ -418,7 +423,7 @@ class TestPinUnpin:
    def test_unpin_message(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = None
-        result = json.loads(discord_server(action="unpin_message", channel_id="11", message_id="500"))
+        result = json.loads(discord_admin_handler(action="unpin_message", channel_id="11", message_id="500"))
        assert result["success"] is True


@@ -431,7 +436,7 @@ class TestCreateThread:
    def test_create_standalone_thread(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = {"id": "800", "name": "New Thread"}
-        result = json.loads(discord_server(action="create_thread", channel_id="11", name="New Thread"))
+        result = json.loads(discord_core(action="create_thread", channel_id="11", name="New Thread"))
        assert result["success"] is True
        assert result["thread_id"] == "800"
        # Verify the API call
@@ -444,7 +449,7 @@ class TestCreateThread:
    def test_create_thread_from_message(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = {"id": "801", "name": "Discussion"}
-        result = json.loads(discord_server(
+        result = json.loads(discord_core(
            action="create_thread", channel_id="11", name="Discussion", message_id="1001",
        ))
        assert result["success"] is True
@@ -463,7 +468,7 @@ class TestRoleManagement:
    def test_add_role(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = None
-        result = json.loads(discord_server(
+        result = json.loads(discord_admin_handler(
            action="add_role", guild_id="111", user_id="42", role_id="2",
        ))
        assert result["success"] is True
@@ -475,7 +480,7 @@ class TestRoleManagement:
    def test_remove_role(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.return_value = None
-        result = json.loads(discord_server(
+        result = json.loads(discord_admin_handler(
            action="remove_role", guild_id="111", user_id="42", role_id="2",
        ))
        assert result["success"] is True
@@ -490,15 +495,23 @@ class TestErrorHandling:
    def test_api_error_handled(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.side_effect = DiscordAPIError(403, '{"message": "Missing Access"}')
-        result = json.loads(discord_server(action="list_guilds"))
+        result = json.loads(discord_admin_handler(action="list_guilds"))
        assert "error" in result
        assert "403" in result["error"]

    @patch("tools.discord_tool._discord_request")
-    def test_unexpected_error_handled(self, mock_req, monkeypatch):
+    def test_unexpected_error_handled_admin(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
        mock_req.side_effect = RuntimeError("something broke")
-        result = json.loads(discord_server(action="list_guilds"))
+        result = json.loads(discord_admin_handler(action="list_guilds"))
+        assert "error" in result
+        assert "something broke" in result["error"]
+
+    @patch("tools.discord_tool._discord_request")
+    def test_unexpected_error_handled_core(self, mock_req, monkeypatch):
+        monkeypatch.setenv("DISCORD_BOT_TOKEN", "test-token")
+        mock_req.side_effect = RuntimeError("something broke")
+        result = json.loads(discord_core(action="fetch_messages", channel_id="11"))
        assert "error" in result
        assert "something broke" in result["error"]

@@ -508,79 +521,109 @@ class TestErrorHandling:
 # ---------------------------------------------------------------------------

 class TestRegistration:
-    def test_tool_registered(self):
+    def test_core_tool_registered(self):
        from tools.registry import registry
-        entry = registry._tools.get("discord_server")
+        entry = registry._tools.get("discord")
        assert entry is not None
-        assert entry.schema["name"] == "discord_server"
+        assert entry.schema["name"] == "discord"
        assert entry.toolset == "discord"
        assert entry.check_fn is not None
        assert entry.requires_env == ["DISCORD_BOT_TOKEN"]

-    def test_schema_actions(self):
-        """Static schema should list all actions (the model_tools post-processing
-        narrows this per-session; static registration is the superset)."""
+    def test_admin_tool_registered(self):
        from tools.registry import registry
-        entry = registry._tools["discord_server"]
-        actions = entry.schema["parameters"]["properties"]["action"]["enum"]
-        expected = [
-            "list_guilds", "server_info", "list_channels", "channel_info",
-            "list_roles", "member_info", "search_members", "fetch_messages",
-            "list_pins", "pin_message", "unpin_message", "create_thread",
-            "add_role", "remove_role",
-        ]
-        assert set(actions) == set(expected)
-        assert set(_ACTIONS.keys()) == set(expected)
+        entry = registry._tools.get("discord_admin")
+        assert entry is not None
+        assert entry.schema["name"] == "discord_admin"
+        assert entry.toolset == "discord_admin"
+        assert entry.check_fn is not None
+        assert entry.requires_env == ["DISCORD_BOT_TOKEN"]
+
+    def test_core_schema_actions(self):
+        """Core static schema should list only core actions."""
+        from tools.registry import registry
+        entry = registry._tools["discord"]
+        actions = set(entry.schema["parameters"]["properties"]["action"]["enum"])
+        assert actions == {"fetch_messages", "search_members", "create_thread"}
+
+    def test_admin_schema_actions(self):
+        """Admin static schema should list only admin actions."""
+        from tools.registry import registry
+        entry = registry._tools["discord_admin"]
+        actions = set(entry.schema["parameters"]["properties"]["action"]["enum"])
+        expected_admin = set(_ACTIONS.keys()) - {"fetch_messages", "search_members", "create_thread"}
+        assert actions == expected_admin
+
+    def test_all_actions_covered(self):
+        """Core + admin actions should cover all known actions."""
+        assert set(_CORE_ACTIONS.keys()) | set(_ADMIN_ACTIONS.keys()) == set(_ACTIONS.keys())
+        assert set(_CORE_ACTIONS.keys()) & set(_ADMIN_ACTIONS.keys()) == set()

    def test_schema_parameter_bounds(self):
        from tools.registry import registry
-        entry = registry._tools["discord_server"]
+        entry = registry._tools["discord"]
        props = entry.schema["parameters"]["properties"]
        assert props["limit"]["minimum"] == 1
        assert props["limit"]["maximum"] == 100
        assert props["auto_archive_duration"]["enum"] == [60, 1440, 4320, 10080]

-    def test_schema_description_is_action_manifest(self):
-        """The top-level description should include the action manifest
-        (one-line signatures per action) so the model can find required
-        params without re-reading every parameter description."""
+    def test_core_schema_description(self):
+        """Core schema description should mention core actions."""
        from tools.registry import registry
-        entry = registry._tools["discord_server"]
+        entry = registry._tools["discord"]
        desc = entry.schema["description"]
-        # Spot-check a few entries
-        assert "list_guilds()" in desc
        assert "fetch_messages(channel_id)" in desc
+        assert "search_members(guild_id, query)" in desc
+        assert "create_thread(channel_id, name)" in desc
+        # Admin actions should NOT be in core description
+        assert "list_guilds()" not in desc
+        assert "add_role(" not in desc
+
+    def test_admin_schema_description(self):
+        """Admin schema description should mention admin actions."""
+        from tools.registry import registry
+        entry = registry._tools["discord_admin"]
+        desc = entry.schema["description"]
+        assert "list_guilds()" in desc
        assert "add_role(guild_id, user_id, role_id)" in desc
+        # Core actions should NOT be in admin description
+        assert "fetch_messages(" not in desc
+        assert "create_thread(" not in desc

    def test_handler_callable(self):
        from tools.registry import registry
-        entry = registry._tools["discord_server"]
+        entry = registry._tools["discord"]
        assert callable(entry.handler)
+        entry_admin = registry._tools["discord_admin"]
+        assert callable(entry_admin.handler)


 # ---------------------------------------------------------------------------
-# Toolset: discord_server only in hermes-discord
+# Toolset: discord / discord_admin only in hermes-discord
 # ---------------------------------------------------------------------------

 class TestToolsetInclusion:
-    def test_discord_server_in_hermes_discord_toolset(self):
+    def test_discord_tools_in_hermes_discord_toolset(self):
        from toolsets import TOOLSETS
-        assert "discord_server" in TOOLSETS["hermes-discord"]["tools"]
+        assert "discord" in TOOLSETS["hermes-discord"]["tools"]
+        assert "discord_admin" in TOOLSETS["hermes-discord"]["tools"]

-    def test_discord_server_not_in_core_tools(self):
+    def test_discord_tools_not_in_core_tools(self):
        from toolsets import _HERMES_CORE_TOOLS
-        assert "discord_server" not in _HERMES_CORE_TOOLS
+        assert "discord" not in _HERMES_CORE_TOOLS
+        assert "discord_admin" not in _HERMES_CORE_TOOLS

-    def test_discord_server_not_in_other_toolsets(self):
+    def test_discord_tools_not_in_other_toolsets(self):
        from toolsets import TOOLSETS
        for name, ts in TOOLSETS.items():
-            if name == "hermes-discord":
+            if name in ("hermes-discord", "hermes-gateway", "discord", "discord_admin"):
                continue
-            # The gateway toolset might include it if it unions all platform tools
-            if name == "hermes-gateway":
-                continue
-            assert "discord_server" not in ts.get("tools", []), (
-                f"discord_server should not be in toolset '{name}'"
+            tools = ts.get("tools", [])
+            assert "discord" not in tools, (
+                f"discord tool should not be in toolset '{name}'"
+            )
+            assert "discord_admin" not in tools, (
+                f"discord_admin tool should not be in toolset '{name}'"
             )


@@ -798,40 +841,69 @@ class TestDynamicSchema:
    @patch("tools.discord_tool._discord_request")
    def test_no_token_returns_none(self, mock_req, monkeypatch):
        monkeypatch.delenv("DISCORD_BOT_TOKEN", raising=False)
-        assert get_dynamic_schema() is None
+        assert get_dynamic_schema_core() is None
+        assert get_dynamic_schema_admin() is None
        mock_req.assert_not_called()

    @patch("tools.discord_tool._discord_request")
-    def test_full_intents_full_schema(self, mock_req, monkeypatch):
+    def test_full_intents_core_schema(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
        monkeypatch.setattr(
            "hermes_cli.config.load_config",
            lambda: {"discord": {"server_actions": ""}},
        )
        mock_req.return_value = {"flags": (1 << 14) | (1 << 18)}
-        schema = get_dynamic_schema()
-        actions = schema["parameters"]["properties"]["action"]["enum"]
-        assert set(actions) == set(_ACTIONS.keys())
-        # No content warning
+        schema = get_dynamic_schema_core()
+        actions = set(schema["parameters"]["properties"]["action"]["enum"])
+        assert actions == set(_CORE_ACTIONS.keys())
+        assert schema["name"] == "discord"
+
+    @patch("tools.discord_tool._discord_request")
+    def test_full_intents_admin_schema(self, mock_req, monkeypatch):
+        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
+        monkeypatch.setattr(
+            "hermes_cli.config.load_config",
+            lambda: {"discord": {"server_actions": ""}},
+        )
+        mock_req.return_value = {"flags": (1 << 14) | (1 << 18)}
+        schema = get_dynamic_schema_admin()
+        actions = set(schema["parameters"]["properties"]["action"]["enum"])
+        assert actions == set(_ADMIN_ACTIONS.keys())
+        assert schema["name"] == "discord_admin"
+        # No content warning when MESSAGE_CONTENT is enabled
        assert "MESSAGE_CONTENT" not in schema["description"]

    @patch("tools.discord_tool._discord_request")
-    def test_no_members_intent_removes_member_actions_from_schema(
+    def test_no_members_intent_removes_member_actions_from_admin_schema(
        self, mock_req, monkeypatch,
    ):
+        """member_info is an admin action; it should be hidden when
+        GUILD_MEMBERS intent is missing."""
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
        monkeypatch.setattr(
            "hermes_cli.config.load_config",
            lambda: {"discord": {"server_actions": ""}},
        )
        mock_req.return_value = {"flags": 1 << 18}  # only MESSAGE_CONTENT
-        schema = get_dynamic_schema()
+        schema = get_dynamic_schema_admin()
+        actions = schema["parameters"]["properties"]["action"]["enum"]
+        assert "member_info" not in actions
+        assert "member_info" not in schema["description"]
+
+    @patch("tools.discord_tool._discord_request")
+    def test_no_members_intent_hides_search_members_from_core(
+        self, mock_req, monkeypatch,
+    ):
+        """search_members is a core action gated by GUILD_MEMBERS intent."""
+        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
+        monkeypatch.setattr(
+            "hermes_cli.config.load_config",
+            lambda: {"discord": {"server_actions": ""}},
+        )
+        mock_req.return_value = {"flags": 1 << 18}  # only MESSAGE_CONTENT
+        schema = get_dynamic_schema_core()
        actions = schema["parameters"]["properties"]["action"]["enum"]
        assert "search_members" not in actions
-        assert "member_info" not in actions
-        # Manifest description should also not advertise them
-        assert "search_members" not in schema["description"]
-        assert "member_info" not in schema["description"]

    @patch("tools.discord_tool._discord_request")
    def test_no_message_content_adds_warning_note(self, mock_req, monkeypatch):
@@ -841,41 +913,53 @@ class TestDynamicSchema:
            lambda: {"discord": {"server_actions": ""}},
        )
        mock_req.return_value = {"flags": 1 << 14}  # only GUILD_MEMBERS
-        schema = get_dynamic_schema()
+        schema = get_dynamic_schema_core()
        assert "MESSAGE_CONTENT" in schema["description"]
        # But fetch_messages is still available
        actions = schema["parameters"]["properties"]["action"]["enum"]
        assert "fetch_messages" in actions

    @patch("tools.discord_tool._discord_request")
-    def test_config_allowlist_narrows_schema(self, mock_req, monkeypatch):
+    def test_config_allowlist_narrows_admin_schema(self, mock_req, monkeypatch):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
        monkeypatch.setattr(
            "hermes_cli.config.load_config",
            lambda: {"discord": {"server_actions": "list_guilds,list_channels"}},
        )
        mock_req.return_value = {"flags": (1 << 14) | (1 << 18)}
-        schema = get_dynamic_schema()
+        schema = get_dynamic_schema_admin()
        actions = schema["parameters"]["properties"]["action"]["enum"]
        assert actions == ["list_guilds", "list_channels"]
-        # Manifest description should only show allowed ones (check for
-        # the signature marker, which is specific to manifest lines)
        assert "list_guilds()" in schema["description"]
        assert "add_role(" not in schema["description"]
-        assert "create_thread(" not in schema["description"]

    @patch("tools.discord_tool._discord_request")
-    def test_empty_allowlist_with_valid_values_hides_tool(self, mock_req, monkeypatch):
+    def test_empty_allowlist_with_valid_values_hides_tools(self, mock_req, monkeypatch):
        """If the allowlist resolves to zero valid actions (e.g. all names
-        were typos), get_dynamic_schema returns None so the tool is dropped
-        entirely rather than showing an empty enum."""
+        were typos), both get_dynamic_schema_core() and get_dynamic_schema_admin() return None so the tools are dropped."""
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
        monkeypatch.setattr(
            "hermes_cli.config.load_config",
            lambda: {"discord": {"server_actions": "typo_one,typo_two"}},
        )
        mock_req.return_value = {"flags": (1 << 14) | (1 << 18)}
-        assert get_dynamic_schema() is None
+        assert get_dynamic_schema_core() is None
+        assert get_dynamic_schema_admin() is None
+
+    @patch("tools.discord_tool._discord_request")
+    def test_backward_compat_wrapper(self, mock_req, monkeypatch):
+        """get_dynamic_schema() should delegate to get_dynamic_schema_core()."""
+        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
+        monkeypatch.setattr(
+            "hermes_cli.config.load_config",
+            lambda: {"discord": {"server_actions": ""}},
+        )
+        mock_req.return_value = {"flags": (1 << 14) | (1 << 18)}
+        schema = get_dynamic_schema()
+        assert schema is not None
+        assert schema["name"] == "discord"
+        actions = set(schema["parameters"]["properties"]["action"]["enum"])
+        assert actions == set(_CORE_ACTIONS.keys())


 # ---------------------------------------------------------------------------
@@ -890,7 +974,7 @@ class TestRuntimeAllowlistEnforcement:
            "hermes_cli.config.load_config",
            lambda: {"discord": {"server_actions": "list_guilds"}},
        )
-        result = json.loads(discord_server(action="add_role", guild_id="1", user_id="2", role_id="3"))
+        result = json.loads(discord_admin_handler(action="add_role", guild_id="1", user_id="2", role_id="3"))
        assert "error" in result
        assert "disabled by config" in result["error"]
        mock_req.assert_not_called()
@@ -903,7 +987,7 @@ class TestRuntimeAllowlistEnforcement:
            lambda: {"discord": {"server_actions": "list_guilds"}},
        )
        mock_req.return_value = []
-        result = json.loads(discord_server(action="list_guilds"))
+        result = json.loads(discord_admin_handler(action="list_guilds"))
        assert "guilds" in result


@@ -930,7 +1014,7 @@ class Test403Enrichment:
            lambda: {"discord": {"server_actions": ""}},
        )
        mock_req.side_effect = DiscordAPIError(403, '{"message":"Missing Permissions"}')
-        result = json.loads(discord_server(
+        result = json.loads(discord_admin_handler(
            action="add_role", guild_id="1", user_id="2", role_id="3",
        ))
        assert "error" in result
@@ -944,7 +1028,7 @@ class Test403Enrichment:
            lambda: {"discord": {"server_actions": ""}},
        )
        mock_req.side_effect = DiscordAPIError(500, "server error")
-        result = json.loads(discord_server(action="list_guilds"))
+        result = json.loads(discord_admin_handler(action="list_guilds"))
        assert "500" in result["error"]
        assert "MANAGE_ROLES" not in result["error"]

@@ -961,10 +1045,10 @@ class TestModelToolsIntegration:
        _reset_capability_cache()

    @patch("tools.discord_tool._discord_request")
-    def test_discord_server_schema_rebuilt_by_get_tool_definitions(
+    def test_discord_admin_schema_rebuilt_by_get_tool_definitions(
        self, mock_req, monkeypatch,
    ):
-        """When model_tools.get_tool_definitions runs with discord_server
+        """When model_tools.get_tool_definitions runs with discord_admin
        available, it should replace the static schema with the dynamic one."""
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
        monkeypatch.setattr(
@@ -976,16 +1060,16 @@ class TestModelToolsIntegration:

        from model_tools import get_tool_definitions
        tools = get_tool_definitions(enabled_toolsets=["hermes-discord"], quiet_mode=True)
-        discord_tool = next(
-            (t for t in tools if t.get("function", {}).get("name") == "discord_server"),
+        discord_admin_tool = next(
+            (t for t in tools if t.get("function", {}).get("name") == "discord_admin"),
            None,
        )
-        assert discord_tool is not None, "discord_server should be in the schema"
-        actions = discord_tool["function"]["parameters"]["properties"]["action"]["enum"]
+        assert discord_admin_tool is not None, "discord_admin should be in the schema"
+        actions = discord_admin_tool["function"]["parameters"]["properties"]["action"]["enum"]
        assert actions == ["list_guilds", "server_info"]

    @patch("tools.discord_tool._discord_request")
-    def test_discord_server_dropped_when_allowlist_empties_it(
+    def test_discord_tools_dropped_when_allowlist_empties_them(
        self, mock_req, monkeypatch,
    ):
        monkeypatch.setenv("DISCORD_BOT_TOKEN", "tok")
@@ -998,4 +1082,6 @@ class TestModelToolsIntegration:
        from model_tools import get_tool_definitions
        tools = get_tool_definitions(enabled_toolsets=["hermes-discord"], quiet_mode=True)
        names = [t.get("function", {}).get("name") for t in tools]
+        assert "discord" not in names
+        assert "discord_admin" not in names
        assert "discord_server" not in names
@@ -19,9 +19,11 @@ from unittest.mock import patch
 from tools.process_registry import (
    ProcessRegistry,
    ProcessSession,
-    WATCH_MAX_PER_WINDOW,
-    WATCH_WINDOW_SECONDS,
-    WATCH_OVERLOAD_KILL_SECONDS,
+    WATCH_MIN_INTERVAL_SECONDS,
+    WATCH_STRIKE_LIMIT,
+    WATCH_GLOBAL_MAX_PER_WINDOW,
+    WATCH_GLOBAL_WINDOW_SECONDS,
+    WATCH_GLOBAL_COOLDOWN_SECONDS,
 )


@@ -129,10 +131,15 @@ class TestCheckWatchPatterns:
        assert registry.completion_queue.empty()

    def test_hit_counter_increments(self, registry):
-        """Each delivered notification increments _watch_hits."""
+        """Each delivered notification increments _watch_hits.
+
+        With the 1-per-15s rate limit, the cooldown must be reset between calls.
+        """
        session = _make_session(watch_patterns=["X"])
        registry._check_watch_patterns(session, "X\n")
        assert session._watch_hits == 1
+        # Reset cooldown so the second match gets delivered.
+        session._watch_cooldown_until = 0.0
        registry._check_watch_patterns(session, "X\n")
        assert session._watch_hits == 2

@@ -148,100 +155,114 @@ class TestCheckWatchPatterns:


 # =========================================================================
-# Rate limiting
+# Per-session rate limiting: 1 notification per 15s, 3 strikes → disable
 # =========================================================================

-class TestRateLimiting:
-    def test_within_window_limit(self, registry):
-        """Notifications within the rate limit all get delivered."""
+class TestPerSessionRateLimit:
+    def test_first_match_delivers(self, registry):
+        """A fresh session with no prior cooldown delivers the first match."""
        session = _make_session(watch_patterns=["E"])
-        for i in range(WATCH_MAX_PER_WINDOW):
-            registry._check_watch_patterns(session, f"E {i}\n")
-        assert registry.completion_queue.qsize() == WATCH_MAX_PER_WINDOW
+        registry._check_watch_patterns(session, "E first\n")
+        assert registry.completion_queue.qsize() == 1
+        evt = registry.completion_queue.get_nowait()
+        assert evt["type"] == "watch_match"
+        assert session._watch_hits == 1
+        # Cooldown is now armed.
+        assert session._watch_cooldown_until > 0

-    def test_exceeds_window_limit(self, registry):
-        """Notifications beyond the rate limit are suppressed."""
+    def test_second_match_within_cooldown_is_suppressed(self, registry):
+        """A second match inside the 15s cooldown is dropped and counted."""
        session = _make_session(watch_patterns=["E"])
-        for i in range(WATCH_MAX_PER_WINDOW + 5):
-            registry._check_watch_patterns(session, f"E {i}\n")
-        # Only WATCH_MAX_PER_WINDOW should be in the queue
-        assert registry.completion_queue.qsize() == WATCH_MAX_PER_WINDOW
-        assert session._watch_suppressed == 5
-
-    def test_window_resets(self, registry):
-        """After the window expires, notifications can flow again."""
-        session = _make_session(watch_patterns=["E"])
-        # Fill the window
-        for i in range(WATCH_MAX_PER_WINDOW):
-            registry._check_watch_patterns(session, f"E {i}\n")
-        # One more should be suppressed
-        registry._check_watch_patterns(session, "E extra\n")
+        registry._check_watch_patterns(session, "E first\n")
+        assert registry.completion_queue.qsize() == 1
+        # Immediately trigger another match — well inside cooldown.
+        registry._check_watch_patterns(session, "E second\n")
+        # Still only one notification.
+        assert registry.completion_queue.qsize() == 1
        assert session._watch_suppressed == 1
+        assert session._watch_consecutive_strikes == 1

-        # Fast-forward past window
-        session._watch_window_start = time.time() - WATCH_WINDOW_SECONDS - 1
-        registry._check_watch_patterns(session, "E after reset\n")
-        # Should deliver now (window reset)
-        assert registry.completion_queue.qsize() == WATCH_MAX_PER_WINDOW + 1
-
-    def test_suppressed_count_in_next_delivery(self, registry):
-        """Suppressed count is reported in the next successful delivery."""
+    def test_many_drops_inside_window_count_as_ONE_strike(self, registry):
+        """Multiple suppressions inside the same cooldown window = 1 strike."""
        session = _make_session(watch_patterns=["E"])
-        for i in range(WATCH_MAX_PER_WINDOW):
-            registry._check_watch_patterns(session, f"E {i}\n")
-        # Suppress 3 more
-        for i in range(3):
-            registry._check_watch_patterns(session, f"E suppressed {i}\n")
-        assert session._watch_suppressed == 3
+        registry._check_watch_patterns(session, "E\n")
+        for _ in range(10):
+            registry._check_watch_patterns(session, "E\n")
+        assert session._watch_consecutive_strikes == 1
+        assert session._watch_suppressed == 10

-        # Fast-forward past window to allow delivery
-        session._watch_window_start = time.time() - WATCH_WINDOW_SECONDS - 1
-        registry._check_watch_patterns(session, "E back\n")
-        # Drain to the last event
-        last_evt = None
-        while not registry.completion_queue.empty():
-            last_evt = registry.completion_queue.get_nowait()
-        assert last_evt["suppressed"] == 3
-        assert session._watch_suppressed == 0  # reset after delivery
-
-
-# =========================================================================
-# Overload kill switch
-# =========================================================================
-
-class TestOverloadKillSwitch:
-    def test_sustained_overload_disables(self, registry):
-        """Sustained overload beyond threshold permanently disables watching."""
+    def test_three_strikes_disables_watch_and_promotes_to_notify(self, registry):
+        """Three consecutive strike windows → watch_disabled + notify_on_complete."""
        session = _make_session(watch_patterns=["E"])
-        # Fill the window to trigger rate limit
-        for i in range(WATCH_MAX_PER_WINDOW):
-            registry._check_watch_patterns(session, f"E {i}\n")
+        session.notify_on_complete = False

-        # Simulate sustained overload: set overload_since to past threshold
-        session._watch_overload_since = time.time() - WATCH_OVERLOAD_KILL_SECONDS - 1
-        # Force another suppressed hit
-        registry._check_watch_patterns(session, "E overload\n")
-        registry._check_watch_patterns(session, "E overload2\n")
+        for strike in range(WATCH_STRIKE_LIMIT):
+            # Emit → arms cooldown.
+            registry._check_watch_patterns(session, f"E emit {strike}\n")
+            # Attempt while inside cooldown → one strike, dropped.
+            registry._check_watch_patterns(session, f"E drop {strike}\n")
+            # Fast-forward past the cooldown for the NEXT iteration, BUT leave
+            # the strike candidate set so the cooldown-expiry branch sees
+            # "this was a strike window" and doesn't reset the counter.
+            session._watch_cooldown_until = time.time() - 0.01

+        # After WATCH_STRIKE_LIMIT consecutive strikes, the session should
+        # already be disabled and promoted to notify_on_complete.
        assert session._watch_disabled is True
-        # Should have a watch_disabled event in the queue
+        assert session.notify_on_complete is True
+        # One watch_disabled summary event should be in the queue.
        disabled_evts = []
+        matches = 0
        while not registry.completion_queue.empty():
            evt = registry.completion_queue.get_nowait()
            if evt.get("type") == "watch_disabled":
                disabled_evts.append(evt)
+            elif evt.get("type") == "watch_match":
+                matches += 1
        assert len(disabled_evts) == 1
-        assert "too many matches" in disabled_evts[0]["message"]
+        assert "notify_on_complete" in disabled_evts[0]["message"]
+        # We should have had exactly WATCH_STRIKE_LIMIT emissions before disable.
+        assert matches == WATCH_STRIKE_LIMIT

-    def test_overload_resets_on_delivery(self, registry):
-        """Overload timer resets when a notification gets through."""
+    def test_clean_window_resets_strike_counter(self, registry):
+        """A cooldown that expires with zero drops resets the consecutive counter."""
        session = _make_session(watch_patterns=["E"])
-        # Start overload tracking
-        session._watch_overload_since = time.time() - 10
-        # But window allows delivery → overload should reset
-        registry._check_watch_patterns(session, "E ok\n")
-        assert session._watch_overload_since == 0.0
-        assert session._watch_disabled is False
+        # Emit + drop inside window → 1 strike.
+        registry._check_watch_patterns(session, "E emit\n")
+        registry._check_watch_patterns(session, "E drop\n")
+        assert session._watch_consecutive_strikes == 1
+
+        # Fast-forward past the cooldown. The window above did record a
+        # drop, so the test cannot rely on it being clean. Instead, the
+        # strike candidate flag is cleared manually below to simulate a
+        # cooldown that expired with zero drops. On the NEXT emission, the
+        # cooldown-expiry branch should see no strike candidate and reset
+        # the consecutive-strike counter.
+        session._watch_cooldown_until = time.time() - 0.01
+        # Clear strike candidate to simulate "this cooldown had no drops".
+        session._watch_strike_candidate = False
+        registry._check_watch_patterns(session, "E clean\n")
+        assert session._watch_consecutive_strikes == 0
+
+    def test_suppressed_count_in_next_delivery(self, registry):
+        """Suppressed count from a strike window is reported in the next emit."""
+        session = _make_session(watch_patterns=["E"])
+        registry._check_watch_patterns(session, "E emit\n")
+        for _ in range(4):
+            registry._check_watch_patterns(session, "E drop\n")
+        assert session._watch_suppressed == 4
+
+        # Fast-forward past cooldown.
+        session._watch_cooldown_until = time.time() - 0.01
+        # Drain the queue so we can inspect the next emission.
+        while not registry.completion_queue.empty():
+            registry.completion_queue.get_nowait()
+
+        registry._check_watch_patterns(session, "E back\n")
+        evt = registry.completion_queue.get_nowait()
+        assert evt["type"] == "watch_match"
+        assert evt["suppressed"] == 4
+        assert session._watch_suppressed == 0  # reset after delivery


 # =========================================================================
@@ -321,3 +342,150 @@ class TestCodeExecutionBlocked:
    def test_watch_patterns_blocked(self):
        from tools.code_execution_tool import _TERMINAL_BLOCKED_PARAMS
        assert "watch_patterns" in _TERMINAL_BLOCKED_PARAMS
+
+
+# =========================================================================
+# Suppress-after-exit (anti-spam fix)
+# =========================================================================
+
+class TestSuppressAfterExit:
+    def test_match_dropped_once_session_exited(self, registry):
+        """watch_patterns notifications stop the moment session.exited is set."""
+        session = _make_session(watch_patterns=["ERROR"])
+        # Mark the process as exited BEFORE the late chunk arrives.
+        session.exited = True
+        registry._check_watch_patterns(session, "ERROR: late buffer\n")
+        assert registry.completion_queue.empty()
+        assert session._watch_hits == 0
+
+    def test_match_still_delivered_while_session_running(self, registry):
+        """Sanity: while the process is still running, matches still deliver."""
+        session = _make_session(watch_patterns=["ERROR"])
+        session.exited = False
+        registry._check_watch_patterns(session, "ERROR: oh no\n")
+        assert not registry.completion_queue.empty()
+        evt = registry.completion_queue.get_nowait()
+        assert evt["type"] == "watch_match"
+
+
+# =========================================================================
+# Mutual exclusion: notify_on_complete wins over watch_patterns
+# =========================================================================
+
+class TestMutualExclusion:
+    def test_resolver_drops_watch_when_notify_set(self):
+        """Both flags set → watch_patterns dropped with a note."""
+        from tools.terminal_tool import _resolve_notification_flag_conflict
+
+        resolved, note = _resolve_notification_flag_conflict(
+            notify_on_complete=True,
+            watch_patterns=["ERROR", "DONE"],
+            background=True,
+        )
+        assert resolved is None
+        assert "notify_on_complete" in note
+        assert "duplicate notifications" in note
+
+    def test_resolver_keeps_watch_when_notify_off(self):
+        """notify_on_complete=False → watch_patterns kept intact."""
+        from tools.terminal_tool import _resolve_notification_flag_conflict
+
+        resolved, note = _resolve_notification_flag_conflict(
+            notify_on_complete=False,
+            watch_patterns=["ERROR"],
+            background=True,
+        )
+        assert resolved == ["ERROR"]
+        assert note == ""
+
+    def test_resolver_keeps_notify_when_no_watch(self):
+        """Only notify_on_complete set → no conflict."""
+        from tools.terminal_tool import _resolve_notification_flag_conflict
+
+        resolved, note = _resolve_notification_flag_conflict(
+            notify_on_complete=True,
+            watch_patterns=None,
+            background=True,
+        )
+        assert resolved is None
+        assert note == ""
+
+    def test_resolver_inert_when_not_background(self):
+        """Without background=True, the whole thing is a no-op."""
+        from tools.terminal_tool import _resolve_notification_flag_conflict
+
+        resolved, note = _resolve_notification_flag_conflict(
+            notify_on_complete=True,
+            watch_patterns=["ERROR"],
+            background=False,
+        )
+        assert resolved == ["ERROR"]
+        assert note == ""
+
+
+# =========================================================================
+# Global circuit breaker (cross-session overflow blocker)
+# =========================================================================
+
+class TestGlobalCircuitBreaker:
+    def test_trips_after_global_threshold(self, registry):
+        """When >N matches fire across sessions in the window, breaker trips."""
+        sessions = [
+            _make_session(sid=f"proc_s{i}", watch_patterns=["E"])
+            for i in range(WATCH_GLOBAL_MAX_PER_WINDOW + 3)
+        ]
+        # Each session fires exactly one match — individually well under the
+        # per-session cap. But collectively they should trip the global cap.
+        for s in sessions:
+            registry._check_watch_patterns(s, "E hit\n")
+
+        # Drain the queue and count event types.
+        watch_matches = 0
+        overflow_tripped = 0
+        while not registry.completion_queue.empty():
+            evt = registry.completion_queue.get_nowait()
+            if evt.get("type") == "watch_match":
+                watch_matches += 1
+            elif evt.get("type") == "watch_overflow_tripped":
+                overflow_tripped += 1
+        assert watch_matches == WATCH_GLOBAL_MAX_PER_WINDOW
+        assert overflow_tripped == 1
+        assert registry._global_watch_tripped_until > 0
+
+    def test_cooldown_suppresses_and_then_releases(self, registry):
+        """After trip, further events are suppressed; cooldown expiry emits release."""
+        # Spawn enough fresh sessions to trip the global breaker.
+        sessions = [
+            _make_session(sid=f"proc_t{i}", watch_patterns=["E"])
+            for i in range(WATCH_GLOBAL_MAX_PER_WINDOW + 1)
+        ]
+        for s in sessions:
+            registry._check_watch_patterns(s, "E hit\n")
+        assert registry._global_watch_tripped_until > 0
+
+        # Further matches from BRAND-NEW sessions during cooldown are dropped.
+        q_size_before = registry.completion_queue.qsize()
+        extra1 = _make_session(sid="proc_extra1", watch_patterns=["E"])
+        extra2 = _make_session(sid="proc_extra2", watch_patterns=["E"])
+        registry._check_watch_patterns(extra1, "E hit\n")
+        registry._check_watch_patterns(extra2, "E hit\n")
+        assert registry.completion_queue.qsize() == q_size_before  # no new events
+        assert registry._global_watch_suppressed_during_trip >= 2
+
+        # Simulate cooldown expiry.
+        registry._global_watch_tripped_until = time.time() - 1
+
+        # Next call admits AND emits the release summary.
+        released_session = _make_session(sid="proc_after", watch_patterns=["E"])
+        registry._check_watch_patterns(released_session, "E hit\n")
+        released = False
+        admitted = False
+        while not registry.completion_queue.empty():
+            evt = registry.completion_queue.get_nowait()
+            if evt.get("type") == "watch_overflow_released":
+                released = True
+                assert evt["suppressed"] >= 2
+            elif evt.get("type") == "watch_match":
+                admitted = True
+        assert released
+        assert admitted
@@ -11,7 +11,7 @@ import os
 import re
 import sys
 from pathlib import Path
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Union

 from hermes_constants import display_hermes_home

@@ -238,6 +238,7 @@ def cronjob(
    base_url: Optional[str] = None,
    reason: Optional[str] = None,
    script: Optional[str] = None,
+    context_from: Optional[Union[str, List[str]]] = None,
    enabled_toolsets: Optional[List[str]] = None,
    workdir: Optional[str] = None,
    task_id: str = None,
@@ -265,6 +266,18 @@ def cronjob(
                if script_error:
                    return tool_error(script_error, success=False)

+            # Validate context_from references existing jobs
+            if context_from:
+                from cron.jobs import get_job as _get_job
+                refs = [context_from] if isinstance(context_from, str) else context_from
+                for ref_id in refs:
+                    if not _get_job(ref_id):
+                        return tool_error(
+                            f"context_from job '{ref_id}' not found. "
+                            "Use cronjob(action='list') to see available jobs.",
+                            success=False,
+                        )
+
            job = create_job(
                prompt=prompt or "",
                schedule=schedule,
@@ -277,6 +290,7 @@ def cronjob(
                provider=_normalize_optional_job_value(provider),
                base_url=_normalize_optional_job_value(base_url, strip_trailing_slash=True),
                script=_normalize_optional_job_value(script),
+                context_from=context_from,
                enabled_toolsets=enabled_toolsets or None,
                workdir=_normalize_optional_job_value(workdir),
            )
@@ -368,6 +382,24 @@ def cronjob(
                    if script_error:
                        return tool_error(script_error, success=False)
                updates["script"] = _normalize_optional_job_value(script) if script else None
+            if context_from is not None:
+                # Empty string / empty list clears the field; otherwise validate
+                # each referenced job exists before storing. Normalized to a list
+                # (or None) to match the shape stored by create_job().
+                if isinstance(context_from, str):
+                    refs = [context_from.strip()] if context_from.strip() else []
+                else:
+                    refs = [str(j).strip() for j in context_from if str(j).strip()]
+                if refs:
+                    from cron.jobs import get_job as _get_job
+                    for ref_id in refs:
+                        if not _get_job(ref_id):
+                            return tool_error(
+                                f"context_from job '{ref_id}' not found. "
+                                "Use cronjob(action='list') to see available jobs.",
+                                success=False,
+                            )
+                updates["context_from"] = refs or None
            if enabled_toolsets is not None:
                updates["enabled_toolsets"] = enabled_toolsets or None
            if workdir is not None:
@@ -473,6 +505,19 @@ Important safety rule: cron-run sessions should not recursively schedule more cr
                "type": "string",
                "description": f"Optional path to a Python script that runs before each cron job execution. Its stdout is injected into the prompt as context. Use for data collection and change detection. Relative paths resolve under {display_hermes_home()}/scripts/. On update, pass empty string to clear."
            },
+            "context_from": {
+                "type": "array",
+                "items": {"type": "string"},
+                "description": (
+                    "Optional job ID or list of job IDs whose most recent completed output is "
+                    "injected into the prompt as context before each run. "
+                    "Use this to chain cron jobs: job A collects data, job B processes it. "
+                    "Each entry must be a valid job ID (from cronjob action='list'). "
+                    "Note: injects the most recent completed output — does not wait for "
+                    "upstream jobs running in the same tick. "
+                    "On update, pass an empty array to clear."
+                ),
+            },
            "enabled_toolsets": {
                "type": "array",
                "items": {"type": "string"},
@@ -526,6 +571,7 @@ registry.register(
        base_url=args.get("base_url"),
        reason=args.get("reason"),
        script=args.get("script"),
+        context_from=args.get("context_from"),
        enabled_toolsets=args.get("enabled_toolsets"),
        workdir=args.get("workdir"),
        task_id=kw.get("task_id"),
@@ -33,6 +33,7 @@ from typing import Any, Dict, List, Optional

 from toolsets import TOOLSETS
 from tools import file_state
+from tools.terminal_tool import set_approval_callback as _set_subagent_approval_cb
 from utils import base_url_hostname, is_truthy_value


@@ -47,6 +48,64 @@ DELEGATE_BLOCKED_TOOLS = frozenset(
    ]
 )

+
+# ---------------------------------------------------------------------------
+# Subagent approval callbacks
+# ---------------------------------------------------------------------------
+# Subagents run inside a ThreadPoolExecutor worker. The CLI's interactive
+# approval callback is stored in tools/terminal_tool.py's threading.local(),
+# so worker threads do NOT inherit it. Without a callback,
+# prompt_dangerous_approval() falls back to input() from the worker thread,
+# which deadlocks against the parent's prompt_toolkit TUI that owns stdin.
+#
+# Fix: install a non-interactive callback into every subagent worker thread
+# via ThreadPoolExecutor(initializer=_set_subagent_approval_cb, initargs=(cb,)).
+# The callback is chosen by the `delegation.subagent_auto_approve` config:
+#   false (default) → _subagent_auto_deny (safe; matches leaf tool blocklist)
+#   true            → _subagent_auto_approve (opt-in YOLO for cron/batch)
+# Both emit a logger.warning for audit; gateway sessions are unaffected
+# because they resolve approvals via tools/approval.py's per-session queue,
+# not through these TLS callbacks.
+def _subagent_auto_deny(command: str, description: str, **kwargs) -> str:
+    """Auto-deny dangerous commands in subagent threads (safe default).
+
+    Returns 'deny' so the subagent sees a refusal it can recover from, and
+    never calls input() (which would deadlock the parent TUI).
+    """
+    logger.warning(
+        "Subagent auto-denied dangerous command: %s (%s). "
+        "Set delegation.subagent_auto_approve: true to allow.",
+        command, description,
+    )
+    return "deny"
+
+
+def _subagent_auto_approve(command: str, description: str, **kwargs) -> str:
+    """Auto-approve dangerous commands in subagent threads (opt-in YOLO).
+
+    Only installed when delegation.subagent_auto_approve=true. Returns 'once'
+    so the subagent proceeds without blocking the parent UI.
+    """
+    logger.warning(
+        "Subagent auto-approved dangerous command: %s (%s)",
+        command, description,
+    )
+    return "once"
+
+
+def _get_subagent_approval_callback():
+    """Return the callback to install into subagent worker threads.
+
+    Config key: delegation.subagent_auto_approve (bool, default False).
+    Reads via the same _load_config() path as the rest of delegate_task, so
+    precedence is config.yaml, then the default (this knob has no env override).
+    """
+    cfg = _load_config()
+    val = cfg.get("subagent_auto_approve", False)
+    if is_truthy_value(val):
+        return _subagent_auto_approve
+    return _subagent_auto_deny
+
 # Build a description fragment listing toolsets available for subagents.
 # Excludes toolsets where ALL tools are blocked, composite/platform toolsets
 # (hermes-* prefixed), and scenario toolsets.
@@ -1344,7 +1403,15 @@ def _run_single_child(
        # Run child with a hard timeout to prevent indefinite blocking
        # when the child's API call or tool-level HTTP request hangs.
        child_timeout = _get_child_timeout()
-        _timeout_executor = ThreadPoolExecutor(max_workers=1)
+        _timeout_executor = ThreadPoolExecutor(
+            max_workers=1,
+            # Install a non-interactive approval callback in the worker thread
+            # so dangerous-command prompts from the subagent don't fall back to
+            # input() and deadlock the parent's prompt_toolkit TUI.
+            # Callback (deny vs approve) is governed by delegation.subagent_auto_approve.
+            initializer=_set_subagent_approval_cb,
+            initargs=(_get_subagent_approval_callback(),),
+        )
        # Capture the worker thread so the timeout diagnostic can dump its
        # Python stack (see #14726 — 0-API-call hangs are opaque without it).
        _worker_thread_holder: Dict[str, Optional[threading.Thread]] = {"t": None}
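
Reviewer sketch (not part of the patch): the deadlock fix relies on `ThreadPoolExecutor(initializer=...)` running once in each worker thread, which is what lets a `threading.local()` value be seeded there. The names below (`_tls`, `set_approval_callback`, `request_approval`, `auto_deny`) are hypothetical stand-ins for the real helpers in tools/terminal_tool.py, and the `"no-callback"` return value only marks the spot where the real code would fall back to `input()`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the thread-local storage in tools/terminal_tool.py.
_tls = threading.local()


def set_approval_callback(cb):
    """Store the approval callback for the *current* thread only."""
    _tls.approval_cb = cb


def request_approval(command):
    """Ask the current thread's callback; report when none is installed."""
    cb = getattr(_tls, "approval_cb", None)
    if cb is None:
        # The real code would fall back to input() here, which is the deadlock.
        return "no-callback"
    return cb(command, "dangerous command")


def auto_deny(command, description, **kwargs):
    """Non-interactive callback: always refuse."""
    return "deny"


# The parent thread has an interactive-style callback of its own.
set_approval_callback(lambda command, description, **kw: "once")

# A plain executor: the worker thread does not inherit the parent's callback.
with ThreadPoolExecutor(max_workers=1) as bare:
    bare_result = bare.submit(request_approval, "rm -rf /tmp/x").result()

# With an initializer, every worker thread gets the callback installed first.
with ThreadPoolExecutor(
    max_workers=1,
    initializer=set_approval_callback,
    initargs=(auto_deny,),
) as pool:
    worker_result = pool.submit(request_approval, "rm -rf /tmp/x").result()

# The parent's own callback is untouched by what happened in the workers.
parent_result = request_approval("rm -rf /tmp/x")
```

The worker in the second pool sees the denying callback, while the parent thread keeps its own, so the two never contend for stdin.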
@@ -473,6 +473,12 @@ _ACTIONS = {
    "remove_role": _remove_role,
 }

+_CORE_ACTION_NAMES = frozenset({"fetch_messages", "search_members", "create_thread"})
+_ADMIN_ACTION_NAMES = frozenset(_ACTIONS.keys()) - _CORE_ACTION_NAMES
+
+_CORE_ACTIONS = {k: v for k, v in _ACTIONS.items() if k in _CORE_ACTION_NAMES}
+_ADMIN_ACTIONS = {k: v for k, v in _ACTIONS.items() if k in _ADMIN_ACTION_NAMES}
+
 # Single-source-of-truth manifest: action → (signature, one-line description).
 # Consumed by :func:`_build_schema` so the schema's top-level description
 # always matches the registered action set.
@@ -531,7 +537,7 @@ def _load_allowed_actions_config() -> Optional[List[str]]:
        from hermes_cli.config import load_config
        cfg = load_config()
    except Exception as exc:
-        logger.debug("discord_server: could not load config (%s); allowing all actions.", exc)
+        logger.debug("discord: could not load config (%s); allowing all actions.", exc)
        return None

    raw = (cfg.get("discord") or {}).get("server_actions")
@@ -586,12 +592,16 @@ def _available_actions(
 def _build_schema(
    actions: List[str],
    caps: Optional[Dict[str, Any]] = None,
-) -> Dict[str, Any]:
-    """Build the tool schema for the given filtered action list."""
+    tool_name: str = "discord",
+) -> Optional[Dict[str, Any]]:
+    """Build the tool schema for the given filtered action list.
+
+    Returns ``None`` when *actions* is empty — callers should drop the
+    tool from registration in that case.
+    """
    caps = caps or {}
    if not actions:
-        # Tool shouldn't be registered when empty, but guard anyway.
-        actions = list(_ACTIONS.keys())
+        return None

    # Action manifest lines (action-first, parameter-scoped).
    manifest_lines = [
@@ -602,24 +612,36 @@ def _build_schema(
    manifest_block = "\n".join(manifest_lines)

    content_note = ""
-    if caps.get("detected") and caps.get("has_message_content") is False:
+    affected_actions = {"fetch_messages", "list_pins"} & set(actions)
+    if affected_actions and caps.get("detected") and caps.get("has_message_content") is False:
+        names = " and ".join(sorted(affected_actions))
        content_note = (
-            "\n\nNOTE: Bot does NOT have the MESSAGE_CONTENT privileged intent. "
-            "fetch_messages and list_pins will return message metadata (author, "
+            "\n\nNOTE: Bot does NOT have the MESSAGE_CONTENT privileged intent. "
+            f"{names} will return message metadata (author, "
            "timestamps, attachments, reactions, pin state) but `content` will be "
            "empty for messages not sent as a direct mention to the bot or in DMs. "
            "Enable the intent in the Discord Developer Portal to see all content."
        )

-    description = (
-        "Query and manage a Discord server via the REST API.\n\n"
-        "Available actions:\n"
-        f"{manifest_block}\n\n"
-        "Call list_guilds first to discover guild_ids, then list_channels for "
-        "channel_ids. Runtime errors will tell you if the bot lacks a specific "
-        "per-guild permission (e.g. MANAGE_ROLES for add_role)."
-        f"{content_note}"
-    )
+    if tool_name == "discord_admin":
+        description = (
+            "Manage a Discord server via the REST API.\n\n"
+            "Available actions:\n"
+            f"{manifest_block}\n\n"
+            "Call list_guilds first to discover guild_ids, then list_channels for "
+            "channel_ids. Runtime errors will tell you if the bot lacks a specific "
+            "per-guild permission (e.g. MANAGE_ROLES for add_role)."
+            f"{content_note}"
+        )
+    else:
+        description = (
+            "Read and participate in a Discord server.\n\n"
+            "Available actions:\n"
+            f"{manifest_block}\n\n"
+            "Use the channel_id from the current conversation context. "
+            "Use search_members to look up user IDs by name prefix."
+            f"{content_note}"
+        )

    properties: Dict[str, Any] = {
        "action": {
@@ -676,7 +698,7 @@ def _build_schema(
    }

    return {
-        "name": "discord_server",
+        "name": tool_name,
        "description": description,
        "parameters": {
            "type": "object",
@@ -686,28 +708,33 @@ def _build_schema(
    }


-def get_dynamic_schema() -> Optional[Dict[str, Any]]:
-    """Return a schema filtered by current intents + config allowlist.
-
-    Called by ``model_tools.get_tool_definitions`` as a post-processing
-    step so the schema the model sees always reflects reality. Returns
-    ``None`` when no actions are available (tool should be removed from
-    the schema list entirely).
-    """
+def _get_dynamic_schema(
+    action_subset: Dict[str, Any],
+    tool_name: str,
+) -> Optional[Dict[str, Any]]:
+    """Build a dynamic schema for *action_subset* filtered by intents + config."""
    token = _get_bot_token()
    if not token:
        return None
-
    caps = _detect_capabilities(token)
    allowlist = _load_allowed_actions_config()
-    actions = _available_actions(caps, allowlist)
+    actions = [a for a in _available_actions(caps, allowlist) if a in action_subset]
    if not actions:
-        logger.warning(
-            "discord_server: config allowlist/intents left zero available actions; "
-            "hiding tool from this session."
-        )
        return None
-    return _build_schema(actions, caps)
+    return _build_schema(actions, caps, tool_name=tool_name)
+
+
+def get_dynamic_schema_core() -> Optional[Dict[str, Any]]:
+    return _get_dynamic_schema(_CORE_ACTIONS, "discord")
+
+
+def get_dynamic_schema_admin() -> Optional[Dict[str, Any]]:
+    return _get_dynamic_schema(_ADMIN_ACTIONS, "discord_admin")
+
+
+def get_dynamic_schema() -> Optional[Dict[str, Any]]:
+    """Backward-compat wrapper — returns core schema."""
+    return get_dynamic_schema_core()


 # ---------------------------------------------------------------------------
@@ -774,11 +801,13 @@ def check_discord_tool_requirements() -> bool:


 # ---------------------------------------------------------------------------
-# Main handler
+# Handlers
 # ---------------------------------------------------------------------------

-def discord_server(
+def _run_discord_action(
    action: str,
+    valid_actions: Dict[str, Any],
+    tool_label: str,
    guild_id: str = "",
    channel_id: str = "",
    user_id: str = "",
@@ -790,18 +819,17 @@ def discord_server(
    before: str = "",
    after: str = "",
    auto_archive_duration: int = 1440,
-    task_id: str = None,
 ) -> str:
-    """Execute a Discord server action."""
+    """Shared handler logic for both discord tools."""
    token = _get_bot_token()
    if not token:
        return json.dumps({"error": "DISCORD_BOT_TOKEN not configured."})

-    action_fn = _ACTIONS.get(action)
+    action_fn = valid_actions.get(action)
    if not action_fn:
        return json.dumps({
            "error": f"Unknown action: {action}",
-            "available_actions": list(_ACTIONS.keys()),
+            "available_actions": list(valid_actions.keys()),
        })

    # Config-level allowlist gate (defense in depth — schema already filtered,
@@ -848,44 +876,64 @@ def discord_server(
            auto_archive_duration=auto_archive_duration,
        )
    except DiscordAPIError as e:
-        logger.warning("Discord API error in action '%s': %s", action, e)
+        logger.warning("Discord API error in %s action '%s': %s", tool_label, action, e)
        if e.status == 403:
            return json.dumps({"error": _enrich_403(action, e.body)})
        return json.dumps({"error": str(e)})
    except Exception as e:
-        logger.exception("Unexpected error in discord_server action '%s'", action)
+        logger.exception("Unexpected error in %s action '%s'", tool_label, action)
        return json.dumps({"error": f"Unexpected error: {e}"})


+def discord_core(action: str, **kwargs) -> str:
+    """Execute a core Discord action (fetch_messages, search_members, create_thread)."""
+    return _run_discord_action(action, _CORE_ACTIONS, "discord", **kwargs)
+
+
+def discord_admin_handler(action: str, **kwargs) -> str:
+    """Execute a Discord admin action (server management)."""
+    return _run_discord_action(action, _ADMIN_ACTIONS, "discord_admin", **kwargs)
+
+
 # ---------------------------------------------------------------------------
 # Tool registration
 # ---------------------------------------------------------------------------

-# Register with the full unfiltered schema. ``model_tools.get_tool_definitions``
-# rebuilds this per-session via ``get_dynamic_schema`` so the model only ever
-# sees intent-available, config-allowed actions. The static registration is a
-# safe baseline for tools that inspect the registry directly.
-_STATIC_SCHEMA = _build_schema(list(_ACTIONS.keys()), caps={"detected": False})
+_HANDLER_DEFAULTS = {
+    "action": "", "guild_id": "", "channel_id": "", "user_id": "",
+    "role_id": "", "message_id": "", "query": "", "name": "",
+    "limit": 50, "before": "", "after": "", "auto_archive_duration": 1440,
+}
+
+
+def _make_handler(handler_fn):
+    """Create a registry-compatible handler lambda for a discord handler."""
+    return lambda args, **kw: handler_fn(
+        **{k: args.get(k, v) for k, v in _HANDLER_DEFAULTS.items()},
+    )
+
+
+_STATIC_CORE_SCHEMA = _build_schema(
+    list(_CORE_ACTIONS.keys()), caps={"detected": False}, tool_name="discord",
+)
+_STATIC_ADMIN_SCHEMA = _build_schema(
+    list(_ADMIN_ACTIONS.keys()), caps={"detected": False}, tool_name="discord_admin",
+)

 registry.register(
-    name="discord_server",
+    name="discord",
    toolset="discord",
-    schema=_STATIC_SCHEMA,
-    handler=lambda args, **kw: discord_server(
-        action=args.get("action", ""),
-        guild_id=args.get("guild_id", ""),
-        channel_id=args.get("channel_id", ""),
-        user_id=args.get("user_id", ""),
-        role_id=args.get("role_id", ""),
-        message_id=args.get("message_id", ""),
-        query=args.get("query", ""),
-        name=args.get("name", ""),
-        limit=args.get("limit", 50),
-        before=args.get("before", ""),
-        after=args.get("after", ""),
-        auto_archive_duration=args.get("auto_archive_duration", 1440),
-        task_id=kw.get("task_id"),
-    ),
+    schema=_STATIC_CORE_SCHEMA,
+    handler=_make_handler(discord_core),
+    check_fn=check_discord_tool_requirements,
+    requires_env=["DISCORD_BOT_TOKEN"],
+)
+
+registry.register(
+    name="discord_admin",
+    toolset="discord_admin",
+    schema=_STATIC_ADMIN_SCHEMA,
+    handler=_make_handler(discord_admin_handler),
    check_fn=check_discord_tool_requirements,
    requires_env=["DISCORD_BOT_TOKEN"],
 )
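
Reviewer sketch (not part of the patch): the core/admin split boils down to partitioning one action table and letting the schema builder return `None` for an empty subset, so the caller can skip registration. The action table, `build_schema`, and `dynamic_schema` below are simplified, hypothetical versions of the real helpers, and the `allowed` list stands in for whatever the intent detection and config allowlist would produce.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical action table; the real handlers call the Discord REST API.
ACTIONS: Dict[str, Callable[..., str]] = {
    "fetch_messages": lambda **kw: "messages",
    "search_members": lambda **kw: "members",
    "create_thread": lambda **kw: "thread",
    "list_guilds": lambda **kw: "guilds",
    "add_role": lambda **kw: "role added",
}

CORE_NAMES = frozenset({"fetch_messages", "search_members", "create_thread"})
CORE = {k: v for k, v in ACTIONS.items() if k in CORE_NAMES}
ADMIN = {k: v for k, v in ACTIONS.items() if k not in CORE_NAMES}


def build_schema(actions: List[str], tool_name: str) -> Optional[Dict[str, Any]]:
    """Return a minimal tool schema, or None when no action survives filtering."""
    if not actions:
        return None
    return {
        "name": tool_name,
        "parameters": {
            "type": "object",
            "properties": {"action": {"type": "string", "enum": actions}},
            "required": ["action"],
        },
    }


def dynamic_schema(
    subset: Dict[str, Any], tool_name: str, allowed: List[str]
) -> Optional[Dict[str, Any]]:
    """Intersect the allowed actions with one tool's subset, then build."""
    actions = [a for a in allowed if a in subset]
    return build_schema(actions, tool_name)


# Only fetch_messages and add_role are allowed in this hypothetical session.
core_schema = dynamic_schema(CORE, "discord", ["fetch_messages", "add_role"])
admin_schema = dynamic_schema(ADMIN, "discord_admin", ["fetch_messages"])
```

Here `core_schema` exposes only `fetch_messages`, and `admin_schema` is `None` because nothing in the admin subset was allowed.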
@@ -58,10 +58,20 @@ MAX_OUTPUT_CHARS = 200_000      # 200KB rolling output buffer
 FINISHED_TTL_SECONDS = 1800     # Keep finished processes for 30 minutes
 MAX_PROCESSES = 64              # Max concurrent tracked processes (LRU pruning)

-# Watch pattern rate limiting
-WATCH_MAX_PER_WINDOW = 8        # Max notifications delivered per window
-WATCH_WINDOW_SECONDS = 10       # Rolling window length
-WATCH_OVERLOAD_KILL_SECONDS = 45  # Sustained overload duration before disabling watch
+# Watch pattern rate limiting — PER SESSION.
+# Hard rule: at most ONE watch-match notification every WATCH_MIN_INTERVAL_SECONDS.
+# Any match arriving inside that cooldown window is dropped and counted as a strike.
+# After WATCH_STRIKE_LIMIT consecutive strike windows, watch_patterns for that
+# session is permanently disabled and the session falls back to notify_on_complete
+# semantics (one notification when the process actually exits).
+WATCH_MIN_INTERVAL_SECONDS = 15   # Minimum spacing between consecutive watch matches
+WATCH_STRIKE_LIMIT = 3            # Strikes in a row → disable watch + promote to notify_on_complete
+
+# Global circuit breaker — across all sessions. Secondary safety net so concurrent
+# siblings can't collectively flood the user even when each is under its own cap.
+WATCH_GLOBAL_MAX_PER_WINDOW = 15
+WATCH_GLOBAL_WINDOW_SECONDS = 10
+WATCH_GLOBAL_COOLDOWN_SECONDS = 30


 def format_uptime_short(seconds: int) -> str:
@@ -105,10 +115,18 @@ class ProcessSession:
    watch_patterns: List[str] = field(default_factory=list)
    _watch_hits: int = field(default=0, repr=False)          # total matches delivered
    _watch_suppressed: int = field(default=0, repr=False)    # matches dropped by rate limit
-    _watch_overload_since: float = field(default=0.0, repr=False)  # when sustained overload began
-    _watch_disabled: bool = field(default=False, repr=False) # permanently killed by overload
-    _watch_window_hits: int = field(default=0, repr=False)   # hits in current rate window
-    _watch_window_start: float = field(default=0.0, repr=False)
+    _watch_disabled: bool = field(default=False, repr=False) # permanently killed after strike limit
+    # Per-session rate limit state: at most one match every WATCH_MIN_INTERVAL_SECONDS.
+    # When an emission happens, _watch_cooldown_until is set to now + interval and
+    # _watch_strike_candidate is reset to False. The first match to arrive before
+    # that deadline sets the flag and counts as one strike (regardless of how many
+    # matches are dropped in that window: a strike is a window, not a match). After
+    # WATCH_STRIKE_LIMIT strikes in a row, watch_patterns is disabled and the
+    # session promotes to notify_on_complete.
+    _watch_last_emit_at: float = field(default=0.0, repr=False)
+    _watch_cooldown_until: float = field(default=0.0, repr=False)
+    _watch_strike_candidate: bool = field(default=False, repr=False)
+    _watch_consecutive_strikes: int = field(default=0, repr=False)
    _lock: threading.Lock = field(default_factory=threading.Lock)
    _reader_thread: Optional[threading.Thread] = field(default=None, repr=False)
    _pty: Any = field(default=None, repr=False)  # ptyprocess handle (when use_pty=True)
@@ -151,6 +169,15 @@ class ProcessRegistry:
        # via wait/poll/log.  Drain loops skip notifications for these.
        self._completion_consumed: set = set()

+        # Global watch-match circuit breaker — across all sessions.
+        # Prevents sibling processes from collectively flooding the user even
+        # when each stays under its own per-session cap.
+        self._global_watch_lock = threading.Lock()
+        self._global_watch_window_start: float = 0.0
+        self._global_watch_window_hits: int = 0
+        self._global_watch_tripped_until: float = 0.0
+        self._global_watch_suppressed_during_trip: int = 0
+
    @staticmethod
    def _clean_shell_noise(text: str) -> str:
        """Strip shell startup warnings from the beginning of output."""
@@ -163,12 +190,23 @@ class ProcessRegistry:
        """Scan new output for watch patterns and queue notifications.

        Called from reader threads with new_text being the freshly-read chunk.
-        Rate-limited: max WATCH_MAX_PER_WINDOW notifications per WATCH_WINDOW_SECONDS.
-        If sustained overload exceeds WATCH_OVERLOAD_KILL_SECONDS, watching is
-        disabled permanently for this process.
+
+        Per-session rate limit: at most ONE watch-match notification per
+        WATCH_MIN_INTERVAL_SECONDS. Any match arriving inside the cooldown
+        window is dropped and counts as ONE strike for that window. After
+        WATCH_STRIKE_LIMIT consecutive strike windows, watch_patterns is
+        disabled for this session and the session is promoted to
+        notify_on_complete semantics — one notification when the process
+        actually exits, no more mid-process spam.
        """
        if not session.watch_patterns or session._watch_disabled:
            return
+        # Suppress-after-exit: once the reader loop has declared the process
+        # exited, any late chunk we still see is post-exit noise. Dropping these
+        # prevents the "stale notifications delivered minutes after the process
+        # ended" spam when completion_queue consumers run async.
+        if session.exited:
+            return

        # Scan new text line-by-line for pattern matches
        matched_lines = []
@@ -185,55 +223,80 @@ class ProcessRegistry:
            return

        now = time.time()
+        should_disable = False
        with session._lock:
-            # Reset window if it's expired
-            if now - session._watch_window_start >= WATCH_WINDOW_SECONDS:
-                session._watch_window_hits = 0
-                session._watch_window_start = now
-
-            # Check rate limit
-            if session._watch_window_hits >= WATCH_MAX_PER_WINDOW:
+            # Case 1: still inside the cooldown from the last emission.
+            # Count this as a strike for the current window (only once per window)
+            # and drop the event. If we've hit the strike limit, disable watch
+            # and promote to notify_on_complete.
+            if session._watch_cooldown_until and now < session._watch_cooldown_until:
                session._watch_suppressed += len(matched_lines)
+                if not session._watch_strike_candidate:
+                    # First drop in this window — count one strike.
+                    session._watch_strike_candidate = True
+                    session._watch_consecutive_strikes += 1
+                    if session._watch_consecutive_strikes >= WATCH_STRIKE_LIMIT:
+                        session._watch_disabled = True
+                        # Promote to notify_on_complete so the agent still gets
+                        # exactly one notification when the process actually ends.
+                        session.notify_on_complete = True
+                        should_disable = True
+                return_early = True
+            else:
+                # Case 2: cooldown has expired.
+                # Decide whether this window was a "clean" one (no drops) or a
+                # strike window. If no strike candidate was set during the prior
+                # cooldown, reset the consecutive-strike counter — we're back to
+                # healthy emission cadence.
+                if (
+                    session._watch_cooldown_until
+                    and not session._watch_strike_candidate
+                ):
+                    session._watch_consecutive_strikes = 0
+                session._watch_strike_candidate = False

-                # Track sustained overload for kill switch
-                if session._watch_overload_since == 0.0:
-                    session._watch_overload_since = now
-                elif now - session._watch_overload_since > WATCH_OVERLOAD_KILL_SECONDS:
-                    session._watch_disabled = True
-                    self.completion_queue.put({
-                        "session_id": session.id,
-                        "session_key": session.session_key,
-                        "command": session.command,
-                        "type": "watch_disabled",
-                        "suppressed": session._watch_suppressed,
-                        "platform": session.watcher_platform,
-                        "chat_id": session.watcher_chat_id,
-                        "user_id": session.watcher_user_id,
-                        "user_name": session.watcher_user_name,
-                        "thread_id": session.watcher_thread_id,
-                        "message": (
-                            f"Watch patterns disabled for process {session.id} — "
-                            f"too many matches ({session._watch_suppressed} suppressed). "
-                            f"Use process(action='poll') to check output manually."
-                        ),
-                    })
-                return
+                # Emit the notification and start a new cooldown window.
+                session._watch_last_emit_at = now
+                session._watch_cooldown_until = now + WATCH_MIN_INTERVAL_SECONDS
+                session._watch_hits += 1
+                suppressed = session._watch_suppressed
+                session._watch_suppressed = 0
+                return_early = False

-            # Under the rate limit — deliver notification
-            session._watch_window_hits += 1
-            session._watch_hits += 1
-            # Clear overload tracker since we got a delivery through
-            session._watch_overload_since = 0.0
-
-            # Include suppressed count if any events were dropped
-            suppressed = session._watch_suppressed
-            session._watch_suppressed = 0
+        if return_early:
+            if should_disable:
+                # Emit exactly one "watch disabled, falling back to notify_on_complete"
+                # summary event so the agent/user sees why things went quiet.
+                self.completion_queue.put({
+                    "session_id": session.id,
+                    "session_key": session.session_key,
+                    "command": session.command,
+                    "type": "watch_disabled",
+                    "suppressed": session._watch_suppressed,
+                    "platform": session.watcher_platform,
+                    "chat_id": session.watcher_chat_id,
+                    "user_id": session.watcher_user_id,
+                    "user_name": session.watcher_user_name,
+                    "thread_id": session.watcher_thread_id,
+                    "message": (
+                        f"Watch patterns disabled for process {session.id} — "
+                        f"{WATCH_STRIKE_LIMIT} consecutive rate-limit windows triggered "
+                        f"(min spacing {WATCH_MIN_INTERVAL_SECONDS}s). "
+                        f"Falling back to notify_on_complete semantics; you'll get "
+                        f"exactly one notification when the process exits."
+                    ),
+                })
+            return

        # Trim matched output to a reasonable size
        output = "\n".join(matched_lines[:20])
        if len(output) > 2000:
            output = output[:2000] + "\n...(truncated)"

+        # Global circuit breaker — across all sessions (secondary safety net).
+        if not self._global_watch_admit(now):
+            return
+
        self.completion_queue.put({
            "session_id": session.id,
            "session_key": session.session_key,
@@ -249,6 +312,93 @@ class ProcessRegistry:
            "thread_id": session.watcher_thread_id,
        })
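
Reviewer sketch (not part of the patch): the per-session cooldown and strike logic can be exercised in isolation as a small state machine. `WatchState` and `on_match` are hypothetical names, locking and queueing are left out, and returning `"drop"` for an already-disabled session is a simplification (the real method returns before scanning the chunk).

```python
from dataclasses import dataclass

MIN_INTERVAL = 15.0   # mirrors WATCH_MIN_INTERVAL_SECONDS
STRIKE_LIMIT = 3      # mirrors WATCH_STRIKE_LIMIT


@dataclass
class WatchState:
    cooldown_until: float = 0.0
    strike_candidate: bool = False
    consecutive_strikes: int = 0
    disabled: bool = False


def on_match(state: WatchState, now: float) -> str:
    """Return 'emit', 'drop', or 'disable' for a match seen at time `now`."""
    if state.disabled:
        return "drop"
    if state.cooldown_until and now < state.cooldown_until:
        # Inside the cooldown: only the first drop per window is a strike.
        if not state.strike_candidate:
            state.strike_candidate = True
            state.consecutive_strikes += 1
            if state.consecutive_strikes >= STRIKE_LIMIT:
                state.disabled = True
                return "disable"
        return "drop"
    # Cooldown expired: a window with no drops resets the strike streak.
    if state.cooldown_until and not state.strike_candidate:
        state.consecutive_strikes = 0
    state.strike_candidate = False
    state.cooldown_until = now + MIN_INTERVAL
    return "emit"


# A noisy process: a match lands inside every cooldown window.
noisy_state = WatchState()
noisy = [on_match(noisy_state, t) for t in (0, 1, 2, 16, 17, 32, 33)]

# A quiet process: matches are always spaced wider than the interval.
quiet_state = WatchState()
quiet = [on_match(quiet_state, t) for t in (0, 20, 40)]
```

The noisy sequence emits three times, picks up one strike per window, and is disabled on the third strike; the quiet sequence never accumulates a strike.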

+    def _global_watch_admit(self, now: float) -> bool:
+        """Return True if this watch_match event is allowed through the global breaker.
+
+        Semantics:
+        - If we're currently in a cooldown period, drop the event and count it.
+        - Otherwise, start a fresh counting window if the current one has
+          expired, then check the global cap.
+        - If the cap is exceeded, trip the breaker for WATCH_GLOBAL_COOLDOWN_SECONDS
+          and emit ONE summary event so the agent/user sees "N notifications were
+          suppressed" instead of getting them individually.
+        - On the first event after the cooldown ends, emit a release summary
+          and reset counters.
+        """
+        with self._global_watch_lock:
+            # Handle cooldown expiry first so we can emit the release summary.
+            if self._global_watch_tripped_until and now >= self._global_watch_tripped_until:
+                suppressed = self._global_watch_suppressed_during_trip
+                self._global_watch_tripped_until = 0.0
+                self._global_watch_suppressed_during_trip = 0
+                self._global_watch_window_start = now
+                self._global_watch_window_hits = 0
+                if suppressed > 0:
+                    # Queue a summary event outside the lock (below).
+                    release_msg = {
+                        "session_id": "",
+                        "session_key": "",
+                        "command": "",
+                        "type": "watch_overflow_released",
+                        "suppressed": suppressed,
+                        "message": (
+                            f"Watch-pattern notifications resumed. "
+                            f"{suppressed} match event(s) were suppressed during the flood."
+                        ),
+                        "platform": "",
+                        "chat_id": "",
+                        "user_id": "",
+                        "user_name": "",
+                        "thread_id": "",
+                    }
+                else:
+                    release_msg = None
+            else:
+                release_msg = None
+
+            # Still in cooldown — drop and count.
+            if self._global_watch_tripped_until and now < self._global_watch_tripped_until:
+                self._global_watch_suppressed_during_trip += 1
+                admit = False
+                trip_now = None
+            else:
+                # Start a new fixed window once the current one has expired.
+                if now - self._global_watch_window_start >= WATCH_GLOBAL_WINDOW_SECONDS:
+                    self._global_watch_window_start = now
+                    self._global_watch_window_hits = 0
+
+                if self._global_watch_window_hits >= WATCH_GLOBAL_MAX_PER_WINDOW:
+                    # Trip the breaker.
+                    self._global_watch_tripped_until = now + WATCH_GLOBAL_COOLDOWN_SECONDS
+                    self._global_watch_suppressed_during_trip += 1
+                    trip_now = now
+                    admit = False
+                else:
+                    self._global_watch_window_hits += 1
+                    trip_now = None
+                    admit = True
+
+        # Queue summary events outside the lock.
+        if release_msg is not None:
+            self.completion_queue.put(release_msg)
+        if trip_now is not None:
+            self.completion_queue.put({
+                "session_id": "",
+                "session_key": "",
+                "command": "",
+                "type": "watch_overflow_tripped",
+                "message": (
+                    f"Watch-pattern overflow: >{WATCH_GLOBAL_MAX_PER_WINDOW} "
+                    f"notifications in {WATCH_GLOBAL_WINDOW_SECONDS}s across all processes. "
+                    f"Suppressing further watch_match events for "
+                    f"{WATCH_GLOBAL_COOLDOWN_SECONDS}s."
+                ),
+                "platform": "",
+                "chat_id": "",
+                "user_id": "",
+                "user_name": "",
+                "thread_id": "",
+            })
+        return admit
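
Reviewer sketch (not part of the patch): the global breaker above is a fixed-window counter with a cooldown, and it can be modelled without threads or a queue. `GlobalBreaker` is a hypothetical class; it returns the summary events alongside the admit decision instead of putting them on `completion_queue`, and the timestamps in the driver are made up.

```python
MAX_PER_WINDOW = 15   # mirrors WATCH_GLOBAL_MAX_PER_WINDOW
WINDOW = 10.0         # mirrors WATCH_GLOBAL_WINDOW_SECONDS
COOLDOWN = 30.0       # mirrors WATCH_GLOBAL_COOLDOWN_SECONDS


class GlobalBreaker:
    def __init__(self):
        self.window_start = 0.0
        self.window_hits = 0
        self.tripped_until = 0.0
        self.suppressed = 0

    def admit(self, now):
        """Return (admitted, events) for one watch match at time `now`."""
        events = []
        # Cooldown expiry is noticed lazily, on the next event to arrive.
        if self.tripped_until and now >= self.tripped_until:
            if self.suppressed:
                events.append(("released", self.suppressed))
            self.tripped_until = 0.0
            self.suppressed = 0
            self.window_start = now
            self.window_hits = 0
        # Still tripped: drop and count.
        if self.tripped_until and now < self.tripped_until:
            self.suppressed += 1
            return False, events
        # Start a new fixed window once the current one has expired.
        if now - self.window_start >= WINDOW:
            self.window_start = now
            self.window_hits = 0
        if self.window_hits >= MAX_PER_WINDOW:
            self.tripped_until = now + COOLDOWN
            self.suppressed += 1
            events.append(("tripped", None))
            return False, events
        self.window_hits += 1
        return True, events


breaker = GlobalBreaker()
# Twenty matches within two seconds, starting at a made-up time of 100s.
burst = [breaker.admit(100.0 + i * 0.1) for i in range(20)]
admitted = sum(1 for ok, _ in burst if ok)
trip_count = sum(1 for _, evs in burst for e in evs if e[0] == "tripped")
# One more match well after the 30s cooldown has passed.
late = breaker.admit(140.0)
```

In this run the first 15 matches pass, the 16th trips the breaker once, the remaining four are counted silently, and the late match is admitted together with a release summary for the five suppressed events.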
+
    @staticmethod
    def _is_host_pid_alive(pid: Optional[int]) -> bool:
        """Best-effort liveness check for host-visible PIDs."""
@@ -1388,6 +1388,33 @@ def _foreground_background_guidance(command: str) -> str | None:
    return None


+def _resolve_notification_flag_conflict(
+    *,
+    notify_on_complete: bool,
+    watch_patterns,
+    background: bool,
+) -> tuple:
+    """Decide what to do when both notify_on_complete and watch_patterns are set.
+
+    These flags produce duplicate, delayed notifications when combined — one
+    notification per watch-pattern match AND one on process exit, with async
+    delivery that can spam the user long after the process ends. When both are
+    set, we drop watch_patterns in favor of notify_on_complete (the more useful
+    "let me know when it's done" signal) and return a human-readable note.
+
+    Returns:
+        (watch_patterns_to_use, conflict_note). conflict_note is "" when there
+        is no conflict.
+    """
+    if background and notify_on_complete and watch_patterns:
+        note = (
+            "watch_patterns ignored because notify_on_complete=True; "
+            "these two flags produce duplicate notifications when combined"
+        )
+        return None, note
+    return watch_patterns, ""
+
+
 def terminal_tool(
    command: str,
    background: bool = False,
@@ -1410,8 +1437,8 @@ def terminal_tool(
        force: If True, skip dangerous command check (use after user confirms)
        workdir: Working directory for this command (optional, uses session cwd if not set)
        pty: If True, use pseudo-terminal for interactive CLI tools (local backend only)
-        notify_on_complete: If True and background=True, auto-notify the agent when the process exits
-        watch_patterns: List of strings to watch for in background output; fires a notification on first match per pattern. Use ONLY for mid-process signals (errors, readiness markers) that appear before exit. For end-of-run markers use notify_on_complete instead — stacking both produces duplicate, delayed notifications.
+        notify_on_complete: If True and background=True, you'll be notified exactly once when the process exits. The right choice for almost every long task. MUTUALLY EXCLUSIVE with watch_patterns.
+        watch_patterns: List of strings to watch for in background output. HARD rate limit: 1 notification per 15s per process. After 3 strike windows in a row, watch_patterns is disabled and the session is auto-promoted to notify_on_complete. Use ONLY for rare, one-shot mid-process signals on long-lived processes (server readiness, migration-done markers). NEVER use in loops/batch jobs — error patterns there will hit the strike limit and get disabled. MUTUALLY EXCLUSIVE with notify_on_complete — set one, not both.

    Returns:
        str: JSON string with output, exit_code, and error fields
@@ -1701,6 +1728,22 @@ def terminal_tool(
                        proc_session.watcher_user_name = _gw_user_name
                        proc_session.watcher_thread_id = _gw_thread_id

+                # Mutual exclusion: if both notify_on_complete and watch_patterns
+                # are set, drop watch_patterns. The combination produces duplicate
+                # notifications (one per match + one on exit) that deliver
+                # asynchronously and can spam the user long after the process ends.
+                # notify_on_complete is the more useful signal for "let me know
+                # when the task finishes"; watch_patterns should be reserved for
+                # standalone mid-process signals on long-lived processes.
+                watch_patterns, conflict_note = _resolve_notification_flag_conflict(
+                    notify_on_complete=bool(notify_on_complete),
+                    watch_patterns=watch_patterns,
+                    background=bool(background),
+                )
+                if conflict_note:
+                    logger.warning("background proc %s: %s", proc_session.id, conflict_note)
+                    result_data["watch_patterns_ignored"] = conflict_note
+
                # Mark for agent notification on completion
                if notify_on_complete and background:
                    proc_session.notify_on_complete = True
@@ -2039,13 +2082,13 @@ TERMINAL_SCHEMA = {
            },
            "notify_on_complete": {
                "type": "boolean",
-                "description": "When true (and background=true), you'll be automatically notified when the process finishes — no polling needed. Use this for tasks that take a while (tests, builds, deployments) so you can keep working on other things in the meantime.",
+                "description": "When true (and background=true), you'll be automatically notified exactly once when the process finishes. **This is the right choice for almost every long-running task** — tests, builds, deployments, multi-item batch jobs, anything that takes over a minute and has a defined end. Use this and keep working on other things; the system notifies you on exit. MUTUALLY EXCLUSIVE with watch_patterns — when both are set, watch_patterns is dropped.",
                "default": False
            },
            "watch_patterns": {
                "type": "array",
                "items": {"type": "string"},
-                "description": "Strings to watch for in background process output. Fires a notification the first time each pattern matches a line of output. **Use ONLY for mid-process signals** you want to react to before the process exits — errors, readiness markers, intermediate step markers (e.g. [\"ERROR\", \"Traceback\", \"listening on port\"]). Do NOT use for end-of-run markers (summary headers, 'DONE', 'PASS' printed right before exit) — use `notify_on_complete` for that instead. Stacking end-of-run patterns on top of `notify_on_complete` produces duplicate, delayed notifications that arrive after you've already moved on, since delivery is asynchronous and continues after the process exits."
+                "description": "Strings to watch for in background process output. HARD RATE LIMIT: at most 1 notification per 15 seconds per process — matches arriving inside the cooldown are dropped. After 3 consecutive 15-second windows with dropped matches, watch_patterns is automatically disabled for that process and promoted to notify_on_complete behavior (one notification on exit, no more mid-process spam). USE ONLY for truly rare, one-shot mid-process signals on LONG-LIVED processes that will never exit on their own — e.g. ['Application startup complete'] on a server so you know when to hit its endpoint, or ['migration done'] on a daemon. DO NOT use for: (1) end-of-run markers like 'DONE'/'PASS' — use notify_on_complete instead; (2) error patterns like 'ERROR'/'Traceback' in loops or multi-item batch jobs — they fire on every iteration and you'll hit the strike limit fast; (3) anything you'd ever combine with notify_on_complete. When in doubt, choose notify_on_complete. MUTUALLY EXCLUSIVE with notify_on_complete — set one, not both."
            }
        },
        "required": ["command"]
@@ -202,6 +202,18 @@ TOOLSETS = {
        "includes": []
    },

+    "discord": {
+        "description": "Discord read and participate tools (fetch messages, search members, create threads)",
+        "tools": ["discord"],
+        "includes": [],
+    },
+
+    "discord_admin": {
+        "description": "Discord server management (list channels/roles, pin messages, assign roles)",
+        "tools": ["discord_admin"],
+        "includes": [],
+    },
+
    "feishu_doc": {
        "description": "Read Feishu/Lark document content",
        "tools": ["feishu_doc_read"],
@@ -326,8 +338,8 @@ TOOLSETS = {
    "hermes-discord": {
        "description": "Discord bot toolset - full access (terminal has safety checks via dangerous command approval)",
        "tools": _HERMES_CORE_TOOLS + [
-            # Discord server introspection & management (gated on DISCORD_BOT_TOKEN via check_fn)
-            "discord_server",
+            "discord",
+            "discord_admin",
        ],
        "includes": []
    },
@@ -388,7 +400,13 @@ TOOLSETS = {

    "hermes-feishu": {
        "description": "Feishu/Lark bot toolset - enterprise messaging via Feishu/Lark (full access)",
-        "tools": _HERMES_CORE_TOOLS,
+        "tools": _HERMES_CORE_TOOLS + [
+            "feishu_doc_read",
+            "feishu_drive_list_comments",
+            "feishu_drive_list_comment_replies",
+            "feishu_drive_reply_comment",
+            "feishu_drive_add_comment",
+        ],
        "includes": []
    },

@@ -15,6 +15,7 @@ import { Badge } from "@/components/ui/badge";
 import { Button } from "@/components/ui/button";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { useI18n } from "@/i18n";
+import { PluginSlot } from "@/plugins";

 const PERIODS = [
  { label: "7d", days: 7 },
@@ -350,6 +351,7 @@ export default function AnalyticsPage() {

  return (
    <div className="flex flex-col gap-6">
+      <PluginSlot name="analytics:top" />
      {loading && !data && (
        <div className="flex items-center justify-center py-24">
          <div className="h-6 w-6 animate-spin rounded-full border-2 border-primary border-t-transparent" />
@@ -409,6 +411,7 @@ export default function AnalyticsPage() {
          </CardContent>
        </Card>
      )}
+      <PluginSlot name="analytics:bottom" />
    </div>
  );
 }
@@ -32,6 +32,7 @@ import { useSearchParams } from "react-router-dom";
 import { ChatSidebar } from "@/components/ChatSidebar";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { useI18n } from "@/i18n";
+import { PluginSlot } from "@/plugins";

 function buildWsUrl(
  token: string,
@@ -670,6 +671,7 @@ export default function ChatPage() {

  return (
    <div className="flex min-h-0 flex-1 flex-col gap-2 normal-case">
+      <PluginSlot name="chat:top" />
      {mobileModelToolsPortal}

      {banner && (
@@ -732,6 +734,7 @@ export default function ChatPage() {
          </div>
        )}
      </div>
+      <PluginSlot name="chat:bottom" />
    </div>
  );
 }
@@ -39,6 +39,7 @@ import { Input } from "@/components/ui/input";
 import { Badge } from "@/components/ui/badge";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
+import { PluginSlot } from "@/plugins";

 /* ------------------------------------------------------------------ */
 /*  Helpers                                                            */
@@ -313,6 +314,7 @@ export default function ConfigPage() {

  return (
    <div className="flex flex-col gap-4">
+      <PluginSlot name="config:top" />
      <Toast toast={toast} />

      {/* ═══════════════ Header Bar ═══════════════ */}
@@ -505,6 +507,7 @@ export default function ConfigPage() {
          </div>
        </div>
      )}
+      <PluginSlot name="config:bottom" />
    </div>
  );
 }
@@ -14,6 +14,7 @@ import { Input } from "@/components/ui/input";
 import { Label } from "@/components/ui/label";
 import { Select, SelectOption } from "@/components/ui/select";
 import { useI18n } from "@/i18n";
+import { PluginSlot } from "@/plugins";

 function formatTime(iso?: string | null): string {
  if (!iso) return "—";
@@ -149,6 +150,7 @@ export default function CronPage() {

  return (
    <div className="flex flex-col gap-6">
+      <PluginSlot name="cron:top" />
      <Toast toast={toast} />

      <DeleteConfirmDialog
@@ -346,6 +348,7 @@ export default function CronPage() {
          </Card>
        ))}
      </div>
+      <PluginSlot name="cron:bottom" />
    </div>
  );
 }
@@ -4,6 +4,7 @@ import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { buttonVariants } from "@/components/ui/button";
 import { cn } from "@/lib/utils";
+import { PluginSlot } from "@/plugins";

 export const HERMES_DOCS_URL = "https://hermes-agent.nousresearch.com/docs/";

@@ -38,6 +39,7 @@ export default function DocsPage() {
        "pt-1 sm:pt-2",
      )}
    >
+      <PluginSlot name="docs:top" />
      <iframe
        title={t.app.nav.documentation}
        src={HERMES_DOCS_URL}
@@ -49,6 +51,7 @@ export default function DocsPage() {
        sandbox="allow-scripts allow-same-origin allow-popups allow-forms"
        referrerPolicy="no-referrer-when-downgrade"
      />
+      <PluginSlot name="docs:bottom" />
    </div>
  );
 }
@@ -27,6 +27,7 @@ import { Button } from "@/components/ui/button";
 import { Input } from "@/components/ui/input";
 import { Label } from "@/components/ui/label";
 import { useI18n } from "@/i18n";
+import { PluginSlot } from "@/plugins";

 /* ------------------------------------------------------------------ */
 /*  Provider grouping                                                  */
@@ -511,6 +512,7 @@ export default function EnvPage() {

  return (
    <div className="flex flex-col gap-6">
+      <PluginSlot name="env:top" />
      <Toast toast={toast} />

      <DeleteConfirmDialog
@@ -610,6 +612,7 @@ export default function EnvPage() {
          </Card>
        );
      })}
+      <PluginSlot name="env:bottom" />
    </div>
  );
 }
@@ -9,6 +9,7 @@ import { Label } from "@/components/ui/label";
 import { FilterGroup, Segmented } from "@/components/ui/segmented";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
+import { PluginSlot } from "@/plugins";

 const FILES = ["agent", "errors", "gateway"] as const;
 const LEVELS = ["ALL", "DEBUG", "INFO", "WARNING", "ERROR"] as const;
@@ -141,6 +142,7 @@ export default function LogsPage() {

  return (
    <div className="flex flex-col gap-4">
+      <PluginSlot name="logs:top" />
      {/* ═══════════════ Filter toolbar ═══════════════ */}
      <div
        role="toolbar"
@@ -215,6 +217,7 @@ export default function LogsPage() {
          </div>
        </CardContent>
      </Card>
+      <PluginSlot name="logs:bottom" />
    </div>
  );
 }
@@ -46,6 +46,7 @@ import { useSystemActions } from "@/contexts/useSystemActions";
 import { useToast } from "@/hooks/useToast";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
+import { PluginSlot } from "@/plugins";
 import { isDashboardEmbeddedChatEnabled } from "@/lib/dashboard-flags";

 const SOURCE_CONFIG: Record<string, { icon: typeof Terminal; color: string }> =
@@ -612,6 +613,7 @@ export default function SessionsPage() {

  return (
    <div className="flex flex-col gap-4">
+      <PluginSlot name="sessions:top" />
      <Toast toast={toast} />

      <DeleteConfirmDialog
@@ -834,6 +836,7 @@ export default function SessionsPage() {
          )}
        </>
      )}
+      <PluginSlot name="sessions:bottom" />
    </div>
  );
 }
@@ -25,6 +25,7 @@ import { Input } from "@/components/ui/input";
 import { Switch } from "@/components/ui/switch";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
+import { PluginSlot } from "@/plugins";

 /* ------------------------------------------------------------------ */
 /*  Types & helpers                                                    */
@@ -251,6 +252,7 @@ export default function SkillsPage() {

  return (
    <div className="flex flex-col gap-4">
+      <PluginSlot name="skills:top" />
      <Toast toast={toast} />

      {/* ═══════════════ Filter panel + Content ═══════════════ */}
@@ -509,6 +511,7 @@ export default function SkillsPage() {
          )}
        </div>
      </div>
+      <PluginSlot name="skills:bottom" />
    </div>
  );
 }
@@ -18,6 +18,7 @@ import React, { Fragment, useEffect, useState } from "react";
 /** Slot locations the built-in shell renders. Plugins declaring any of
 *  these in their manifest's `slots` field get wired in automatically.
 *
+ *  Shell-wide slots:
 *  - `backdrop`         — rendered inside `<Backdrop />`, above the noise layer
 *  - `header-left`      — injected before the Hermes brand in the top bar
 *  - `header-right`     — injected before the theme/language switchers
@@ -31,8 +32,31 @@ import React, { Fragment, useEffect, useState } from "react";
 *  - `overlay`          — fixed-position layer above everything else;
 *                         useful for chrome (scanlines, vignettes) the
 *                         theme's customCSS can't achieve alone
+ *
+ *  Page-scoped slots (rendered inside a specific built-in page — use these
+ *  to inject widgets, cards, or toolbars into existing pages without
+ *  overriding the whole route):
+ *  - `sessions:top`     — top of /sessions page (above session list)
+ *  - `sessions:bottom`  — bottom of /sessions page
+ *  - `analytics:top`    — top of /analytics page
+ *  - `analytics:bottom` — bottom of /analytics page
+ *  - `logs:top`         — top of /logs page (above filter toolbar)
+ *  - `logs:bottom`      — bottom of /logs page (below log viewer)
+ *  - `cron:top`         — top of /cron page
+ *  - `cron:bottom`      — bottom of /cron page
+ *  - `skills:top`       — top of /skills page
+ *  - `skills:bottom`    — bottom of /skills page
+ *  - `config:top`       — top of /config page
+ *  - `config:bottom`    — bottom of /config page
+ *  - `env:top`          — top of /env (Keys) page
+ *  - `env:bottom`       — bottom of /env (Keys) page
+ *  - `docs:top`         — top of /docs page (above the docs iframe)
+ *  - `docs:bottom`      — bottom of /docs page
+ *  - `chat:top`         — top of /chat page (above the composer, when embedded chat is on)
+ *  - `chat:bottom`      — bottom of /chat page
 */
 export const KNOWN_SLOT_NAMES = [
+  // Shell-wide
  "backdrop",
  "header-left",
  "header-right",
@@ -43,6 +67,25 @@ export const KNOWN_SLOT_NAMES = [
  "footer-left",
  "footer-right",
  "overlay",
+  // Page-scoped
+  "sessions:top",
+  "sessions:bottom",
+  "analytics:top",
+  "analytics:bottom",
+  "logs:top",
+  "logs:bottom",
+  "cron:top",
+  "cron:bottom",
+  "skills:top",
+  "skills:bottom",
+  "config:top",
+  "config:bottom",
+  "env:top",
+  "env:bottom",
+  "docs:top",
+  "docs:bottom",
+  "chat:top",
+  "chat:bottom",
 ] as const;

 export type KnownSlotName = (typeof KNOWN_SLOT_NAMES)[number];
@@ -1,336 +0,0 @@
----
-sidebar_position: 16
-title: "Dashboard Plugins"
-description: "Build custom tabs and extensions for the Hermes web dashboard"
----
-
-# Dashboard Plugins
-
-Dashboard plugins let you add custom tabs to the web dashboard. A plugin can display its own UI, call the Hermes API, and optionally register backend endpoints — all without touching the dashboard source code.
-
-## Quick Start
-
-Create a plugin directory with a manifest and a JS file:
-
-```bash
-mkdir -p ~/.hermes/plugins/my-plugin/dashboard/dist
-```
-
-**manifest.json:**
-
-```json
-{
-  "name": "my-plugin",
-  "label": "My Plugin",
-  "icon": "Sparkles",
-  "version": "1.0.0",
-  "tab": {
-    "path": "/my-plugin",
-    "position": "after:skills"
-  },
-  "entry": "dist/index.js"
-}
-```
-
-**dist/index.js:**
-
-```javascript
-(function () {
-  var SDK = window.__HERMES_PLUGIN_SDK__;
-  var React = SDK.React;
-  var Card = SDK.components.Card;
-  var CardHeader = SDK.components.CardHeader;
-  var CardTitle = SDK.components.CardTitle;
-  var CardContent = SDK.components.CardContent;
-
-  function MyPage() {
-    return React.createElement(Card, null,
-      React.createElement(CardHeader, null,
-        React.createElement(CardTitle, null, "My Plugin")
-      ),
-      React.createElement(CardContent, null,
-        React.createElement("p", { className: "text-sm text-muted-foreground" },
-          "Hello from my custom dashboard tab!"
-        )
-      )
-    );
-  }
-
-  window.__HERMES_PLUGINS__.register("my-plugin", MyPage);
-})();
-```
-
-Refresh the dashboard — your tab appears in the navigation bar.
-
-## Plugin Structure
-
-Plugins live inside the standard `~/.hermes/plugins/` directory. The dashboard extension is a `dashboard/` subfolder:
-
-```
-~/.hermes/plugins/my-plugin/
-  plugin.yaml              # optional — existing CLI/gateway plugin manifest
-  __init__.py              # optional — existing CLI/gateway hooks
-  dashboard/               # dashboard extension
-    manifest.json          # required — tab config, icon, entry point
-    dist/
-      index.js             # required — pre-built JS bundle
-      style.css            # optional — custom CSS
-    plugin_api.py          # optional — backend API routes
-```
-
-A single plugin can extend both the CLI/gateway (via `plugin.yaml` + `__init__.py`) and the dashboard (via `dashboard/`) from one directory.
-
-## Manifest Reference
-
-The `manifest.json` file describes your plugin to the dashboard:
-
-```json
-{
-  "name": "my-plugin",
-  "label": "My Plugin",
-  "description": "What this plugin does",
-  "icon": "Sparkles",
-  "version": "1.0.0",
-  "tab": {
-    "path": "/my-plugin",
-    "position": "after:skills"
-  },
-  "entry": "dist/index.js",
-  "css": "dist/style.css",
-  "api": "plugin_api.py"
-}
-```
-
-| Field | Required | Description |
-|-------|----------|-------------|
-| `name` | Yes | Unique plugin identifier (lowercase, hyphens ok) |
-| `label` | Yes | Display name shown in the nav tab |
-| `description` | No | Short description |
-| `icon` | No | Lucide icon name (default: `Puzzle`) |
-| `version` | No | Semver version string |
-| `tab.path` | Yes | URL path for the tab (e.g. `/my-plugin`) |
-| `tab.position` | No | Where to insert the tab: `end` (default), `after:<tab>`, `before:<tab>` |
-| `entry` | Yes | Path to the JS bundle relative to `dashboard/` |
-| `css` | No | Path to a CSS file to inject |
-| `api` | No | Path to a Python file with FastAPI routes |
-
-### Tab Position
-
-The `position` field controls where your tab appears in the navigation:
-
-- `"end"` — after all built-in tabs (default)
-- `"after:skills"` — after the Skills tab
-- `"before:config"` — before the Config tab
-- `"after:cron"` — after the Cron tab
-
-The value after the colon is the path segment of the target tab (without the leading slash).
-
-### Available Icons
-
-Plugins can use any of these Lucide icon names:
-
-`Activity`, `BarChart3`, `Clock`, `Code`, `Database`, `Eye`, `FileText`, `Globe`, `Heart`, `KeyRound`, `MessageSquare`, `Package`, `Puzzle`, `Settings`, `Shield`, `Sparkles`, `Star`, `Terminal`, `Wrench`, `Zap`
-
-Unrecognized icon names fall back to `Puzzle`.
-
-## Plugin SDK
-
-Plugins don't bundle React or UI components — they use the SDK exposed on `window.__HERMES_PLUGIN_SDK__`. This avoids version conflicts and keeps plugin bundles tiny.
-
-### SDK Contents
-
-```javascript
-var SDK = window.__HERMES_PLUGIN_SDK__;
-
-// React
-SDK.React              // React instance
-SDK.hooks.useState     // React hooks
-SDK.hooks.useEffect
-SDK.hooks.useCallback
-SDK.hooks.useMemo
-SDK.hooks.useRef
-SDK.hooks.useContext
-SDK.hooks.createContext
-
-// API
-SDK.api                // Hermes API client (getStatus, getSessions, etc.)
-SDK.fetchJSON          // Raw fetch for custom endpoints — handles auth automatically
-
-// UI Components (shadcn/ui style)
-SDK.components.Card
-SDK.components.CardHeader
-SDK.components.CardTitle
-SDK.components.CardContent
-SDK.components.Badge
-SDK.components.Button
-SDK.components.Input
-SDK.components.Label
-SDK.components.Select
-SDK.components.SelectOption
-SDK.components.Separator
-SDK.components.Tabs
-SDK.components.TabsList
-SDK.components.TabsTrigger
-
-// Utilities
-SDK.utils.cn           // Tailwind class merger (clsx + twMerge)
-SDK.utils.timeAgo      // "5m ago" from unix timestamp
-SDK.utils.isoTimeAgo   // "5m ago" from ISO string
-
-// Hooks
-SDK.useI18n            // i18n translations
-SDK.useTheme           // Current theme info
-```
-
-### Using SDK.fetchJSON
-
-For calling your plugin's backend API endpoints:
-
-```javascript
-SDK.fetchJSON("/api/plugins/my-plugin/data")
-  .then(function (result) {
-    console.log(result);
-  })
-  .catch(function (err) {
-    console.error("API call failed:", err);
-  });
-```
-
-`fetchJSON` automatically injects the session auth token, handles errors, and parses JSON.
-
-### Using Existing API Methods
-
-The `SDK.api` object has methods for all built-in Hermes endpoints:
-
-```javascript
-// Fetch agent status
-SDK.api.getStatus().then(function (status) {
-  console.log("Version:", status.version);
-});
-
-// List sessions
-SDK.api.getSessions(10).then(function (resp) {
-  console.log("Sessions:", resp.sessions.length);
-});
-```
-
-## Backend API Routes
-
-Plugins can register FastAPI routes by setting the `api` field in the manifest. Create a Python file that exports a `router`:
-
-```python
-# plugin_api.py
-from fastapi import APIRouter
-
-router = APIRouter()
-
-@router.get("/data")
-async def get_data():
-    return {"items": ["one", "two", "three"]}
-
-@router.post("/action")
-async def do_action(body: dict):
-    return {"ok": True, "received": body}
-```
-
-Routes are mounted at `/api/plugins/<name>/`, so the above becomes:
-- `GET /api/plugins/my-plugin/data`
-- `POST /api/plugins/my-plugin/action`
-
-Plugin API routes bypass session token authentication since the dashboard server only binds to localhost.
-
-### Accessing Hermes Internals
-
-Backend routes can import from the hermes-agent codebase:
-
-```python
-from fastapi import APIRouter
-from hermes_state import SessionDB
-from hermes_cli.config import load_config
-
-router = APIRouter()
-
-@router.get("/session-count")
-async def session_count():
-    db = SessionDB()
-    try:
-        count = len(db.list_sessions(limit=9999))
-        return {"count": count}
-    finally:
-        db.close()
-```
-
-## Custom CSS
-
-If your plugin needs custom styles, add a CSS file and reference it in the manifest:
-
-```json
-{
-  "css": "dist/style.css"
-}
-```
-
-The CSS file is injected as a `<link>` tag when the plugin loads. Use specific class names to avoid conflicts with the dashboard's existing styles.
-
-```css
-/* dist/style.css */
-.my-plugin-chart {
-  border: 1px solid var(--color-border);
-  background: var(--color-card);
-  padding: 1rem;
-}
-```
-
-You can use the dashboard's CSS custom properties (e.g. `--color-border`, `--color-foreground`) to match the active theme.
-
-## Plugin Loading Flow
-
-1. Dashboard loads — `main.tsx` exposes the SDK on `window.__HERMES_PLUGIN_SDK__`
-2. `App.tsx` calls `usePlugins()` which fetches `GET /api/dashboard/plugins`
-3. For each plugin: CSS `<link>` injected (if declared), JS `<script>` loaded
-4. Plugin JS calls `window.__HERMES_PLUGINS__.register(name, Component)`
-5. Dashboard adds the tab to navigation and mounts the component as a route
-
-Plugins have up to 2 seconds to register after their script loads. If a plugin fails to load, the dashboard continues without it.
-
-## Plugin Discovery
-
-The dashboard scans these directories for `dashboard/manifest.json`:
-
-1. **User plugins:** `~/.hermes/plugins/<name>/dashboard/manifest.json`
-2. **Bundled plugins:** `<repo>/plugins/<name>/dashboard/manifest.json`
-3. **Project plugins:** `./.hermes/plugins/<name>/dashboard/manifest.json` (only when `HERMES_ENABLE_PROJECT_PLUGINS` is set)
-
-User plugins take precedence — if the same plugin name exists in multiple sources, the user version wins.
-
-To force re-scanning after adding a new plugin without restarting the server:
-
-```bash
-curl http://127.0.0.1:9119/api/dashboard/plugins/rescan
-```
-
-## Plugin API Endpoints
-
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/dashboard/plugins` | GET | List discovered plugins |
-| `/api/dashboard/plugins/rescan` | GET | Force re-scan for new plugins |
-| `/dashboard-plugins/<name>/<path>` | GET | Serve plugin static assets |
-| `/api/plugins/<name>/*` | * | Plugin-registered API routes |
-
-## Example Plugin
-
-The repository includes an example plugin at `plugins/example-dashboard/` that demonstrates:
-
-- Using SDK components (Card, Badge, Button)
-- Calling a backend API route
-- Registering via `window.__HERMES_PLUGINS__.register()`
-
-To try it, run `hermes dashboard` — the "Example" tab appears after Skills.
-
-## Tips
-
-- **No build step required** — write plain JavaScript IIFEs. If you prefer JSX, use any bundler (esbuild, Vite, webpack) targeting IIFE output with React as an external.
-- **Keep bundles small** — React and all UI components are provided by the SDK. Your bundle should only contain your plugin logic.
-- **Use theme variables** — reference `var(--color-*)` in CSS to automatically match whatever theme the user has selected.
-- **Test locally** — run `hermes dashboard --no-open` and use browser dev tools to verify your plugin loads and registers correctly.
@@ -0,0 +1,904 @@
+---
+sidebar_position: 17
+title: "Extending the Dashboard"
+description: "Build themes and plugins for the Hermes web dashboard — palettes, typography, layouts, custom tabs, shell slots, page-scoped slots, and backend API routes"
+---
+
+# Extending the Dashboard
+
+The Hermes web dashboard (`hermes dashboard`) is built to be reskinned and extended without forking the codebase. Three layers are exposed:
+
+1. **Themes** — YAML files that repaint the dashboard's palette, typography, layout, and per-component chrome. Drop a file in `~/.hermes/dashboard-themes/`; it appears in the theme switcher.
+2. **UI plugins** — a directory with `manifest.json` + a JavaScript bundle that registers a tab, replaces a built-in page, augments one via page-scoped slots, or injects components into named shell slots.
+3. **Backend plugins** — a Python file inside that plugin directory that exposes a FastAPI `router`; routes are mounted under `/api/plugins/<name>/` and called from the plugin's UI.
+
+All three are **drop-in at runtime**: no repo clone, no `npm run build`, no patching the dashboard source. This page is the canonical reference for all three.
+
+If you just want to use the dashboard, see [Web Dashboard](./web-dashboard). If you want to reskin the terminal CLI (not the web dashboard), see [Skins & Themes](./skins) — the CLI skin system is unrelated to dashboard themes.
+
+:::note How the pieces compose
+Themes and plugins are independent but synergistic. A theme can stand alone (just a YAML file). A plugin can stand alone (just a tab). Together they let you build a complete visual reskin with custom HUDs — the bundled `strike-freedom-cockpit` demo does exactly that. See [Combined theme + plugin demo](#combined-theme--plugin-demo).
+:::
+
+---
+
+## Table of contents
+
+- [Themes](#themes)
+  - [Quick start — your first theme](#quick-start--your-first-theme)
+  - [Palette, typography, layout](#palette-typography-layout)
+  - [Layout variants](#layout-variants)
+  - [Theme assets (images as CSS vars)](#theme-assets-images-as-css-vars)
+  - [Component chrome overrides](#component-chrome-overrides)
+  - [Color overrides](#color-overrides)
+  - [Raw `customCSS`](#raw-customcss)
+  - [Built-in themes](#built-in-themes)
+  - [Full theme YAML reference](#full-theme-yaml-reference)
+- [Plugins](#plugins)
+  - [Quick start — your first plugin](#quick-start--your-first-plugin)
+  - [Directory layout](#directory-layout)
+  - [Manifest reference](#manifest-reference)
+  - [The Plugin SDK](#the-plugin-sdk)
+  - [Shell slots](#shell-slots)
+  - [Replacing built-in pages (`tab.override`)](#replacing-built-in-pages-taboverride)
+  - [Augmenting built-in pages (page-scoped slots)](#augmenting-built-in-pages-page-scoped-slots)
+  - [Slot-only plugins (`tab.hidden`)](#slot-only-plugins-tabhidden)
+  - [Backend API routes](#backend-api-routes)
+  - [Custom CSS per plugin](#custom-css-per-plugin)
+  - [Plugin discovery & reload](#plugin-discovery--reload)
+- [Combined theme + plugin demo](#combined-theme--plugin-demo)
+- [API reference](#api-reference)
+- [Troubleshooting](#troubleshooting)
+
+---
+
+## Themes
+
+Themes are YAML files stored in `~/.hermes/dashboard-themes/`. The file name doesn't matter (the system uses the theme's `name:` field), but the convention is `<name>.yaml`. Every field is optional: missing keys fall back to the built-in `default` theme, so a theme can be as small as one color.
+
+### Quick start — your first theme
+
+```bash
+mkdir -p ~/.hermes/dashboard-themes
+```
+
+```yaml
+# ~/.hermes/dashboard-themes/neon.yaml
+name: neon
+label: Neon
+description: Pure magenta on black
+
+palette:
+  background: "#000000"
+  midground: "#ff00ff"
+```
+
+Refresh the dashboard. Click the palette icon in the header and pick **Neon**. The background goes black, text and accents go magenta, and every derived color (card, border, muted, ring, etc.) is recomputed from those two colors via CSS `color-mix()`; the unset `foreground` layer falls back to the `default` theme's value.
+
+That's the whole onboarding: one file, two colors. Everything below is optional refinement.
+
+### Palette, typography, layout
+
+These three blocks are the heart of a theme. Each is independent — override one, leave the others.
+
+#### Palette (3-layer)
+
+The palette is a triplet of color layers plus a warm-glow vignette color and a noise-grain multiplier. The dashboard's design-system cascade derives every shadcn-compatible token (card, popover, muted, border, primary, destructive, ring, etc.) from this triplet via CSS `color-mix()`. Overriding three colors cascades into the whole UI.
+
+| Key | Description |
+|-----|-------------|
+| `palette.background` | Deepest canvas color — typically near-black. Drives the page background and card fill. |
+| `palette.midground` | Primary text and accent. Most UI chrome reads this (foreground text, button outlines, focus rings). |
+| `palette.foreground` | Top-layer highlight. The default theme sets this to white at alpha 0 (invisible); themes that want a bright accent on top can raise its alpha. |
+| `palette.warmGlow` | `rgba(...)` string used as the vignette color by `<Backdrop />`. |
+| `palette.noiseOpacity` | 0–1.2 multiplier on the grain overlay. Lower = softer, higher = grittier. |
+
+Each layer accepts either `{hex: "#RRGGBB", alpha: 0.0–1.0}` or a bare hex string (alpha defaults to 1.0).
+
+```yaml
+palette:
+  background:
+    hex: "#05091a"
+    alpha: 1.0
+  midground: "#d8f0ff"          # bare hex, alpha = 1.0
+  foreground:
+    hex: "#ffffff"
+    alpha: 0                    # invisible top layer
+  warmGlow: "rgba(255, 199, 55, 0.24)"
+  noiseOpacity: 0.7
+```
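+
+The example above keeps the `foreground` layer invisible. A theme that wants a visible top-layer highlight could raise its alpha instead; the color and alpha below are illustrative values, not taken from a bundled theme:
+
+```yaml
+palette:
+  background: "#05091a"
+  midground: "#d8f0ff"
+  foreground:
+    hex: "#ffd54a"      # illustrative accent color
+    alpha: 0.15         # any value in 0.0-1.0; 0 hides the layer
+```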
+
+#### Typography
+
+| Key | Type | Description |
+|-----|------|-------------|
+| `fontSans` | string | CSS font-family stack for body copy (applied to `html`, `body`). |
+| `fontMono` | string | CSS font-family stack for code blocks, `<code>`, `.font-mono` utilities. |
+| `fontDisplay` | string | Optional heading/display stack. Falls back to `fontSans`. |
+| `fontUrl` | string | Optional external stylesheet URL. Injected as `<link rel="stylesheet">` in `<head>` on theme switch. The same URL is never injected twice. Works with any linkable stylesheet: Google Fonts, Bunny Fonts, or self-hosted `@font-face` sheets. |
+| `baseSize` | string | Root font size — controls the rem scale. E.g. `"14px"`, `"16px"`. |
+| `lineHeight` | string | Default line-height. E.g. `"1.5"`, `"1.65"`. |
+| `letterSpacing` | string | Default letter-spacing. E.g. `"0"`, `"0.01em"`, `"-0.01em"`. |
+
+```yaml
+typography:
+  fontSans: '"Orbitron", "Eurostile", "Impact", sans-serif'
+  fontMono: '"Share Tech Mono", ui-monospace, monospace'
+  fontDisplay: '"Orbitron", "Eurostile", sans-serif'
+  fontUrl: "https://fonts.googleapis.com/css2?family=Orbitron:wght@400;500;600;700&family=Share+Tech+Mono&display=swap"
+  baseSize: "14px"
+  lineHeight: "1.5"
+  letterSpacing: "0.04em"
+```
+
+#### Layout
+
+| Key | Values | Description |
+|-----|--------|-------------|
+| `radius` | any CSS length (`"0"`, `"0.25rem"`, `"0.5rem"`, `"1rem"`, ...) | Corner-radius token. Maps to `--radius` and cascades into `--radius-sm/md/lg/xl` — every rounded element shifts together. |
+| `density` | `compact` \| `comfortable` \| `spacious` | Spacing multiplier applied as the `--spacing-mul` CSS var. `compact = 0.85×`, `comfortable = 1.0×` (default), `spacious = 1.2×`. Scales Tailwind's base spacing, so padding, gap, and space-between utilities all shift proportionally. |
+
+```yaml
+layout:
+  radius: "0"
+  density: compact
+```
+
+### Layout variants
+
+`layoutVariant` picks the overall shell layout. Defaults to `"standard"` when absent.
+
+| Variant | Behaviour |
+|---------|-----------|
+| `standard` | Single column, 1600px max-width (default). |
+| `cockpit` | Left sidebar rail (260px) + main content. Populated by plugins via the `sidebar` slot — see [Shell slots](#shell-slots). Without a plugin the rail shows a placeholder. |
+| `tiled` | Drops the max-width clamp so pages can use the full viewport width. |
+
+```yaml
+layoutVariant: cockpit
+```
+
+The current variant is exposed as `document.documentElement.dataset.layoutVariant`, so raw CSS in `customCSS` can target it via `:root[data-layout-variant="cockpit"] ...`.
+
+### Theme assets (images as CSS vars)
+
+Ship artwork URLs with a theme. Each named slot becomes a CSS var (`--theme-asset-<name>`) that the built-in shell and any plugin can read. The `bg` slot is automatically wired into the backdrop; other slots are plugin-facing.
+
+```yaml
+assets:
+  bg: "https://example.com/hero-bg.jpg"           # auto-wired into <Backdrop />
+  hero: "/my-images/strike-freedom.png"           # for plugin sidebars
+  crest: "/my-images/crest.svg"                   # for header-left plugins
+  logo: "/my-images/logo.png"
+  sidebar: "/my-images/rail.png"
+  header: "/my-images/header-art.png"
+  custom:
+    scanLines: "/my-images/scanlines.png"         # → --theme-asset-custom-scanLines
+```
+
+Values accept:
+
+- Bare URLs — wrapped in `url(...)` automatically.
+- Pre-wrapped `url(...)`, `linear-gradient(...)`, `radial-gradient(...)` expressions — used as-is.
+- `"none"` — explicit opt-out.
+
+Every asset is also emitted as `--theme-asset-<name>-raw` (the unwrapped URL), in case a plugin needs to pass it to `<img src>` instead of `background-image`.
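+
+The value rules above can be sketched as a small function. This is an illustration only: `toCssAssetValue` is a hypothetical name, and quoting the URL inside `url(...)` is an assumption about the emitted form.
+
+```javascript
+// Hypothetical sketch of the asset value rules; not the dashboard's own code.
+function toCssAssetValue(value) {
+  const v = value.trim();
+  // Explicit opt-out is passed through unchanged.
+  if (v === "none") return "none";
+  // Pre-wrapped url(...) and gradient expressions are used as-is.
+  if (/^(url|linear-gradient|radial-gradient)\(/.test(v)) return v;
+  // Bare URLs get wrapped (assumption: double-quoted inside url()).
+  return `url("${v}")`;
+}
+```
+
+Under these assumptions, `"/my-images/logo.png"` becomes `url("/my-images/logo.png")`, while a `linear-gradient(...)` value is returned untouched.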
+
+Plugins read these with plain CSS or JS:
+
+```javascript
+// In a plugin slot
+const hero = getComputedStyle(document.documentElement)
+  .getPropertyValue("--theme-asset-hero").trim();
+```
+
+### Component chrome overrides
+
+`componentStyles` restyles individual shell components without writing CSS selectors. Each bucket's entries become CSS vars (`--component-<bucket>-<kebab-property>`) that the shell's shared components read. So `card:` overrides apply to every `<Card>`, `header:` to the app bar, etc.
+
+```yaml
+componentStyles:
+  card:
+    clipPath: "polygon(12px 0, 100% 0, 100% calc(100% - 12px), calc(100% - 12px) 100%, 0 100%, 0 12px)"
+    background: "linear-gradient(180deg, rgba(10, 22, 52, 0.85), rgba(5, 9, 26, 0.92))"
+    boxShadow: "inset 0 0 0 1px rgba(64, 200, 255, 0.28)"
+  header:
+    background: "linear-gradient(180deg, rgba(16, 32, 72, 0.95), rgba(5, 9, 26, 0.9))"
+  tab:
+    clipPath: "polygon(6px 0, 100% 0, calc(100% - 6px) 100%, 0 100%)"
+  sidebar: {}
+  backdrop: {}
+  footer: {}
+  progress: {}
+  badge: {}
+  page: {}
+```
+
+Supported buckets: `card`, `header`, `footer`, `sidebar`, `tab`, `progress`, `badge`, `backdrop`, `page`.
+
+Property names use camelCase (`clipPath`) and are emitted as kebab (`clip-path`). Values are plain CSS strings — anything CSS accepts (`clip-path`, `border-image`, `background`, `box-shadow`, `animation`, ...).
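+
+The naming rule can be sketched in a few lines. The helper names `toKebab`, `componentVarName`, and `componentStylesToVars` are hypothetical and only illustrate the `--component-<bucket>-<kebab-property>` mapping described above:
+
+```javascript
+// Convert a camelCase property name to kebab-case (clipPath -> clip-path).
+function toKebab(prop) {
+  return prop.replace(/[A-Z]/g, (ch) => "-" + ch.toLowerCase());
+}
+
+// Build the CSS var name for one bucket/property pair.
+function componentVarName(bucket, prop) {
+  return `--component-${bucket}-${toKebab(prop)}`;
+}
+
+// Flatten a componentStyles object into a { varName: value } map.
+function componentStylesToVars(componentStyles) {
+  const vars = {};
+  for (const [bucket, props] of Object.entries(componentStyles)) {
+    for (const [prop, value] of Object.entries(props)) {
+      vars[componentVarName(bucket, prop)] = value;
+    }
+  }
+  return vars;
+}
+```
+
+For example, `card.clipPath` maps to `--component-card-clip-path`, and an empty bucket such as `sidebar: {}` contributes no vars.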
+
+### Color overrides
+
+Most themes won't need this — the 3-layer palette derives every shadcn token. Use `colorOverrides` when you want a specific accent the derivation won't produce (a softer destructive red for a pastel theme, a specific success green for a brand).
+
+```yaml
+colorOverrides:
+  primary: "#ffce3a"
+  primaryForeground: "#05091a"
+  accent: "#3fd3ff"
+  ring: "#3fd3ff"
+  destructive: "#ff3a5e"
+  border: "rgba(64, 200, 255, 0.28)"
+```
+
+Supported keys: `card`, `cardForeground`, `popover`, `popoverForeground`, `primary`, `primaryForeground`, `secondary`, `secondaryForeground`, `muted`, `mutedForeground`, `accent`, `accentForeground`, `destructive`, `destructiveForeground`, `success`, `warning`, `border`, `input`, `ring`.
+
+Each key maps 1:1 to the `--color-<kebab>` CSS var (e.g. `primaryForeground` → `--color-primary-foreground`). Any key set here wins over the palette cascade for the active theme only — switching to another theme clears the overrides.
+
+### Raw `customCSS`
+
+For selector-level chrome that `componentStyles` can't express — pseudo-elements, animations, media queries, theme-scoped overrides — drop raw CSS into `customCSS`:
+
+```yaml
+customCSS: |
+  /* Scanline overlay — only visible when cockpit variant is active. */
+  :root[data-layout-variant="cockpit"] body::before {
+    content: "";
+    position: fixed;
+    inset: 0;
+    pointer-events: none;
+    z-index: 100;
+    background: repeating-linear-gradient(to bottom,
+      transparent 0px, transparent 2px,
+      rgba(64, 200, 255, 0.035) 3px, rgba(64, 200, 255, 0.035) 4px);
+    mix-blend-mode: screen;
+  }
+```
+
+The CSS is injected as a single scoped `<style data-hermes-theme-css>` tag on theme apply and cleaned up on theme switch. **Capped at 32 KiB per theme.**
+
+### Built-in themes
+
+Each built-in ships its own palette, typography, and layout — switching produces visible changes beyond color alone.
+
+| Theme | Palette | Typography | Layout |
+|-------|---------|------------|--------|
+| **Hermes Teal** (`default`) | Dark teal + cream | System stack, 15px | 0.5rem radius, comfortable |
+| **Midnight** (`midnight`) | Deep blue-violet | Inter + JetBrains Mono, 14px | 0.75rem radius, comfortable |
+| **Ember** (`ember`) | Warm crimson + bronze | Spectral (serif) + IBM Plex Mono, 15px | 0.25rem radius, comfortable |
+| **Mono** (`mono`) | Grayscale | IBM Plex Sans + IBM Plex Mono, 13px | 0 radius, compact |
+| **Cyberpunk** (`cyberpunk`) | Neon green on black | Share Tech Mono everywhere, 14px | 0 radius, compact |
+| **Rosé** (`rose`) | Pink + ivory | Fraunces (serif) + DM Mono, 16px | 1rem radius, spacious |
+
+Themes that reference Google Fonts (all except Hermes Teal) load the stylesheet on demand — the first time you switch to them a `<link>` tag is injected into `<head>`.
+
+### Full theme YAML reference
+
+Every knob in one file — copy and trim what you don't need:
+
+```yaml
+# ~/.hermes/dashboard-themes/ocean.yaml
+name: ocean
+label: Ocean Deep
+description: Deep sea blues with coral accents
+
+# 3-layer palette (accepts {hex, alpha} or bare hex)
+palette:
+  background:
+    hex: "#0a1628"
+    alpha: 1.0
+  midground:
+    hex: "#a8d0ff"
+    alpha: 1.0
+  foreground:
+    hex: "#ffffff"
+    alpha: 0.0
+  warmGlow: "rgba(255, 107, 107, 0.35)"
+  noiseOpacity: 0.7
+
+typography:
+  fontSans: "Poppins, system-ui, sans-serif"
+  fontMono: "Fira Code, ui-monospace, monospace"
+  fontDisplay: "Poppins, system-ui, sans-serif"   # optional
+  fontUrl: "https://fonts.googleapis.com/css2?family=Poppins:wght@400;500;600&family=Fira+Code:wght@400;500&display=swap"
+  baseSize: "15px"
+  lineHeight: "1.6"
+  letterSpacing: "-0.003em"
+
+layout:
+  radius: "0.75rem"
+  density: comfortable
+
+layoutVariant: standard        # standard | cockpit | tiled
+
+assets:
+  bg: "https://example.com/ocean-bg.jpg"
+  hero: "/my-images/kraken.png"
+  crest: "/my-images/anchor.svg"
+  logo: "/my-images/logo.png"
+  custom:
+    pattern: "/my-images/waves.svg"
+
+componentStyles:
+  card:
+    boxShadow: "inset 0 0 0 1px rgba(168, 208, 255, 0.18)"
+  header:
+    background: "linear-gradient(180deg, rgba(10, 22, 40, 0.95), rgba(5, 9, 26, 0.9))"
+
+colorOverrides:
+  destructive: "#ff6b6b"
+  ring: "#ff6b6b"
+
+customCSS: |
+  /* Any additional selector-level tweaks */
+```
+
+Refresh the dashboard after creating the file. Switch themes live from the header bar — click the palette icon. Selection persists to `config.yaml` under `dashboard.theme` and is restored on reload.
+
+---
+
+## Plugins
+
+A dashboard plugin is a directory with a `manifest.json`, a pre-built JS bundle, and optionally a CSS file and a Python file with FastAPI routes. Plugins live next to other Hermes plugins in `~/.hermes/plugins/<name>/` — the dashboard extension is a `dashboard/` subfolder inside that plugin directory, so one plugin can extend both the CLI/gateway and the dashboard from a single install.
+
+Plugins don't bundle React or UI components. They use the **Plugin SDK** exposed on `window.__HERMES_PLUGIN_SDK__`. This keeps plugin bundles tiny (typically a few KB) and avoids version conflicts.
+
+### Quick start — your first plugin
+
+Create the directory structure:
+
+```bash
+mkdir -p ~/.hermes/plugins/my-plugin/dashboard/dist
+```
+
+Write the manifest to `~/.hermes/plugins/my-plugin/dashboard/manifest.json` (JSON does not allow comments, so the path is given here rather than inside the file):
+
+```json
+{
+  "name": "my-plugin",
+  "label": "My Plugin",
+  "icon": "Sparkles",
+  "version": "1.0.0",
+  "tab": {
+    "path": "/my-plugin",
+    "position": "after:skills"
+  },
+  "entry": "dist/index.js"
+}
+```
+
+Write the JS bundle (a plain IIFE — no build step needed):
+
+```javascript
+// ~/.hermes/plugins/my-plugin/dashboard/dist/index.js
+(function () {
+  "use strict";
+
+  const SDK = window.__HERMES_PLUGIN_SDK__;
+  const { React } = SDK;
+  const { Card, CardHeader, CardTitle, CardContent } = SDK.components;
+
+  function MyPage() {
+    return React.createElement(Card, null,
+      React.createElement(CardHeader, null,
+        React.createElement(CardTitle, null, "My Plugin"),
+      ),
+      React.createElement(CardContent, null,
+        React.createElement("p", { className: "text-sm text-muted-foreground" },
+          "Hello from my custom dashboard tab.",
+        ),
+      ),
+    );
+  }
+
+  window.__HERMES_PLUGINS__.register("my-plugin", MyPage);
+})();
+```
+
+Refresh the dashboard — your tab appears in the nav bar, after **Skills**.
+
+:::tip Skip React.createElement
+If you prefer JSX, use any bundler (esbuild, Vite, rollup) with React as an external and IIFE output. The only hard requirement is that the final file is a single JS file loadable via `<script>`. React is never bundled; it comes from `SDK.React`.
+:::
+
+### Directory layout
+
+```
+~/.hermes/plugins/my-plugin/
+├── plugin.yaml              # optional — existing CLI/gateway plugin manifest
+├── __init__.py              # optional — existing CLI/gateway hooks
+└── dashboard/               # dashboard extension
+    ├── manifest.json        # required — tab config, icon, entry point
+    ├── dist/
+    │   ├── index.js         # required — pre-built JS bundle (IIFE)
+    │   └── style.css        # optional — custom CSS
+    └── plugin_api.py        # optional — backend API routes (FastAPI)
+```
+
+A single plugin directory can carry three orthogonal extensions:
+
+- `plugin.yaml` + `__init__.py` — CLI/gateway plugin ([see plugins page](./plugins)).
+- `dashboard/manifest.json` + `dashboard/dist/index.js` — dashboard UI plugin.
+- `dashboard/plugin_api.py` — dashboard backend routes.
+
+None of them are required; include only the layers you need.
+
+### Manifest reference
+
+```json
+{
+  "name": "my-plugin",
+  "label": "My Plugin",
+  "description": "What this plugin does",
+  "icon": "Sparkles",
+  "version": "1.0.0",
+  "tab": {
+    "path": "/my-plugin",
+    "position": "after:skills",
+    "override": "/",
+    "hidden": false
+  },
+  "slots": ["sidebar", "header-left"],
+  "entry": "dist/index.js",
+  "css": "dist/style.css",
+  "api": "plugin_api.py"
+}
+```
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `name` | Yes | Unique plugin identifier. Lowercase, hyphens ok. Used in URLs and registration. |
+| `label` | Yes | Display name shown in the nav tab. |
+| `description` | No | Short description (shown in dashboard admin surfaces). |
+| `icon` | No | Lucide icon name. Defaults to `Puzzle`. Unknown names fall back to `Puzzle`. |
+| `version` | No | Semver string. Defaults to `0.0.0`. |
+| `tab.path` | Yes | URL path for the tab (e.g. `/my-plugin`). |
+| `tab.position` | No | Where to insert the tab. `"end"` (default), `"after:<path>"`, or `"before:<path>"` — value after the colon is the **path segment** of the target tab (no leading slash). Examples: `"after:skills"`, `"before:config"`. |
+| `tab.override` | No | Set to a built-in route path (`"/"`, `"/sessions"`, `"/config"`, ...) to **replace** that page instead of adding a new tab. See [Replacing built-in pages](#replacing-built-in-pages-taboverride). |
+| `tab.hidden` | No | When true, register the component and any slots without adding a tab to the nav. Used by slot-only plugins. See [Slot-only plugins](#slot-only-plugins-tabhidden). |
+| `slots` | No | Named shell slots this plugin populates. **Documentation aid only** — actual registration happens from the JS bundle via `registerSlot()`. Listing slots here makes discovery surfaces more informative. |
+| `entry` | Yes | Path to the JS bundle relative to `dashboard/`. Defaults to `dist/index.js`. |
+| `css` | No | Path to a CSS file to inject as a `<link>` tag. |
+| `api` | No | Path to a Python file with FastAPI routes. Mounted at `/api/plugins/<name>/`. |
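+
+To make the `tab.position` grammar concrete, here is a minimal sketch of how such a value could be parsed and applied to an ordered list of tab paths. `parseTabPosition` and `insertTab` are hypothetical names, and falling back to `"end"` for unknown targets or malformed values is an assumption, not documented behaviour:
+
+```javascript
+// Hypothetical sketch: parse "end", "after:<segment>", or "before:<segment>".
+function parseTabPosition(position) {
+  if (!position || position === "end") return { mode: "end", target: null };
+  const match = /^(after|before):(.+)$/.exec(position);
+  // Assumption: malformed values behave like "end".
+  if (!match) return { mode: "end", target: null };
+  return { mode: match[1], target: match[2] };
+}
+
+// Insert newPath into a list of tab paths according to the position string.
+function insertTab(paths, newPath, position) {
+  const { mode, target } = parseTabPosition(position);
+  // The target is a path segment without a leading slash.
+  const idx = mode === "end" ? -1 : paths.indexOf("/" + target);
+  // Assumption: an unknown target appends the tab at the end.
+  if (idx === -1) return [...paths, newPath];
+  const at = mode === "after" ? idx + 1 : idx;
+  return [...paths.slice(0, at), newPath, ...paths.slice(at)];
+}
+```
+
+With tabs `["/sessions", "/skills", "/config"]`, both `"after:skills"` and `"before:config"` place the new tab between `/skills` and `/config` in this sketch.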
+
+#### Available icons
+
+Plugins use Lucide icon names. The dashboard maps these by name — unknown names silently fall back to `Puzzle`.
+
+Currently mapped: `Activity`, `BarChart3`, `Clock`, `Code`, `Database`, `Eye`, `FileText`, `Globe`, `Heart`, `KeyRound`, `MessageSquare`, `Package`, `Puzzle`, `Settings`, `Shield`, `Sparkles`, `Star`, `Terminal`, `Wrench`, `Zap`.
+
+Need a different icon? Open a PR to `web/src/App.tsx`'s `ICON_MAP` — pure additive change.
+
+### The Plugin SDK
+
+Everything a plugin needs is on `window.__HERMES_PLUGIN_SDK__`. Plugins should never import React directly.
+
+```javascript
+const SDK = window.__HERMES_PLUGIN_SDK__;
+
+// React + hooks
+SDK.React                    // the React instance
+SDK.hooks.useState
+SDK.hooks.useEffect
+SDK.hooks.useCallback
+SDK.hooks.useMemo
+SDK.hooks.useRef
+SDK.hooks.useContext
+SDK.hooks.createContext
+
+// UI components (shadcn/ui primitives)
+SDK.components.Card
+SDK.components.CardHeader
+SDK.components.CardTitle
+SDK.components.CardContent
+SDK.components.Badge
+SDK.components.Button
+SDK.components.Input
+SDK.components.Label
+SDK.components.Select
+SDK.components.SelectOption
+SDK.components.Separator
+SDK.components.Tabs
+SDK.components.TabsList
+SDK.components.TabsTrigger
+SDK.components.PluginSlot    // render a named slot (useful for nested plugin UIs)
+
+// Hermes API client + raw fetcher
+SDK.api                      // typed client — getStatus, getSessions, getConfig, ...
+SDK.fetchJSON                // raw fetch for custom endpoints (plugin-registered routes)
+
+// Utilities
+SDK.utils.cn                 // Tailwind class merger (clsx + twMerge)
+SDK.utils.timeAgo            // "5m ago" from unix timestamp
+SDK.utils.isoTimeAgo         // "5m ago" from ISO string
+
+// Hooks
+SDK.useI18n                  // i18n hook for multi-language plugins
+```
+
+#### Calling your plugin's backend
+
+```javascript
+SDK.fetchJSON("/api/plugins/my-plugin/data")
+  .then((data) => console.log(data))
+  .catch((err) => console.error("API call failed:", err));
+```
+
+`fetchJSON` injects the session auth token, surfaces errors as thrown exceptions, and parses JSON automatically.
+
+#### Calling built-in Hermes endpoints
+
+```javascript
+// Agent status
+SDK.api.getStatus().then((s) => console.log("Version:", s.version));
+
+// Recent sessions
+SDK.api.getSessions(10).then((resp) => console.log(resp.sessions.length));
+```
+
+See [Web Dashboard → REST API](./web-dashboard#rest-api) for the full list.
+
+### Shell slots
+
+Slots let a plugin inject components into named locations of the app shell — the cockpit sidebar, the header, the footer, an overlay layer — without claiming a whole tab. Multiple plugins can populate the same slot; they render stacked in registration order.
+
+Register from inside the plugin bundle:
+
+```javascript
+window.__HERMES_PLUGINS__.registerSlot("my-plugin", "sidebar", MySidebar);
+window.__HERMES_PLUGINS__.registerSlot("my-plugin", "header-left", MyCrest);
+```
+
+#### Slot catalogue
+
+**Shell-wide slots** (render anywhere in the app chrome):
+
+| Slot | Location |
+|------|----------|
+| `backdrop` | Inside the `<Backdrop />` layer stack, above the noise layer. |
+| `header-left` | Before the Hermes brand in the top bar. |
+| `header-right` | Before the theme/language switchers in the top bar. |
+| `header-banner` | Full-width strip below the nav. |
+| `sidebar` | Cockpit sidebar rail — **only rendered when `layoutVariant === "cockpit"`**. |
+| `pre-main` | Above the route outlet (inside `<main>`). |
+| `post-main` | Below the route outlet (inside `<main>`). |
+| `footer-left` | Left footer cell content (replaces default). |
+| `footer-right` | Right footer cell content (replaces default). |
+| `overlay` | Fixed-position layer above everything else. Useful for chrome (scanlines, vignettes) `customCSS` can't achieve alone. |
+
+**Page-scoped slots** (render only on the named built-in page — use these to inject widgets, cards, or toolbars into an existing page without overriding the whole route):
+
+| Slot | Where it renders |
+|------|------------------|
+| `sessions:top` / `sessions:bottom` | Top / bottom of the `/sessions` page. |
+| `analytics:top` / `analytics:bottom` | Top / bottom of the `/analytics` page. |
+| `logs:top` / `logs:bottom` | Top (above filter toolbar) / bottom (below log viewer) of `/logs`. |
+| `cron:top` / `cron:bottom` | Top / bottom of the `/cron` page. |
+| `skills:top` / `skills:bottom` | Top / bottom of the `/skills` page. |
+| `config:top` / `config:bottom` | Top / bottom of the `/config` page. |
+| `env:top` / `env:bottom` | Top / bottom of the `/env` (Keys) page. |
+| `docs:top` / `docs:bottom` | Top (above the iframe) / bottom of `/docs`. |
+| `chat:top` / `chat:bottom` | Top / bottom of `/chat` (only active when embedded chat is enabled). |
+
+Example — add a banner card to the top of the Sessions page:
+
+```javascript
+function PinnedSessionsBanner() {
+  return React.createElement(Card, null,
+    React.createElement(CardContent, { className: "py-2 text-xs" },
+      "Pinned note injected by my-plugin"),
+  );
+}
+
+window.__HERMES_PLUGINS__.registerSlot("my-plugin", "sessions:top", PinnedSessionsBanner);
+```
+
+Combine page-scoped slots with `tab.hidden: true` if your plugin only augments existing pages and doesn't need a nav tab of its own.
+
+The shell only renders `<PluginSlot name="..." />` for the slots above. Additional names are accepted by the registry for nested plugin UIs — a plugin can expose its own slots via `SDK.components.PluginSlot`.
+
+#### Re-registration and HMR
+
+If the same `(plugin, slot)` pair is registered twice, the later call replaces the earlier one — this matches how React HMR expects plugin re-mounts to behave.
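+
+The stacking and replacement semantics can be modelled with a small registry. This is a sketch, not the dashboard's implementation: `createSlotRegistry` and `componentsFor` are hypothetical names, and keeping a replaced entry at its original position in the stack is an assumption.
+
+```javascript
+// Hypothetical model of a slot registry keyed by (slot, plugin).
+function createSlotRegistry() {
+  const slots = new Map(); // slot name -> Map(plugin name -> component)
+  return {
+    registerSlot(plugin, slot, component) {
+      if (!slots.has(slot)) slots.set(slot, new Map());
+      // Map.set on an existing key replaces the value but keeps the key's
+      // original insertion position, so re-registration does not reorder.
+      slots.get(slot).set(plugin, component);
+    },
+    // Components for a slot, in registration order.
+    componentsFor(slot) {
+      return Array.from((slots.get(slot) || new Map()).values());
+    },
+  };
+}
+```
+
+Registering plugin `a`, then `b`, then `a` again into the same slot leaves two components in this model: the second component from `a`, followed by the one from `b`.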
+
+### Replacing built-in pages (`tab.override`)
+
+Setting `tab.override` to a built-in route path makes the plugin's component replace that page instead of adding a new tab. Useful when a theme wants a custom home page (`/`) but wants to keep the rest of the dashboard intact.
+
+```json
+{
+  "name": "my-home",
+  "label": "Home",
+  "tab": {
+    "path": "/my-home",
+    "override": "/",
+    "position": "end"
+  },
+  "entry": "dist/index.js"
+}
+```
+
+With `override` set:
+
+- The original page component at `/` is removed from the router.
+- Your plugin renders at `/` instead.
+- No nav tab is added for `tab.path` (the override is the point).
+
+Only one plugin can override a given path. If two plugins claim the same override, the first wins and the second is ignored with a dev-mode warning.
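+
+The first-wins rule can be sketched as follows. `resolveOverrides` is a hypothetical helper that takes manifests in load order; the real dashboard may resolve conflicts differently in detail.
+
+```javascript
+// Hypothetical sketch: decide which plugin owns each overridden route.
+function resolveOverrides(manifests) {
+  const owners = new Map(); // overridden route -> plugin name
+  const ignored = [];       // plugins whose override lost the conflict
+  for (const m of manifests) {
+    const route = m.tab && m.tab.override;
+    if (!route) continue;   // plugin adds a normal tab, no override
+    if (owners.has(route)) {
+      ignored.push(m.name); // a previous plugin already claimed this route
+    } else {
+      owners.set(route, m.name);
+    }
+  }
+  return { owners, ignored };
+}
+```
+
+Given two manifests that both set `"override": "/"`, the first one in the list becomes the owner of `/` and the second lands in `ignored`.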
+
+If you only need to add a card or toolbar to an existing page without taking it over, use [page-scoped slots](#augmenting-built-in-pages-page-scoped-slots) instead.
+
+### Augmenting built-in pages (page-scoped slots)
+
+Full replacement via `tab.override` is heavy — your plugin now owns the entire page, including any future updates we ship to it. Most of the time you just want to add a banner, card, or toolbar to an existing page. That's what **page-scoped slots** are for.
+
+Every built-in page exposes `<page>:top` and `<page>:bottom` slots rendered at the top and bottom of its content area. Your plugin populates one by calling `registerSlot()` — the built-in page keeps working normally, and your component renders alongside it.
+
+Available slots: `sessions:*`, `analytics:*`, `logs:*`, `cron:*`, `skills:*`, `config:*`, `env:*`, `docs:*`, `chat:*` (each with `:top` and `:bottom`). See the full catalogue in [Shell slots → Slot catalogue](#slot-catalogue).
+
+Minimal example: pin a banner to the top of the Sessions page. The manifest goes in `~/.hermes/plugins/session-notes/dashboard/manifest.json`:
+
+```json
+{
+  "name": "session-notes",
+  "label": "Session Notes",
+  "tab": { "path": "/session-notes", "hidden": true },
+  "slots": ["sessions:top"],
+  "entry": "dist/index.js"
+}
+```
+
+```javascript
+// ~/.hermes/plugins/session-notes/dashboard/dist/index.js
+(function () {
+  const SDK = window.__HERMES_PLUGIN_SDK__;
+  const { React } = SDK;
+  const { Card, CardContent } = SDK.components;
+
+  function Banner() {
+    return React.createElement(Card, null,
+      React.createElement(CardContent, { className: "py-2 text-xs" },
+        "Remember to label important sessions before archiving."),
+    );
+  }
+
+  // Placeholder for the hidden tab.
+  window.__HERMES_PLUGINS__.register("session-notes", function () { return null; });
+
+  // The real work.
+  window.__HERMES_PLUGINS__.registerSlot("session-notes", "sessions:top", Banner);
+})();
+```
+
+Key points:
+
+- `tab.hidden: true` keeps the plugin out of the navigation; it adds no tab of its own.
+- The `slots` manifest field is documentation only. The actual binding happens in the JS bundle via `registerSlot()`.
+- Multiple plugins can claim the same page-scoped slot. They render stacked in registration order.
+- Zero footprint when no plugin registers: the built-in page renders exactly as before.
+
+The bundled `example-dashboard` plugin ships a live demo that injects a banner into `sessions:top` — install it to see the pattern end-to-end.
+
+### Slot-only plugins (`tab.hidden`)
+
+When `tab.hidden: true`, the plugin registers its component (for direct URL visits) and any slots, but never adds a tab to the navigation. Used by plugins that only exist to inject into slots — a header crest, a sidebar HUD, an overlay.
+
+```json
+{
+  "name": "header-crest",
+  "label": "Header Crest",
+  "tab": {
+    "path": "/header-crest",
+    "position": "end",
+    "hidden": true
+  },
+  "slots": ["header-left"],
+  "entry": "dist/index.js"
+}
+```
+
+The bundle still calls `register()` with a placeholder component (good practice in case someone hits the URL directly) and then `registerSlot()` to do the real work.
+
+### Backend API routes
+
+Plugins can register FastAPI routes by setting `api` in the manifest. Create the file and export a `router`:
+
+```python
+# ~/.hermes/plugins/my-plugin/dashboard/plugin_api.py
+from fastapi import APIRouter
+
+router = APIRouter()
+
+@router.get("/data")
+async def get_data():
+    return {"items": ["one", "two", "three"]}
+
+@router.post("/action")
+async def do_action(body: dict):
+    return {"ok": True, "received": body}
+```
+
+Routes are mounted under `/api/plugins/<name>/`, so the above becomes:
+
+- `GET  /api/plugins/my-plugin/data`
+- `POST /api/plugins/my-plugin/action`
+
+Plugin API routes bypass session-token authentication since the dashboard server binds to localhost by default. **Don't expose the dashboard on a public interface with `--host 0.0.0.0` if you run untrusted plugins** — their routes become reachable too.
+
+#### Accessing Hermes internals
+
+Backend routes run inside the dashboard process, so they can import from the hermes-agent codebase directly:
+
+```python
+from fastapi import APIRouter
+from hermes_state import SessionDB
+from hermes_cli.config import load_config
+
+router = APIRouter()
+
+@router.get("/session-count")
+async def session_count():
+    db = SessionDB()
+    try:
+        count = len(db.list_sessions(limit=9999))
+        return {"count": count}
+    finally:
+        db.close()
+
+@router.get("/config-snapshot")
+async def config_snapshot():
+    cfg = load_config()
+    return {"model": cfg.get("model", {})}
+```
+
+### Custom CSS per plugin
+
+If your plugin needs styles beyond Tailwind classes and inline `style=`, add a CSS file and reference it in the manifest:
+
+```json
+{
+  "css": "dist/style.css"
+}
+```
+
+The file is injected as a `<link>` tag on plugin load. Use specific class names to avoid conflicts with the dashboard's styles, and reference the dashboard's CSS vars to stay theme-aware:
+
+```css
+/* dist/style.css */
+.my-plugin-chart {
+  border: 1px solid var(--color-border);
+  background: var(--color-card);
+  color: var(--color-card-foreground);
+  padding: 1rem;
+}
+.my-plugin-chart:hover {
+  border-color: var(--color-ring);
+}
+```
+
+The dashboard exposes every shadcn token as `--color-*` plus theme extras (`--theme-asset-*`, `--component-<bucket>-*`, `--radius`, `--spacing-mul`). Reference those and your plugin automatically reskins with the active theme.
+
+### Plugin discovery & reload
+
+The dashboard scans the following directories for `dashboard/manifest.json`, in three priority tiers:
+
+| Priority | Directory | Source label |
+|----------|-----------|--------------|
+| 1 (wins on conflict) | `~/.hermes/plugins/<name>/dashboard/` | `user` |
+| 2 | `<repo>/plugins/memory/<name>/dashboard/` | `bundled` |
+| 2 | `<repo>/plugins/<name>/dashboard/` | `bundled` |
+| 3 | `./.hermes/plugins/<name>/dashboard/` | `project` — only when `HERMES_ENABLE_PROJECT_PLUGINS` is set |
+
+Discovery results are cached per dashboard process. After adding a new plugin, either:
+
+```bash
+# Force a rescan without restart
+curl http://127.0.0.1:9119/api/dashboard/plugins/rescan
+```
+
+…or restart `hermes dashboard`.
+
+#### Plugin load lifecycle
+
+1. Dashboard loads. `main.tsx` exposes the SDK on `window.__HERMES_PLUGIN_SDK__` and the registry on `window.__HERMES_PLUGINS__`.
+2. `App.tsx` calls `usePlugins()` → fetches `GET /api/dashboard/plugins`.
+3. For each manifest: CSS `<link>` is injected (if declared), then a `<script>` tag loads the JS bundle.
+4. The plugin's IIFE runs and calls `window.__HERMES_PLUGINS__.register(name, Component)` — and optionally `.registerSlot(name, slot, Component)` for each slot.
+5. The dashboard resolves the registered component against the manifest, adds the tab to navigation (unless `hidden`), and mounts the component as a route.
+
+Plugins have up to **2 seconds** after their script loads to call `register()`. After that the dashboard stops waiting and finishes initial render. If a plugin later registers, it still appears — the nav is reactive.
+
+If a plugin's script fails to load (404, syntax error, exception during IIFE), the dashboard logs a warning to the browser console and continues without it.
+
+---
+
+## Combined theme + plugin demo
+
+The repo ships `plugins/strike-freedom-cockpit/` as a complete reskin demo. It pairs a theme YAML with a slot-only plugin to produce a cockpit-style HUD without forking the dashboard.
+
+**What it demonstrates:**
+
+- A full theme using palette, typography, `fontUrl`, `layoutVariant: cockpit`, `assets`, `componentStyles` (notched card corners, gradient backgrounds), `colorOverrides`, and `customCSS` (scanline overlay).
+- A slot-only plugin (`tab.hidden: true`) that registers into three slots:
+  - `sidebar` — an MS-STATUS panel with live telemetry bars driven by `SDK.api.getStatus()`.
+  - `header-left` — a faction crest that reads `--theme-asset-crest` from the active theme.
+  - `footer-right` — a custom tagline replacing the default org line.
+- The plugin reads theme-supplied artwork via CSS vars, so swapping themes changes the hero/crest without plugin code changes.
+
+**Install:**
+
+```bash
+# Theme
+cp plugins/strike-freedom-cockpit/theme/strike-freedom.yaml \
+   ~/.hermes/dashboard-themes/
+
+# Plugin
+cp -r plugins/strike-freedom-cockpit ~/.hermes/plugins/
+```
+
+Open the dashboard, pick **Strike Freedom** from the theme switcher. The cockpit sidebar appears, the crest shows in the header, the tagline replaces the footer. Switch back to **Hermes Teal** and the plugin remains installed, but its sidebar panel disappears (the `sidebar` slot only renders under the `cockpit` layout variant).
+
+Read the plugin source (`plugins/strike-freedom-cockpit/dashboard/dist/index.js`) to see how it reads CSS vars, guards against older dashboards without slot support, and registers three slots from one bundle.
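+
+A guard against dashboards without slot support might look like the sketch below. It is not copied from the bundled plugin: `registerSlotsSafely` is a hypothetical helper, and feature-detecting `registerSlot` with `typeof` is an assumption about how such a guard could be written.
+
+```javascript
+// Hypothetical guard: register several slots only if the registry supports them.
+function registerSlotsSafely(registry, pluginName, slotComponents) {
+  // Older dashboards may expose a registry without registerSlot().
+  if (!registry || typeof registry.registerSlot !== "function") {
+    return [];
+  }
+  const registered = [];
+  for (const [slot, component] of Object.entries(slotComponents)) {
+    registry.registerSlot(pluginName, slot, component);
+    registered.push(slot);
+  }
+  return registered; // names of the slots that were registered
+}
+```
+
+Inside a plugin bundle this would be called with `window.__HERMES_PLUGINS__` as the registry and an object such as `{ "sidebar": MySidebar, "header-left": MyCrest }`.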
+
+---
+
+## API reference
+
+### Theme endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/dashboard/themes` | GET | List available themes + active name. Built-ins return `{name, label, description}`; user themes also include a `definition` field with the full normalised theme object. |
+| `/api/dashboard/theme` | PUT | Set active theme. Body: `{"name": "midnight"}`. Persists to `config.yaml` under `dashboard.theme`. |
+
+### Plugin endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/dashboard/plugins` | GET | List discovered plugins (with manifests, minus internal fields). |
+| `/api/dashboard/plugins/rescan` | GET | Force re-scan the plugin directories without restarting. |
+| `/dashboard-plugins/<name>/<path>` | GET | Serve static assets from a plugin's `dashboard/` directory. Path traversal is blocked. |
+| `/api/plugins/<name>/*` | * | Plugin-registered backend routes. |
+
+### SDK on `window`
+
+| Global | Type | Description |
+|--------|------|-------------|
+| `window.__HERMES_PLUGIN_SDK__` | object | The SDK object: React, hooks, UI components, API client, utils. Provided by `registry.ts`. |
+| `window.__HERMES_PLUGINS__.register(name, Component)` | function | Register a plugin's main component. |
+| `window.__HERMES_PLUGINS__.registerSlot(name, slot, Component)` | function | Register into a named shell slot. |
+
+---
+
+## Troubleshooting
+
+**My theme doesn't appear in the picker.**
+Check that the file is in `~/.hermes/dashboard-themes/` and ends in `.yaml` or `.yml`. Refresh the page. Run `curl http://127.0.0.1:9119/api/dashboard/themes` — your theme should be in the response. If the YAML has a parse error, the dashboard logs to `errors.log` under `~/.hermes/logs/`.
+
+**My plugin's tab doesn't show up.**
+1. Check the manifest is at `~/.hermes/plugins/<name>/dashboard/manifest.json` (note the `dashboard/` subdirectory).
+2. `curl http://127.0.0.1:9119/api/dashboard/plugins/rescan` to force re-discovery.
+3. Open browser dev tools → Network — confirm `manifest.json`, `index.js`, and any CSS loaded without 404s.
+4. Open browser dev tools → Console — look for errors during the IIFE or `window.__HERMES_PLUGINS__ is undefined` (indicates the SDK didn't initialize, usually a React render crash earlier).
+5. Verify your bundle calls `window.__HERMES_PLUGINS__.register(...)` with the **same name** as `manifest.json:name`.
+
+**Slot-registered components don't render.**
+The `sidebar` slot only renders when the active theme has `layoutVariant: cockpit`. Other slots always render. If a component you registered into a slot never appears, add a `console.log` next to your `registerSlot` call to confirm the plugin bundle ran at all.
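+
+As a rough sketch, a theme that enables the `sidebar` slot could look like the file below. The file name and the `name`, `label`, and `description` values are placeholders, and the sketch assumes omitted fields fall back to their defaults; only `layoutVariant: cockpit` matters for this issue.
+
+```yaml
+# ~/.hermes/dashboard-themes/my-cockpit.yaml  (hypothetical file name)
+name: my-cockpit
+label: My Cockpit
+description: Default look with the cockpit sidebar rail enabled
+layoutVariant: cockpit   # the `sidebar` slot only renders with this variant
+```
+
+Select the theme in the picker after refreshing the page, then check whether the sidebar component appears.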
+
+**Plugin backend routes return 404.**
+1. Confirm the manifest has `"api": "plugin_api.py"` pointing to an existing file inside `dashboard/`.
+2. Restart `hermes dashboard` — plugin API routes are mounted once at startup, **not** on rescan.
+3. Check that `plugin_api.py` exports a module-level `router = APIRouter()`. Other export names are not picked up.
+4. Tail `~/.hermes/logs/errors.log` for `Failed to load plugin <name> API routes` — import errors are logged there.
+
+**Theme change drops my color overrides.**
+`colorOverrides` are scoped to the active theme and cleared on theme switch — that's by design. If you want overrides that persist, put them in your theme's YAML, not in the live switcher.
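+
+A persistent override in a theme file might look like this sketch. The file name, theme name, and hex value are illustrative only; `destructive` is one of the shadcn tokens that `colorOverrides` accepts.
+
+```yaml
+# ~/.hermes/dashboard-themes/my-theme.yaml  (hypothetical file name)
+name: my-theme
+colorOverrides:
+  destructive: "#ff6b6b"   # applied whenever this theme is active
+```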
+
+**Theme customCSS gets truncated.**
+The `customCSS` block is capped at 32 KiB per theme. Split large stylesheets across multiple themes, or switch to a plugin that injects a full stylesheet via its `css` field (no size cap).
+
+**I want to ship a plugin on PyPI.**
+Dashboard plugins are installed by directory layout, not by pip entry point. The cleanest distribution path today is a git repo the user clones into `~/.hermes/plugins/`. A pip-based installer for dashboard plugins is not currently wired up.
@@ -321,274 +321,27 @@ The frontend is built with React 19, TypeScript, Tailwind CSS v4, and shadcn/ui-

 When you run `hermes update`, the web frontend is automatically rebuilt if `npm` is available. This keeps the dashboard in sync with code updates. If `npm` isn't installed, the update skips the frontend build and `hermes dashboard` will build it on first launch.

-## Themes
+## Themes & plugins

-Themes control the dashboard's visual presentation across three layers:
+The dashboard ships with six built-in themes and can be extended with user-defined themes, plugin tabs, and backend API routes. All of these are drop-in; no clone of the Hermes repo is needed.

- **Palette** — colors (background, text, accents, warm glow, noise)
- **Typography** — font families, base size, line height, letter spacing
- **Layout** — corner radius and density (spacing multiplier)
+**Switch themes live** from the header bar — click the palette icon next to the language switcher. Selection persists to `config.yaml` under `dashboard.theme` and is restored on page load.

-Switch themes live from the header bar — click the palette icon next to the language switcher. Selection persists to `config.yaml` under `dashboard.theme` and is restored on page load.
+Built-in themes:

-### Built-in themes
+| Theme | Character |
+|-------|-----------|
+| **Hermes Teal** (`default`) | Dark teal + cream, system fonts, comfortable spacing |
+| **Midnight** (`midnight`) | Deep blue-violet, Inter + JetBrains Mono |
+| **Ember** (`ember`) | Warm crimson + bronze, Spectral serif + IBM Plex Mono |
+| **Mono** (`mono`) | Grayscale, IBM Plex, compact |
+| **Cyberpunk** (`cyberpunk`) | Neon green on black, Share Tech Mono |
+| **Rosé** (`rose`) | Pink + ivory, Fraunces serif, spacious |

-Each built-in ships its own palette, typography, and layout — switching produces visible changes beyond color alone.
+To build your own theme, add a plugin tab, inject into shell slots, or expose plugin-specific REST endpoints, see **[Extending the Dashboard](./extending-the-dashboard)** — the complete guide covers:

-| Theme | Palette | Typography | Layout |
-|-------|---------|------------|--------|
-| **Hermes Teal** (`default`) | Dark teal + cream | System stack, 15px | 0.5rem radius, comfortable |
-| **Midnight** (`midnight`) | Deep blue-violet | Inter + JetBrains Mono, 14px | 0.75rem radius, comfortable |
-| **Ember** (`ember`) | Warm crimson / bronze | Spectral (serif) + IBM Plex Mono, 15px | 0.25rem radius, comfortable |
-| **Mono** (`mono`) | Grayscale | IBM Plex Sans + IBM Plex Mono, 13px | 0 radius, compact |
-| **Cyberpunk** (`cyberpunk`) | Neon green on black | Share Tech Mono everywhere, 14px | 0 radius, compact |
-| **Rosé** (`rose`) | Pink and ivory | Fraunces (serif) + DM Mono, 16px | 1rem radius, spacious |
-
-Themes that reference Google Fonts (everything except Hermes Teal) load the stylesheet on demand — the first time you switch to them, a `<link>` tag is injected into `<head>`.
-
-### Custom themes
-
-Drop a YAML file in `~/.hermes/dashboard-themes/` and it appears in the picker automatically. The file can be as minimal as a name plus the fields you want to override — every missing field inherits a sane default.
-
-Minimal example (colors only, bare hex shorthand):
-
-```yaml
-# ~/.hermes/dashboard-themes/neon.yaml
-name: neon
-label: Neon
-description: Pure magenta on black
-colors:
-  background: "#000000"
-  midground: "#ff00ff"
-```
-
-Full example (every knob):
-
-```yaml
-# ~/.hermes/dashboard-themes/ocean.yaml
-name: ocean
-label: Ocean Deep
-description: Deep sea blues with coral accents
-
-palette:
-  background:
-    hex: "#0a1628"
-    alpha: 1.0
-  midground:
-    hex: "#a8d0ff"
-    alpha: 1.0
-  foreground:
-    hex: "#ffffff"
-    alpha: 0.0
-  warmGlow: "rgba(255, 107, 107, 0.35)"
-  noiseOpacity: 0.7
-
-typography:
-  fontSans: "Poppins, system-ui, sans-serif"
-  fontMono: "Fira Code, ui-monospace, monospace"
-  fontDisplay: "Poppins, system-ui, sans-serif"   # optional, falls back to fontSans
-  fontUrl: "https://fonts.googleapis.com/css2?family=Poppins:wght@400;500;600&family=Fira+Code:wght@400;500&display=swap"
-  baseSize: "15px"
-  lineHeight: "1.6"
-  letterSpacing: "-0.003em"
-
-layout:
-  radius: "0.75rem"      # 0 | 0.25rem | 0.5rem | 0.75rem | 1rem | any length
-  density: comfortable   # compact | comfortable | spacious
-
-# Optional — pin individual shadcn tokens that would otherwise derive from
-# the palette. Any key listed here wins over the palette cascade.
-colorOverrides:
-  destructive: "#ff6b6b"
-  ring: "#ff6b6b"
-```
-
-Refresh the dashboard after creating the file.
-
-### Palette model
-
-The palette is a 3-layer triplet — **background**, **midground**, **foreground** — plus a warm-glow rgba() string and a noise-opacity multiplier. Every shadcn token (card, muted, border, primary, popover, etc.) is derived from this triplet via CSS `color-mix()` in the dashboard's stylesheet, so overriding three colors cascades into the whole UI.
-
- `background` — deepest canvas color (typically near-black). The page background and card fill come from this.
- `midground` — primary text and accent. Most UI chrome reads this.
- `foreground` — top-layer highlight. In the default theme this is white at alpha 0 (invisible); themes that want a bright accent on top can raise its alpha.
- `warmGlow` — rgba() vignette color used by the ambient backdrop.
- `noiseOpacity` — 0–1.2 multiplier on the grain overlay. Lower = softer, higher = grittier.
-
-Each layer accepts `{hex, alpha}` or a bare hex string (alpha defaults to 1.0).
-
-### Typography model
-
-| Key | Type | Description |
-|-----|------|-------------|
-| `fontSans` | string | CSS font-family stack for body copy (applied to `html`, `body`) |
-| `fontMono` | string | CSS font-family stack for code blocks, `<code>`, `.font-mono` utilities, dense readouts |
-| `fontDisplay` | string | Optional heading/display font stack. Falls back to `fontSans` |
-| `fontUrl` | string | Optional external stylesheet URL. Injected as `<link rel="stylesheet">` in `<head>` on theme switch. Same URL is never injected twice. Works with Google Fonts, Bunny Fonts, self-hosted `@font-face` sheets, anything you can link |
-| `baseSize` | string | Root font size — controls the rem scale for the whole dashboard. Example: `"14px"`, `"16px"` |
-| `lineHeight` | string | Default line-height, e.g. `"1.5"`, `"1.65"` |
-| `letterSpacing` | string | Default letter-spacing, e.g. `"0"`, `"0.01em"`, `"-0.01em"` |
-
-### Layout model
-
-| Key | Values | Description |
-|-----|--------|-------------|
-| `radius` | any CSS length | Corner-radius token. Cascades into `--radius-sm/md/lg/xl` so every rounded element shifts together. |
-| `density` | `compact` \| `comfortable` \| `spacious` | Spacing multiplier. Compact = 0.85×, comfortable = 1.0× (default), spacious = 1.2×. Scales Tailwind's base spacing, so padding, gap, and space-between utilities all shift proportionally. |
-
-### Color overrides (optional)
-
-Most themes won't need this — the 3-layer palette derives every shadcn token. But if you want a specific accent that the derivation won't produce (a softer destructive red for a pastel theme, a specific success green for a brand), pin individual tokens here.
-
-Supported keys: `card`, `cardForeground`, `popover`, `popoverForeground`, `primary`, `primaryForeground`, `secondary`, `secondaryForeground`, `muted`, `mutedForeground`, `accent`, `accentForeground`, `destructive`, `destructiveForeground`, `success`, `warning`, `border`, `input`, `ring`.
-
-Any key set here overrides the derived value for the active theme only — switching to another theme clears the overrides.
-
-### Layout variants
-
-`layoutVariant` selects the overall shell layout. Defaults to `standard`.
-
-| Variant | Behaviour |
-|---------|-----------|
-| `standard` | Single column, 1600px max-width (default) |
-| `cockpit` | Left sidebar rail (260px) + main content. Populated by plugins via the `sidebar` slot |
-| `tiled` | Drops the max-width clamp so pages can use the full viewport |
-
-```yaml
-layoutVariant: cockpit
-```
-
-The current variant is exposed as `document.documentElement.dataset.layoutVariant` so custom CSS can target it via `:root[data-layout-variant="cockpit"]`.
-
-### Theme assets
-
-Ship artwork URLs with a theme. Each named slot becomes a CSS var (`--theme-asset-<name>`) that plugins and the built-in shell read; the `bg` slot is automatically wired into the backdrop.
-
-```yaml
-assets:
-  bg: "https://example.com/hero-bg.jpg"       # full-viewport background
-  hero: "/my-images/strike-freedom.png"       # for plugin sidebars
-  crest: "/my-images/crest.svg"               # for header slot plugins
-  logo: "/my-images/logo.png"
-  sidebar: "/my-images/rail.png"
-  header: "/my-images/header-art.png"
-  custom:
-    scanLines: "/my-images/scanlines.png"     # → --theme-asset-custom-scanLines
-```
-
-Values accept bare URLs (wrapped in `url(...)` automatically), pre-wrapped `url(...)`/`linear-gradient(...)`/`radial-gradient(...)` expressions, and `none`.
-
-### Component chrome overrides
-
-Themes can restyle individual shell components without writing CSS selectors via the `componentStyles` block. Each bucket's entries become CSS vars (`--component-<bucket>-<kebab-property>`) that the shell's shared components read — so `card:` overrides apply to every `<Card>`, `header:` to the app bar, etc.
-
-```yaml
-componentStyles:
-  card:
-    clipPath: "polygon(12px 0, 100% 0, 100% calc(100% - 12px), calc(100% - 12px) 100%, 0 100%, 0 12px)"
-    background: "linear-gradient(180deg, rgba(10, 22, 52, 0.85), rgba(5, 9, 26, 0.92))"
-    boxShadow: "inset 0 0 0 1px rgba(64, 200, 255, 0.28)"
-  header:
-    background: "linear-gradient(180deg, rgba(16, 32, 72, 0.95), rgba(5, 9, 26, 0.9))"
-  tab:
-    clipPath: "polygon(6px 0, 100% 0, calc(100% - 6px) 100%, 0 100%)"
-  sidebar: {...}
-  backdrop: {...}
-  footer: {...}
-  progress: {...}
-  badge: {...}
-  page: {...}
-```
-
-Supported buckets: `card`, `header`, `footer`, `sidebar`, `tab`, `progress`, `badge`, `backdrop`, `page`. Property names use camelCase (`clipPath`) and are emitted as kebab (`clip-path`). Values are plain CSS strings — anything CSS accepts (`clip-path`, `border-image`, `background`, `box-shadow`, animations, etc.).
-
-### Custom CSS
-
-For selector-level chrome that doesn't fit `componentStyles` — pseudo-elements, animations, media queries, theme-scoped overrides — drop raw CSS into the `customCSS` field:
-
-```yaml
-customCSS: |
-  :root[data-layout-variant="cockpit"] body::before {
-    content: "";
-    position: fixed;
-    inset: 0;
-    pointer-events: none;
-    z-index: 100;
-    background: repeating-linear-gradient(to bottom,
-      transparent 0px, transparent 2px,
-      rgba(64, 200, 255, 0.035) 3px, rgba(64, 200, 255, 0.035) 4px);
-    mix-blend-mode: screen;
-  }
-```
-
-The CSS is injected as a single scoped `<style data-hermes-theme-css>` tag on theme apply and cleaned up on theme switch. Capped at 32 KiB per theme.
-
-## Dashboard plugins
-
-Plugins live in `~/.hermes/plugins/<name>/dashboard/` (user) or repo `plugins/<name>/dashboard/` (bundled). Each ships a `manifest.json` plus a plain JS bundle that uses the plugin SDK exposed on `window.__HERMES_PLUGIN_SDK__`.
-
-### Manifest
-
-```json
-{
-  "name": "my-plugin",
-  "label": "My Plugin",
-  "icon": "Sparkles",
-  "version": "1.0.0",
-  "tab": {
-    "path": "/my-plugin",
-    "position": "after:skills",
-    "override": "/",
-    "hidden": false
-  },
-  "slots": ["sidebar", "header-left"],
-  "entry": "dist/index.js",
-  "css": "dist/index.css",
-  "api": "api.py"
-}
-```
-
-| Field | Description |
-|-------|-------------|
-| `tab.path` | Route path the plugin component renders at |
-| `tab.position` | `end`, `after:<tab>`, or `before:<tab>` |
-| `tab.override` | When set to a built-in path (`/`, `/sessions`, etc.), this plugin replaces that page instead of adding a new tab |
-| `tab.hidden` | When true, register component + slots but skip the nav entry. Used by slot-only plugins |
-| `slots` | Shell slots this plugin populates (documentation aid; actual registration happens from the JS bundle) |
-
-### Shell slots
-
-Plugins inject components into named shell locations by calling `window.__HERMES_PLUGINS__.registerSlot(pluginName, slotName, Component)`. Multiple plugins can populate the same slot — they render stacked in registration order.
-
-| Slot | Location |
-|------|----------|
-| `backdrop` | Inside the backdrop layer stack |
-| `header-left` | Before the Hermes brand in the top bar |
-| `header-right` | Before the theme/language switchers |
-| `header-banner` | Full-width strip below the nav |
-| `sidebar` | Cockpit sidebar rail (only rendered when `layoutVariant === "cockpit"`) |
-| `pre-main` | Above the route outlet |
-| `post-main` | Below the route outlet |
-| `footer-left` / `footer-right` | Footer cell content (replaces default) |
-| `overlay` | Fixed-position layer above everything else |
-
-### Plugin SDK
-
-Exposed on `window.__HERMES_PLUGIN_SDK__`:
-
- `React` + `hooks` (useState, useEffect, useCallback, useMemo, useRef, useContext, createContext)
- `components` — Card, Badge, Button, Input, Label, Select, Separator, Tabs, **PluginSlot**
- `api` — Hermes API client, plus raw `fetchJSON`
- `utils` — `cn()`, `timeAgo()`, `isoTimeAgo()`
- `useI18n` — i18n hook for multi-language plugins
-
-### Demo: Strike Freedom Cockpit
-
-`plugins/strike-freedom-cockpit/` ships a complete skin demo showing every extension point — cockpit layout variant, theme-supplied hero/crest assets, notched card corners via `componentStyles`, scanlines via `customCSS`, and a slot-only plugin that populates the sidebar, header, and footer. Copy the theme YAML into `~/.hermes/dashboard-themes/` and the plugin directory into `~/.hermes/plugins/` to try it.
-
-### Theme API
-
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/dashboard/themes` | GET | List available themes + active name. Built-ins return `{name, label, description}`; user themes also include a `definition` field with the full normalised theme object. |
-| `/api/dashboard/theme` | PUT | Set active theme. Body: `{"name": "midnight"}` |
+- Theme YAML schema — palette, typography, layout, assets, componentStyles, colorOverrides, customCSS
+- Layout variants — `standard`, `cockpit`, `tiled`
+- Plugin manifest, SDK, shell slots, page-scoped slots (inject widgets into built-in pages without overriding them), backend FastAPI routes
+- A full combined theme-plus-plugin walkthrough (Strike Freedom cockpit demo)
+- Discovery, reload, and troubleshooting
@@ -81,7 +81,7 @@ const sidebars: SidebarsConfig = {
          label: 'Management',
          items: [
            'user-guide/features/web-dashboard',
-            'user-guide/features/dashboard-plugins',
+            'user-guide/features/extending-the-dashboard',
          ],
        },
        {