feat: enable streaming by default in CLI

Streaming provides a better UX — tokens appear as they arrive instead of waiting for the full response. show_reasoning remains false so thinking blocks are not streamed to the user.
refactor(cli): add protected TUI extension hooks for wrapper CLIs
2026-03-21 09:49:47 -07:00 · 2026-03-21 09:42:07 -07:00 · 2026-03-21 09:31:53 -07:00 · 2026-03-21 09:31:15 -07:00 · 2026-03-21 09:27:40 -07:00 · 2026-03-21 09:26:57 -07:00
27 changed files with 1234 additions and 144 deletions
--- a/agent/context_compressor.py
+++ b/agent/context_compressor.py
@@ -1,8 +1,16 @@
 """Automatic context window compression for long conversations.

 Self-contained class with its own OpenAI client for summarization.
-Uses Gemini Flash (cheap/fast) to summarize middle turns while
+Uses auxiliary model (cheap/fast) to summarize middle turns while
 protecting head and tail context.
+
+Improvements over v1:
+  - Structured summary template (Goal, Progress, Decisions, Files, Next Steps)
+  - Iterative summary updates (preserves info across multiple compactions)
+  - Token-budget tail protection instead of fixed message count
+  - Tool output pruning before LLM summarization (cheap pre-pass)
+  - Scaled summary budget (proportional to compressed content)
+  - Richer tool call/result detail in summarizer input
 """

 import logging
@@ -27,12 +35,31 @@ SUMMARY_PREFIX = (
 )
 LEGACY_SUMMARY_PREFIX = "[CONTEXT SUMMARY]:"

+# Minimum / maximum tokens for the summary output
+_MIN_SUMMARY_TOKENS = 2000
+_MAX_SUMMARY_TOKENS = 8000
+# Proportion of compressed content to allocate for summary
+_SUMMARY_RATIO = 0.20
+
+# Token budget for tail protection (keep most-recent context)
+_DEFAULT_TAIL_TOKEN_BUDGET = 20_000
+
+# Placeholder used when pruning old tool results
+_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
+
+# Chars per token rough estimate
+_CHARS_PER_TOKEN = 4
+

 class ContextCompressor:
    """Compresses conversation context when approaching the model's context limit.

-    Algorithm: protect first N + last N turns, summarize everything in between.
-    Token tracking uses actual counts from API responses for accuracy.
+    Algorithm:
+      1. Prune old tool results (cheap, no LLM call)
+      2. Protect head messages (system prompt + first exchange)
+      3. Protect tail messages by token budget (most recent ~20K tokens)
+      4. Summarize middle turns with structured LLM prompt
+      5. On subsequent compactions, iteratively update the previous summary
    """

    def __init__(
@@ -74,6 +101,9 @@ class ContextCompressor:

        self.summary_model = summary_model_override or ""

+        # Stores the previous compaction summary for iterative updates
+        self._previous_summary: Optional[str] = None
+
    def update_from_response(self, usage: Dict[str, Any]):
        """Update tracked token usage from API response."""
        self.last_prompt_tokens = usage.get("prompt_tokens", 0)
@@ -100,53 +130,204 @@ class ContextCompressor:
            "compression_count": self.compression_count,
        }

-    def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> Optional[str]:
-        """Generate a concise summary of conversation turns.
+    # ------------------------------------------------------------------
+    # Tool output pruning (cheap pre-pass, no LLM call)
+    # ------------------------------------------------------------------

-        Tries the auxiliary model first, then falls back to the user's main
-        model.  Returns None if all attempts fail — the caller should drop
+    def _prune_old_tool_results(
+        self, messages: List[Dict[str, Any]], protect_tail_count: int,
+    ) -> tuple[List[Dict[str, Any]], int]:
+        """Replace old tool result contents with a short placeholder.
+
+        Walks backward from the end, protecting the most recent
+        ``protect_tail_count`` messages. Older tool results get their
+        content replaced with a placeholder string.
+
+        Returns (pruned_messages, pruned_count).
+        """
+        if not messages:
+            return messages, 0
+
+        result = [m.copy() for m in messages]
+        pruned = 0
+        prune_boundary = len(result) - protect_tail_count
+
+        for i in range(prune_boundary):
+            msg = result[i]
+            if msg.get("role") != "tool":
+                continue
+            content = msg.get("content", "")
+            if not content or content == _PRUNED_TOOL_PLACEHOLDER:
+                continue
+            # Only prune if the content is substantial (>200 chars)
+            if len(content) > 200:
+                result[i] = {**msg, "content": _PRUNED_TOOL_PLACEHOLDER}
+                pruned += 1
+
+        return result, pruned
+
+    # ------------------------------------------------------------------
+    # Summarization
+    # ------------------------------------------------------------------
+
+    def _compute_summary_budget(self, turns_to_summarize: List[Dict[str, Any]]) -> int:
+        """Scale summary token budget with the amount of content being compressed."""
+        content_tokens = estimate_messages_tokens_rough(turns_to_summarize)
+        budget = int(content_tokens * _SUMMARY_RATIO)
+        return max(_MIN_SUMMARY_TOKENS, min(budget, _MAX_SUMMARY_TOKENS))
+
+    def _serialize_for_summary(self, turns: List[Dict[str, Any]]) -> str:
+        """Serialize conversation turns into labeled text for the summarizer.
+
+        Includes tool call arguments and result content (up to 3000 chars
+        per message) so the summarizer can preserve specific details like
+        file paths, commands, and outputs.
+        """
+        parts = []
+        for msg in turns:
+            role = msg.get("role", "unknown")
+            content = msg.get("content") or ""
+
+            # Tool results: keep more content than before (3000 chars)
+            if role == "tool":
+                tool_id = msg.get("tool_call_id", "")
+                if len(content) > 3000:
+                    content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
+                parts.append(f"[TOOL RESULT {tool_id}]: {content}")
+                continue
+
+            # Assistant messages: include tool call names AND arguments
+            if role == "assistant":
+                if len(content) > 3000:
+                    content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
+                tool_calls = msg.get("tool_calls", [])
+                if tool_calls:
+                    tc_parts = []
+                    for tc in tool_calls:
+                        if isinstance(tc, dict):
+                            fn = tc.get("function", {})
+                            name = fn.get("name", "?")
+                            args = fn.get("arguments", "")
+                            # Truncate long arguments but keep enough for context
+                            if len(args) > 500:
+                                args = args[:400] + "..."
+                            tc_parts.append(f"  {name}({args})")
+                        else:
+                            fn = getattr(tc, "function", None)
+                            name = getattr(fn, "name", "?") if fn else "?"
+                            tc_parts.append(f"  {name}(...)")
+                    content += "\n[Tool calls:\n" + "\n".join(tc_parts) + "\n]"
+                parts.append(f"[ASSISTANT]: {content}")
+                continue
+
+            # User and other roles
+            if len(content) > 3000:
+                content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
+            parts.append(f"[{role.upper()}]: {content}")
+
+        return "\n\n".join(parts)
+
+    def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> Optional[str]:
+        """Generate a structured summary of conversation turns.
+
+        Uses a structured template (Goal, Progress, Decisions, Files, Next Steps)
+        inspired by Pi-mono and OpenCode. When a previous summary exists,
+        generates an iterative update instead of summarizing from scratch.
+
+        Returns None if all attempts fail — the caller should drop
        the middle turns without a summary rather than inject a useless
        placeholder.
        """
-        parts = []
-        for msg in turns_to_summarize:
-            role = msg.get("role", "unknown")
-            content = msg.get("content") or ""
-            if len(content) > 2000:
-                content = content[:1000] + "\n...[truncated]...\n" + content[-500:]
-            tool_calls = msg.get("tool_calls", [])
-            if tool_calls:
-                tool_names = [tc.get("function", {}).get("name", "?") for tc in tool_calls if isinstance(tc, dict)]
-                content += f"\n[Tool calls: {', '.join(tool_names)}]"
-            parts.append(f"[{role.upper()}]: {content}")
+        summary_budget = self._compute_summary_budget(turns_to_summarize)
+        content_to_summarize = self._serialize_for_summary(turns_to_summarize)

-        content_to_summarize = "\n\n".join(parts)
-        prompt = f"""Create a concise handoff summary for a later assistant that will continue this conversation after earlier turns are compacted.
+        if self._previous_summary:
+            # Iterative update: preserve existing info, add new progress
+            prompt = f"""You are updating a context compaction summary. A previous compaction produced the summary below. New conversation turns have occurred since then and need to be incorporated.

-Describe:
-1. What actions were taken (tool calls, searches, file operations)
-2. Key information or results obtained
-3. Important decisions, constraints, or user preferences
-4. Relevant data, file names, outputs, or next steps needed to continue
+PREVIOUS SUMMARY:
+{self._previous_summary}

-Keep it factual, concise, and focused on helping the next assistant resume without repeating work. Target ~{self.summary_target_tokens} tokens.
+NEW TURNS TO INCORPORATE:
+{content_to_summarize}
+
+Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new progress. Move items from "In Progress" to "Done" when completed. Remove information only if it is clearly obsolete.
+
+## Goal
+[What the user is trying to accomplish — preserve from previous summary, update if goal evolved]
+
+## Constraints & Preferences
+[User preferences, coding style, constraints, important decisions — accumulate across compactions]
+
+## Progress
+### Done
+[Completed work — include specific file paths, commands run, results obtained]
+### In Progress
+[Work currently underway]
+### Blocked
+[Any blockers or issues encountered]
+
+## Key Decisions
+[Important technical decisions and why they were made]
+
+## Relevant Files
+[Files read, modified, or created — with brief note on each. Accumulate across compactions.]
+
+## Next Steps
+[What needs to happen next to continue the work]
+
+## Critical Context
+[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
+
+Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.
+
+Write only the summary body. Do not include any preamble or prefix."""
+        else:
+            # First compaction: summarize from scratch
+            prompt = f"""Create a structured handoff summary for a later assistant that will continue this conversation after earlier turns are compacted.

---
 TURNS TO SUMMARIZE:
 {content_to_summarize}
---

-Write only the summary body. Do not include any preamble or prefix; the system will add the handoff wrapper."""
+Use this exact structure:
+
+## Goal
+[What the user is trying to accomplish]
+
+## Constraints & Preferences
+[User preferences, coding style, constraints, important decisions]
+
+## Progress
+### Done
+[Completed work — include specific file paths, commands run, results obtained]
+### In Progress
+[Work currently underway]
+### Blocked
+[Any blockers or issues encountered]
+
+## Key Decisions
+[Important technical decisions and why they were made]
+
+## Relevant Files
+[Files read, modified, or created — with brief note on each]
+
+## Next Steps
+[What needs to happen next to continue the work]
+
+## Critical Context
+[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
+
+Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions. The goal is to prevent the next assistant from repeating work or losing important details.
+
+Write only the summary body. Do not include any preamble or prefix."""

-        # Use the centralized LLM router — handles provider resolution,
-        # auth, and fallback internally.
        try:
            call_kwargs = {
                "task": "compression",
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.3,
-                "max_tokens": self.summary_target_tokens * 2,
-                "timeout": 30.0,
+                "max_tokens": summary_budget * 2,
+                "timeout": 45.0,
            }
            if self.summary_model:
                call_kwargs["model"] = self.summary_model
@@ -156,6 +337,8 @@ Write only the summary body. Do not include any preamble or prefix; the system w
            if not isinstance(content, str):
                content = str(content) if content else ""
            summary = content.strip()
+            # Store for iterative updates on next compaction
+            self._previous_summary = summary
            return self._with_summary_prefix(summary)
        except RuntimeError:
            logging.warning("Context compression: no provider available for "
@@ -280,10 +463,69 @@ Write only the summary body. Do not include any preamble or prefix; the system w
            idx = check
        return idx

+    # ------------------------------------------------------------------
+    # Tail protection by token budget
+    # ------------------------------------------------------------------
+
+    def _find_tail_cut_by_tokens(
+        self, messages: List[Dict[str, Any]], head_end: int,
+        token_budget: int = _DEFAULT_TAIL_TOKEN_BUDGET,
+    ) -> int:
+        """Walk backward from the end of messages, accumulating tokens until
+        the budget is reached. Returns the index where the tail starts.
+
+        Never cuts inside a tool_call/result group. Falls back to the old
+        ``protect_last_n`` if the budget would protect fewer messages.
+        """
+        n = len(messages)
+        min_tail = self.protect_last_n
+        accumulated = 0
+        cut_idx = n  # start from beyond the end
+
+        for i in range(n - 1, head_end - 1, -1):
+            msg = messages[i]
+            content = msg.get("content") or ""
+            msg_tokens = len(content) // _CHARS_PER_TOKEN + 10  # +10 for role/metadata
+            # Include tool call arguments in estimate
+            for tc in msg.get("tool_calls") or []:
+                if isinstance(tc, dict):
+                    args = tc.get("function", {}).get("arguments", "")
+                    msg_tokens += len(args) // _CHARS_PER_TOKEN
+            if accumulated + msg_tokens > token_budget and (n - i) >= min_tail:
+                break
+            accumulated += msg_tokens
+            cut_idx = i
+
+        # Ensure we protect at least protect_last_n messages
+        fallback_cut = n - min_tail
+        if cut_idx > fallback_cut:
+            cut_idx = fallback_cut
+
+        # If the token budget would protect everything (small conversations),
+        # fall back to the fixed protect_last_n approach so compression can
+        # still remove middle turns.
+        if cut_idx <= head_end:
+            cut_idx = fallback_cut
+
+        # Align to avoid splitting tool groups
+        cut_idx = self._align_boundary_backward(messages, cut_idx)
+
+        return max(cut_idx, head_end + 1)
+
+    # ------------------------------------------------------------------
+    # Main compression entry point
+    # ------------------------------------------------------------------
+
    def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
        """Compress conversation messages by summarizing middle turns.

-        Keeps first N + last N turns, summarizes everything in between.
+        Algorithm:
+          1. Prune old tool results (cheap pre-pass, no LLM call)
+          2. Protect head messages (system prompt + first exchange)
+          3. Find tail boundary by token budget (~20K tokens of recent context)
+          4. Summarize middle turns with structured LLM prompt
+          5. On re-compression, iteratively update the previous summary
+
        After compression, orphaned tool_call / tool_result pairs are cleaned
        up so the API never receives mismatched IDs.
        """
@@ -297,19 +539,26 @@ Write only the summary body. Do not include any preamble or prefix; the system w
                )
            return messages

-        compress_start = self.protect_first_n
-        compress_end = n_messages - self.protect_last_n
-        if compress_start >= compress_end:
-            return messages
+        display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)

-        # Adjust boundaries to avoid splitting tool_call/result groups.
+        # Phase 1: Prune old tool results (cheap, no LLM call)
+        messages, pruned_count = self._prune_old_tool_results(
+            messages, protect_tail_count=self.protect_last_n * 3,
+        )
+        if pruned_count and not self.quiet_mode:
+            logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)
+
+        # Phase 2: Determine boundaries
+        compress_start = self.protect_first_n
        compress_start = self._align_boundary_forward(messages, compress_start)
-        compress_end = self._align_boundary_backward(messages, compress_end)
+
+        # Use token-budget tail protection instead of fixed message count
+        compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
+
        if compress_start >= compress_end:
            return messages

        turns_to_summarize = messages[compress_start:compress_end]
-        display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)

        if not self.quiet_mode:
            logger.info(
@@ -323,15 +572,20 @@ Write only the summary body. Do not include any preamble or prefix; the system w
                self.threshold_percent * 100,
                self.threshold_tokens,
            )
+            tail_msgs = n_messages - compress_end
            logger.info(
-                "Summarizing turns %d-%d (%d turns)",
+                "Summarizing turns %d-%d (%d turns), protecting %d head + %d tail messages",
                compress_start + 1,
                compress_end,
                len(turns_to_summarize),
+                compress_start,
+                tail_msgs,
            )

+        # Phase 3: Generate structured summary
        summary = self._generate_summary(turns_to_summarize)

+        # Phase 4: Assemble compressed message list
        compressed = []
        for i in range(compress_start):
            msg = messages[i].copy()
--- a/batch_runner.py
+++ b/batch_runner.py
@@ -128,6 +128,7 @@ def _extract_tool_stats(messages: List[Dict[str, Any]]) -> Dict[str, Dict[str, i
        # Track tool calls from assistant messages
        if msg["role"] == "assistant" and "tool_calls" in msg and msg["tool_calls"]:
            for tool_call in msg["tool_calls"]:
+                if not tool_call or not isinstance(tool_call, dict): continue
                tool_name = tool_call["function"]["name"]
                tool_call_id = tool_call["id"]
                
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -696,8 +696,8 @@ display:
  # Stream tokens to the terminal as they arrive instead of waiting for the
  # full response. The response box opens on first token and text appears
  # line-by-line. Tool calls are still captured silently.
-  # Disabled by default — enable to try the streaming UX.
-  streaming: false
+  # Stream tokens to the terminal in real-time. Disable to wait for full responses.
+  streaming: true

  # ───────────────────────────────────────────────────────────────────────────
  # Skin / Theme
--- a/cli.py
+++ b/cli.py
@@ -216,7 +216,7 @@ def load_cli_config() -> Dict[str, Any]:
            "compact": False,
            "resume_display": "full",
            "show_reasoning": False,
-            "streaming": False,
+            "streaming": True,

            "skin": "default",
        },
@@ -398,7 +398,7 @@ def load_cli_config() -> Dict[str, Any]:
            "provider": "AUXILIARY_WEB_EXTRACT_PROVIDER",
            "model": "AUXILIARY_WEB_EXTRACT_MODEL",
            "base_url": "AUXILIARY_WEB_EXTRACT_BASE_URL",
-            "api_key": "AUXILI..._KEY",
+            "api_key": "AUXILIARY_WEB_EXTRACT_API_KEY",
        },
        "approval": {
            "provider": "AUXILIARY_APPROVAL_PROVIDER",
@@ -1473,9 +1473,15 @@ class HermesCLI:
        Opens a dim reasoning box on first token, streams line-by-line.
        The box is closed automatically when content tokens start arriving
        (via _stream_delta → _emit_stream_text).
+
+        Once the response box is open, suppress any further reasoning
+        rendering — a late thinking block (e.g. after an interrupt) would
+        otherwise draw a reasoning box inside the response box.
        """
        if not text:
            return
+        if getattr(self, "_stream_box_opened", False):
+            return

        # Open reasoning box on first reasoning token
        if not getattr(self, "_reasoning_box_opened", False):
@@ -5773,6 +5779,73 @@ class HermesCLI:
        self._invalidate(min_interval=0.0)
        return True

+    # --- Protected TUI extension hooks for wrapper CLIs ---
+
+    def _get_extra_tui_widgets(self) -> list:
+        """Return extra prompt_toolkit widgets to insert into the TUI layout.
+
+        Wrapper CLIs can override this to inject widgets (e.g. a mini-player,
+        overlay menu) into the layout without overriding ``run()``.  Widgets
+        are inserted between the spacer and the status bar.
+        """
+        return []
+
+    def _register_extra_tui_keybindings(self, kb, *, input_area) -> None:
+        """Register extra keybindings on the TUI ``KeyBindings`` object.
+
+        Wrapper CLIs can override this to add keybindings (e.g. transport
+        controls, modal shortcuts) without overriding ``run()``.
+
+        Parameters
+        ----------
+        kb : KeyBindings
+            The active keybinding registry for the prompt_toolkit application.
+        input_area : TextArea
+            The main input widget, for wrappers that need to inspect or
+            manipulate user input from a keybinding handler.
+        """
+
+    def _build_tui_layout_children(
+        self,
+        *,
+        sudo_widget,
+        secret_widget,
+        approval_widget,
+        clarify_widget,
+        spinner_widget,
+        spacer,
+        status_bar,
+        input_rule_top,
+        image_bar,
+        input_area,
+        input_rule_bot,
+        voice_status_bar,
+        completions_menu,
+    ) -> list:
+        """Assemble the ordered list of children for the root ``HSplit``.
+
+        Wrapper CLIs typically override ``_get_extra_tui_widgets`` instead of
+        this method.  Override this only when you need full control over widget
+        ordering.
+        """
+        return [
+            Window(height=0),
+            sudo_widget,
+            secret_widget,
+            approval_widget,
+            clarify_widget,
+            spinner_widget,
+            spacer,
+            *self._get_extra_tui_widgets(),
+            status_bar,
+            input_rule_top,
+            image_bar,
+            input_area,
+            input_rule_bot,
+            voice_status_bar,
+            completions_menu,
+        ]
+
    def run(self):
        """Run the interactive CLI loop with persistent input at bottom."""
        self.show_banner()
@@ -6735,26 +6808,32 @@ class HermesCLI:
            filter=Condition(lambda: cli_ref._status_bar_visible),
        )

+        # Allow wrapper CLIs to register extra keybindings.
+        self._register_extra_tui_keybindings(kb, input_area=input_area)
+
        # Layout: interactive prompt widgets + ruled input at bottom.
        # The sudo, approval, and clarify widgets appear above the input when
        # the corresponding interactive prompt is active.
+        completions_menu = CompletionsMenu(max_height=12, scroll_offset=1)
+
        layout = Layout(
-            HSplit([
-                Window(height=0),
-                sudo_widget,
-                secret_widget,
-                approval_widget,
-                clarify_widget,
-                spinner_widget,
-                spacer,
-                status_bar,
-                input_rule_top,
-                image_bar,
-                input_area,
-                input_rule_bot,
-                voice_status_bar,
-                CompletionsMenu(max_height=12, scroll_offset=1),
-            ])
+            HSplit(
+                self._build_tui_layout_children(
+                    sudo_widget=sudo_widget,
+                    secret_widget=secret_widget,
+                    approval_widget=approval_widget,
+                    clarify_widget=clarify_widget,
+                    spinner_widget=spinner_widget,
+                    spacer=spacer,
+                    status_bar=status_bar,
+                    input_rule_top=input_rule_top,
+                    image_bar=image_bar,
+                    input_area=input_area,
+                    input_rule_bot=input_rule_bot,
+                    voice_status_bar=voice_status_bar,
+                    completions_menu=completions_menu,
+                )
+            )
        )
        
        # Style for the application
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -159,15 +159,29 @@ def _deliver_result(job: dict, content: str) -> None:
        logger.warning("Job '%s': platform '%s' not configured/enabled", job["id"], platform_name)
        return

+    # Wrap the content so the user knows this is a cron delivery and that
+    # the interactive agent has no visibility into it.
+    task_name = job.get("name", job["id"])
+    wrapped = (
+        f"Cronjob Response: {task_name}\n"
+        f"-------------\n\n"
+        f"{content}\n\n"
+        f"Note: The agent cannot see this message, and therefore cannot respond to it."
+    )
+
    # Run the async send in a fresh event loop (safe from any thread)
+    coro = _send_to_platform(platform, pconfig, chat_id, wrapped, thread_id=thread_id)
    try:
-        result = asyncio.run(_send_to_platform(platform, pconfig, chat_id, content, thread_id=thread_id))
+        result = asyncio.run(coro)
    except RuntimeError:
-        # asyncio.run() fails if there's already a running loop in this thread;
-        # spin up a new thread to avoid that.
+        # asyncio.run() checks for a running loop before awaiting the coroutine;
+        # when it raises, the original coro was never started — close it to
+        # prevent "coroutine was never awaited" RuntimeWarning, then retry in a
+        # fresh thread that has no running loop.
+        coro.close()
        import concurrent.futures
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
-            future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, content, thread_id=thread_id))
+            future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, wrapped, thread_id=thread_id))
            result = future.result(timeout=30)
    except Exception as e:
        logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
@@ -177,12 +191,6 @@ def _deliver_result(job: dict, content: str) -> None:
        logger.error("Job '%s': delivery error: %s", job["id"], result["error"])
    else:
        logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
-        # Mirror the delivered content into the target's gateway session
-        try:
-            from gateway.mirror import mirror_to_session
-            mirror_to_session(platform_name, chat_id, content, source_label="cron", thread_id=thread_id)
-        except Exception as e:
-            logger.warning("Job '%s': mirror_to_session failed: %s", job["id"], e)


 def _build_job_prompt(job: dict) -> str:
--- a/gateway/config.py
+++ b/gateway/config.py
@@ -455,11 +455,27 @@ def load_gateway_config() -> GatewayConfig:
                    "pair",
                )

-            # Bridge per-platform settings from config.yaml into gw_data
+            # Merge platforms section from config.yaml into gw_data so that
+            # nested keys like platforms.webhook.extra.routes are loaded.
+            yaml_platforms = yaml_cfg.get("platforms")
            platforms_data = gw_data.setdefault("platforms", {})
            if not isinstance(platforms_data, dict):
                platforms_data = {}
                gw_data["platforms"] = platforms_data
+            if isinstance(yaml_platforms, dict):
+                for plat_name, plat_block in yaml_platforms.items():
+                    if not isinstance(plat_block, dict):
+                        continue
+                    existing = platforms_data.get(plat_name, {})
+                    if not isinstance(existing, dict):
+                        existing = {}
+                    # Deep-merge extra dicts so gateway.json defaults survive
+                    merged_extra = {**existing.get("extra", {}), **plat_block.get("extra", {})}
+                    merged = {**existing, **plat_block}
+                    if merged_extra:
+                        merged["extra"] = merged_extra
+                    platforms_data[plat_name] = merged
+                gw_data["platforms"] = platforms_data
            for plat in Platform:
                if plat == Platform.LOCAL:
                    continue
--- a/gateway/platforms/mattermost.py
+++ b/gateway/platforms/mattermost.py
@@ -617,16 +617,16 @@ class MattermostAdapter(BasePlatformAdapter):
                        if mime.startswith("image/"):
                            local_path = cache_image_from_bytes(file_data, ext or ".png")
                            media_urls.append(local_path)
-                            media_types.append("image")
+                            media_types.append(mime)
                        elif mime.startswith("audio/"):
                            from gateway.platforms.base import cache_audio_from_bytes
                            local_path = cache_audio_from_bytes(file_data, ext or ".ogg")
                            media_urls.append(local_path)
-                            media_types.append("audio")
+                            media_types.append(mime)
                        else:
                            local_path = cache_document_from_bytes(file_data, fname)
                            media_urls.append(local_path)
-                            media_types.append("document")
+                            media_types.append(mime)
                    else:
                        logger.warning("Mattermost: failed to download file %s: HTTP %s", fid, resp.status)
            except Exception as exc:
--- a/gateway/platforms/telegram.py
+++ b/gateway/platforms/telegram.py
@@ -129,6 +129,8 @@ class TelegramAdapter(BasePlatformAdapter):
        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
        self._token_lock_identity: Optional[str] = None
        self._polling_error_task: Optional[asyncio.Task] = None
+        self._polling_conflict_count: int = 0
+        self._polling_error_callback_ref = None

    @staticmethod
    def _looks_like_polling_conflict(error: Exception) -> bool:
@@ -142,10 +144,49 @@ class TelegramAdapter(BasePlatformAdapter):
    async def _handle_polling_conflict(self, error: Exception) -> None:
        if self.has_fatal_error and self.fatal_error_code == "telegram_polling_conflict":
            return
+        # Track consecutive conflicts — transient 409s can occur when a
+        # previous gateway instance hasn't fully released its long-poll
+        # session on Telegram's server (e.g. during --replace handoffs or
+        # systemd Restart=on-failure respawns).  Retry a few times before
+        # giving up, so the old session has time to expire.
+        self._polling_conflict_count += 1
+
+        MAX_CONFLICT_RETRIES = 3
+        RETRY_DELAY = 10  # seconds
+
+        if self._polling_conflict_count <= MAX_CONFLICT_RETRIES:
+            logger.warning(
+                "[%s] Telegram polling conflict (%d/%d), will retry in %ds. Error: %s",
+                self.name, self._polling_conflict_count, MAX_CONFLICT_RETRIES,
+                RETRY_DELAY, error,
+            )
+            try:
+                if self._app and self._app.updater and self._app.updater.running:
+                    await self._app.updater.stop()
+            except Exception:
+                pass
+            await asyncio.sleep(RETRY_DELAY)
+            try:
+                await self._app.updater.start_polling(
+                    allowed_updates=Update.ALL_TYPES,
+                    drop_pending_updates=False,
+                    error_callback=self._polling_error_callback_ref,
+                )
+                logger.info("[%s] Telegram polling resumed after conflict retry %d", self.name, self._polling_conflict_count)
+                self._polling_conflict_count = 0  # reset on success
+                return
+            except Exception as retry_err:
+                logger.warning("[%s] Telegram polling retry failed: %s", self.name, retry_err)
+                # Don't fall through to fatal yet — wait for the next conflict
+                # to trigger another retry attempt (up to MAX_CONFLICT_RETRIES).
+                return
+
+        # Exhausted retries — fatal
        message = (
            "Another Telegram bot poller is already using this token. "
-            "Hermes stopped Telegram polling to avoid endless retry spam. "
+            "Hermes stopped Telegram polling after %d retries. "
            "Make sure only one gateway instance is running for this bot token."
+            % MAX_CONFLICT_RETRIES
        )
        logger.error("[%s] %s Original error: %s", self.name, message, error)
        self._set_fatal_error("telegram_polling_conflict", message, retryable=False)
@@ -242,6 +283,9 @@ class TelegramAdapter(BasePlatformAdapter):
                    return
                self._polling_error_task = loop.create_task(self._handle_polling_conflict(error))

+            # Store reference for retry use in _handle_polling_conflict
+            self._polling_error_callback_ref = _polling_error_callback
+
            await self._app.updater.start_polling(
                allowed_updates=Update.ALL_TYPES,
                drop_pending_updates=True,
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -2248,7 +2248,8 @@ class GatewayRunner:
            )

            # Auto voice reply: send TTS audio before the text response
-            if self._should_send_voice_reply(event, response, agent_messages):
+            _already_sent = bool(agent_result.get("already_sent"))
+            if self._should_send_voice_reply(event, response, agent_messages, already_sent=_already_sent):
                await self._send_voice_reply(event, response)

            # If streaming already delivered the response, return None so
@@ -3054,6 +3055,7 @@ class GatewayRunner:
        event: MessageEvent,
        response: str,
        agent_messages: list,
+        already_sent: bool = False,
    ) -> bool:
        """Decide whether the runner should send a TTS voice reply.

@@ -3062,8 +3064,9 @@ class GatewayRunner:
        - response is empty or an error
        - agent already called text_to_speech tool (dedup)
        - voice input and base adapter auto-TTS already handled it (skip_double)
-          Exception: Discord voice channel — base play_tts is a no-op there,
-          so the runner must handle VC playback.
+          UNLESS streaming already consumed the response (already_sent=True),
+          in which case the base adapter won't have text for auto-TTS so the
+          runner must handle it.
        """
        if not response or response.startswith("Error:"):
            return False
@@ -3093,7 +3096,10 @@ class GatewayRunner:

        # Dedup: base adapter auto-TTS already handles voice input
        # (play_tts plays in VC when connected, so runner can skip).
-        if is_voice_input:
+        # When streaming already delivered the text (already_sent=True),
+        # the base adapter will receive None and can't run auto-TTS,
+        # so the runner must take over.
+        if is_voice_input and not already_sent:
            return False

        return True
--- a/hermes_cli/gateway.py
+++ b/hermes_cli/gateway.py
@@ -420,6 +420,8 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)
 Description={SERVICE_DESCRIPTION}
 After=network-online.target
 Wants=network-online.target
+StartLimitIntervalSec=600
+StartLimitBurst=5

 [Service]
 Type=simple
@@ -434,7 +436,7 @@ Environment="PATH={sane_path}"
 Environment="VIRTUAL_ENV={venv_dir}"
 Environment="HERMES_HOME={hermes_home}"
 Restart=on-failure
-RestartSec=10
+RestartSec=30
 KillMode=mixed
 KillSignal=SIGTERM
 TimeoutStopSec=60
@@ -448,6 +450,8 @@ WantedBy=multi-user.target
    return f"""[Unit]
 Description={SERVICE_DESCRIPTION}
 After=network.target
+StartLimitIntervalSec=600
+StartLimitBurst=5

 [Service]
 Type=simple
@@ -457,7 +461,7 @@ Environment="PATH={sane_path}"
 Environment="VIRTUAL_ENV={venv_dir}"
 Environment="HERMES_HOME={hermes_home}"
 Restart=on-failure
-RestartSec=10
+RestartSec=30
 KillMode=mixed
 KillSignal=SIGTERM
 TimeoutStopSec=60
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -2688,7 +2688,7 @@ def cmd_update(args):

        print("→ Pulling updates...")
        try:
-            subprocess.run(git_cmd + ["pull", "origin", branch], cwd=PROJECT_ROOT, check=True)
+            subprocess.run(git_cmd + ["pull", "--ff-only", "origin", branch], cwd=PROJECT_ROOT, check=True)
        finally:
            if auto_stash_ref is not None:
                _restore_stashed_changes(
--- a/hermes_cli/plugins.py
+++ b/hermes_cli/plugins.py
@@ -5,7 +5,8 @@ Hermes Plugin System
 Discovers, loads, and manages plugins from three sources:

 1. **User plugins**   – ``~/.hermes/plugins/<name>/``
-2. **Project plugins** – ``./.hermes/plugins/<name>/``
+2. **Project plugins** – ``./.hermes/plugins/<name>/`` (opt-in via
+   ``HERMES_ENABLE_PROJECT_PLUGINS``)
 3. **Pip plugins**     – packages that expose the ``hermes_agent.plugins``
   entry-point group.

@@ -62,6 +63,11 @@ ENTRY_POINTS_GROUP = "hermes_agent.plugins"
 _NS_PARENT = "hermes_plugins"


+def _env_enabled(name: str) -> bool:
+    """Return True when an env var is set to a truthy opt-in value."""
+    return os.getenv(name, "").strip().lower() in {"1", "true", "yes", "on"}
+
+
 # ---------------------------------------------------------------------------
 # Data classes
 # ---------------------------------------------------------------------------
@@ -186,8 +192,9 @@ class PluginManager:
        manifests.extend(self._scan_directory(user_dir, source="user"))

        # 2. Project plugins (./.hermes/plugins/)
-        project_dir = Path.cwd() / ".hermes" / "plugins"
-        manifests.extend(self._scan_directory(project_dir, source="project"))
+        if _env_enabled("HERMES_ENABLE_PROJECT_PLUGINS"):
+            project_dir = Path.cwd() / ".hermes" / "plugins"
+            manifests.extend(self._scan_directory(project_dir, source="project"))

        # 3. Pip / entry-point plugins
        manifests.extend(self._scan_entry_points())
--- a/mini_swe_runner.py
+++ b/mini_swe_runner.py
@@ -339,6 +339,7 @@ class MiniSWERunner:
                    
                    # Add tool calls in XML format
                    for tool_call in msg["tool_calls"]:
+                        if not tool_call or not isinstance(tool_call, dict): continue
                        try:
                            arguments = json.loads(tool_call["function"]["arguments"]) \
                                if isinstance(tool_call["function"]["arguments"], str) \
--- a/run_agent.py
+++ b/run_agent.py
@@ -1374,9 +1374,6 @@ class AIAgent:
        def _run_review():
            import contextlib, os as _os
            try:
-                # Redirect stdout to devnull so spinners, cute messages,
-                # and any other print() calls from the review agent don't
-                # leak into the main CLI display.
                with open(_os.devnull, "w") as _devnull, \
                     contextlib.redirect_stdout(_devnull):
                    review_agent = AIAgent(
@@ -1396,6 +1393,39 @@ class AIAgent:
                        user_message=prompt,
                        conversation_history=messages_snapshot,
                    )
+
+                # Scan the review agent's messages for successful tool actions
+                # and surface a compact summary to the user.
+                actions = []
+                for msg in getattr(review_agent, "_session_messages", []):
+                    if not isinstance(msg, dict) or msg.get("role") != "tool":
+                        continue
+                    try:
+                        data = json.loads(msg.get("content", "{}"))
+                    except (json.JSONDecodeError, TypeError):
+                        continue
+                    if not data.get("success"):
+                        continue
+                    message = data.get("message", "")
+                    target = data.get("target", "")
+                    if "created" in message.lower():
+                        actions.append(message)
+                    elif "updated" in message.lower():
+                        actions.append(message)
+                    elif "added" in message.lower() or (target and "add" in message.lower()):
+                        label = "Memory" if target == "memory" else "User profile" if target == "user" else target
+                        actions.append(f"{label} updated")
+                    elif "Entry added" in message:
+                        label = "Memory" if target == "memory" else "User profile" if target == "user" else target
+                        actions.append(f"{label} updated")
+                    elif "removed" in message.lower() or "replaced" in message.lower():
+                        label = "Memory" if target == "memory" else "User profile" if target == "user" else target
+                        actions.append(f"{label} updated")
+
+                if actions:
+                    summary = " · ".join(dict.fromkeys(actions))
+                    self._safe_print(f"  💾 {summary}")
+
            except Exception as e:
                logger.debug("Background memory/skill review failed: %s", e)

@@ -1588,6 +1618,7 @@ class AIAgent:
                    
                    # Add tool calls wrapped in XML tags
                    for tool_call in msg["tool_calls"]:
+                        if not tool_call or not isinstance(tool_call, dict): continue
                        # Parse arguments - should always succeed since we validate during conversation
                        # but keep try-except as safety net
                        try:
@@ -2299,6 +2330,18 @@ class AIAgent:
            timestamp_line += f"\nProvider: {self.provider}"
        prompt_parts.append(timestamp_line)

+        # Alibaba Coding Plan API always returns "glm-4.7" as model name regardless
+        # of the requested model. Inject explicit model identity into the system prompt
+        # so the agent can correctly report which model it is (workaround for API bug).
+        if self.provider in ("alibaba-coding-plan", "alibaba-coding-plan-anthropic"):
+            _model_short = self.model.split("/")[-1] if "/" in self.model else self.model
+            prompt_parts.append(
+                f"You are powered by the model named {_model_short}. "
+                f"The exact model ID is {self.model}. "
+                f"When asked what model you are, always answer based on this information, "
+                f"not on any model name returned by the API."
+            )
+
        platform_key = (self.platform or "").lower().strip()
        if platform_key in PLATFORM_HINTS:
            prompt_parts.append(PLATFORM_HINTS[platform_key])
@@ -6741,6 +6784,7 @@ class AIAgent:
                                if msg.get("role") == "assistant" and msg.get("tool_calls"):
                                    tool_names = []
                                    for tc in msg["tool_calls"]:
+                                        if not tc or not isinstance(tc, dict): continue
                                        fn = tc.get("function", {})
                                        tool_names.append(fn.get("name", "unknown"))
                                    msg["content"] = f"Calling the {', '.join(tool_names)} tool{'s' if len(tool_names) > 1 else ''}..."
@@ -6783,6 +6827,7 @@ class AIAgent:
                                    if msg.get("role") == "assistant" and msg.get("tool_calls"):
                                        tool_names = []
                                        for tc in msg["tool_calls"]:
+                                            if not tc or not isinstance(tc, dict): continue
                                            fn = tc.get("function", {})
                                            tool_names.append(fn.get("name", "unknown"))
                                        msg["content"] = f"Calling the {', '.join(tool_names)} tool{'s' if len(tool_names) > 1 else ''}..."
@@ -6902,6 +6947,7 @@ class AIAgent:
                            if isinstance(m, dict) and m.get("role") == "tool"
                        }
                        for tc in msg["tool_calls"]:
+                            if not tc or not isinstance(tc, dict): continue
                            if tc["id"] not in answered_ids:
                                err_msg = {
                                    "role": "tool",
@@ -6912,20 +6958,18 @@ class AIAgent:
                        pending_handled = True
                    break
                
-                if not pending_handled:
-                    # Error happened before tool processing (e.g. response parsing).
-                    # Choose role to avoid consecutive same-role messages.
-                    last_role = messages[-1].get("role") if messages else None
-                    err_role = "assistant" if last_role == "user" else "user"
-                    sys_err_msg = {
-                        "role": err_role,
-                        "content": f"[System error during processing: {error_msg}]",
-                    }
-                    messages.append(sys_err_msg)
-                
+                # Non-tool errors don't need a synthetic message injected.
+                # The error is already printed to the user (line above), and
+                # the retry loop continues.  Injecting a fake user/assistant
+                # message pollutes history, burns tokens, and risks violating
+                # role-alternation invariants.
+
                # If we're near the limit, break to avoid infinite loops
                if api_call_count >= self.max_iterations - 1:
                    final_response = f"I apologize, but I encountered repeated errors: {error_msg}"
+                    # Append as assistant so the history stays valid for
+                    # session resume (avoids consecutive user messages).
+                    messages.append({"role": "assistant", "content": final_response})
                    break
        
        if final_response is None and (
--- a/scripts/hermes-gateway
+++ b/scripts/hermes-gateway
@@ -82,13 +82,15 @@ def generate_systemd_unit() -> str:
    return f"""[Unit]
 Description={SERVICE_DESCRIPTION}
 After=network.target
+StartLimitIntervalSec=600
+StartLimitBurst=5

 [Service]
 Type=simple
 ExecStart={python_path} {script_path} run
 WorkingDirectory={working_dir}
 Restart=on-failure
-RestartSec=10
+RestartSec=30
 StandardOutput=journal
 StandardError=journal

--- a/scripts/install.sh
+++ b/scripts/install.sh
@@ -577,7 +577,7 @@ clone_repo() {

            git fetch origin
            git checkout "$BRANCH"
-            git pull origin "$BRANCH"
+            git pull --ff-only origin "$BRANCH"

            if [ -n "$autostash_ref" ]; then
                local restore_now="yes"
@@ -772,6 +772,12 @@ setup_path() {
        case "$LOGIN_SHELL" in
            zsh)
                [ -f "$HOME/.zshrc" ] && SHELL_CONFIGS+=("$HOME/.zshrc")
+                [ -f "$HOME/.zprofile" ] && SHELL_CONFIGS+=("$HOME/.zprofile")
+                # If neither exists, create ~/.zshrc (common on fresh macOS installs)
+                if [ ${#SHELL_CONFIGS[@]} -eq 0 ]; then
+                    touch "$HOME/.zshrc"
+                    SHELL_CONFIGS+=("$HOME/.zshrc")
+                fi
                ;;
            bash)
                [ -f "$HOME/.bashrc" ] && SHELL_CONFIGS+=("$HOME/.bashrc")
--- a/tests/cron/test_scheduler.py
+++ b/tests/cron/test_scheduler.py
@@ -95,11 +95,58 @@ class TestResolveDeliveryTarget:
        }


-class TestDeliverResultMirrorLogging:
-    """Verify that mirror_to_session failures are logged, not silently swallowed."""
+class TestDeliverResultWrapping:
+    """Verify that cron deliveries are wrapped with header/footer and no longer mirrored."""

-    def test_mirror_failure_is_logged(self, caplog):
-        """When mirror_to_session raises, a warning should be logged."""
+    def test_delivery_wraps_content_with_header_and_footer(self):
+        """Delivered content should include task name header and agent-invisible note."""
+        from gateway.config import Platform
+
+        pconfig = MagicMock()
+        pconfig.enabled = True
+        mock_cfg = MagicMock()
+        mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
+
+        with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
+             patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})) as send_mock:
+            job = {
+                "id": "test-job",
+                "name": "daily-report",
+                "deliver": "origin",
+                "origin": {"platform": "telegram", "chat_id": "123"},
+            }
+            _deliver_result(job, "Here is today's summary.")
+
+        send_mock.assert_called_once()
+        sent_content = send_mock.call_args.kwargs.get("content") or send_mock.call_args[0][-1]
+        assert "Cronjob Response: daily-report" in sent_content
+        assert "-------------" in sent_content
+        assert "Here is today's summary." in sent_content
+        assert "The agent cannot see this message" in sent_content
+
+    def test_delivery_uses_job_id_when_no_name(self):
+        """When a job has no name, the wrapper should fall back to job id."""
+        from gateway.config import Platform
+
+        pconfig = MagicMock()
+        pconfig.enabled = True
+        mock_cfg = MagicMock()
+        mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
+
+        with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
+             patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})) as send_mock:
+            job = {
+                "id": "abc-123",
+                "deliver": "origin",
+                "origin": {"platform": "telegram", "chat_id": "123"},
+            }
+            _deliver_result(job, "Output.")
+
+        sent_content = send_mock.call_args.kwargs.get("content") or send_mock.call_args[0][-1]
+        assert "Cronjob Response: abc-123" in sent_content
+
+    def test_no_mirror_to_session_call(self):
+        """Cron deliveries should NOT mirror into the gateway session."""
        from gateway.config import Platform

        pconfig = MagicMock()
@@ -109,20 +156,18 @@ class TestDeliverResultMirrorLogging:

        with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
             patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})), \
-             patch("gateway.mirror.mirror_to_session", side_effect=ConnectionError("network down")):
+             patch("gateway.mirror.mirror_to_session") as mirror_mock:
            job = {
                "id": "test-job",
                "deliver": "origin",
                "origin": {"platform": "telegram", "chat_id": "123"},
            }
-            with caplog.at_level(logging.WARNING, logger="cron.scheduler"):
-                _deliver_result(job, "Hello!")
+            _deliver_result(job, "Hello!")

-        assert any("mirror_to_session failed" in r.message for r in caplog.records), \
-            f"Expected 'mirror_to_session failed' warning in logs, got: {[r.message for r in caplog.records]}"
+        mirror_mock.assert_not_called()

    def test_origin_delivery_preserves_thread_id(self):
-        """Origin delivery should forward thread_id to send/mirror helpers."""
+        """Origin delivery should forward thread_id to the send helper."""
        from gateway.config import Platform

        pconfig = MagicMock()
@@ -132,6 +177,7 @@ class TestDeliverResultMirrorLogging:

        job = {
            "id": "test-job",
+            "name": "topic-job",
            "deliver": "origin",
            "origin": {
                "platform": "telegram",
@@ -141,19 +187,11 @@ class TestDeliverResultMirrorLogging:
        }

        with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
-             patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})) as send_mock, \
-             patch("gateway.mirror.mirror_to_session") as mirror_mock:
+             patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})) as send_mock:
            _deliver_result(job, "hello")

        send_mock.assert_called_once()
        assert send_mock.call_args.kwargs["thread_id"] == "17585"
-        mirror_mock.assert_called_once_with(
-            "telegram",
-            "-1001",
-            "hello",
-            source_label="cron",
-            thread_id="17585",
-        )


 class TestRunJobSessionPersistence:
--- a/tests/gateway/test_mattermost.py
+++ b/tests/gateway/test_mattermost.py
@@ -572,3 +572,102 @@ class TestMattermostRequirements:
        monkeypatch.delenv("MATTERMOST_URL", raising=False)
        from gateway.platforms.mattermost import check_mattermost_requirements
        assert check_mattermost_requirements() is False
+
+
+# ---------------------------------------------------------------------------
+# Media type propagation (MIME types, not bare strings)
+# ---------------------------------------------------------------------------
+
+class TestMattermostMediaTypes:
+    """Verify that media_types contains actual MIME types (e.g. 'image/png')
+    rather than bare category strings ('image'), so downstream
+    ``mtype.startswith("image/")`` checks in run.py work correctly."""
+
+    def setup_method(self):
+        self.adapter = _make_adapter()
+        self.adapter._bot_user_id = "bot_user_id"
+        self.adapter.handle_message = AsyncMock()
+
+    def _make_event(self, file_ids):
+        post_data = {
+            "id": "post_media",
+            "user_id": "user_123",
+            "channel_id": "chan_456",
+            "message": "file attached",
+            "file_ids": file_ids,
+        }
+        return {
+            "event": "posted",
+            "data": {
+                "post": json.dumps(post_data),
+                "channel_type": "O",
+                "sender_name": "@alice",
+            },
+        }
+
+    @pytest.mark.asyncio
+    async def test_image_media_type_is_full_mime(self):
+        """An image attachment should produce 'image/png', not 'image'."""
+        file_info = {"name": "photo.png", "mime_type": "image/png"}
+        self.adapter._api_get = AsyncMock(return_value=file_info)
+
+        mock_resp = AsyncMock()
+        mock_resp.status = 200
+        mock_resp.read = AsyncMock(return_value=b"\x89PNG fake")
+        mock_resp.__aenter__ = AsyncMock(return_value=mock_resp)
+        mock_resp.__aexit__ = AsyncMock(return_value=False)
+        self.adapter._session = MagicMock()
+        self.adapter._session.get = MagicMock(return_value=mock_resp)
+
+        with patch("gateway.platforms.base.cache_image_from_bytes", return_value="/tmp/photo.png"):
+            await self.adapter._handle_ws_event(self._make_event(["file1"]))
+
+        msg = self.adapter.handle_message.call_args[0][0]
+        assert msg.media_types == ["image/png"]
+        assert msg.media_types[0].startswith("image/")
+
+    @pytest.mark.asyncio
+    async def test_audio_media_type_is_full_mime(self):
+        """An audio attachment should produce 'audio/ogg', not 'audio'."""
+        file_info = {"name": "voice.ogg", "mime_type": "audio/ogg"}
+        self.adapter._api_get = AsyncMock(return_value=file_info)
+
+        mock_resp = AsyncMock()
+        mock_resp.status = 200
+        mock_resp.read = AsyncMock(return_value=b"OGG fake")
+        mock_resp.__aenter__ = AsyncMock(return_value=mock_resp)
+        mock_resp.__aexit__ = AsyncMock(return_value=False)
+        self.adapter._session = MagicMock()
+        self.adapter._session.get = MagicMock(return_value=mock_resp)
+
+        with patch("gateway.platforms.base.cache_audio_from_bytes", return_value="/tmp/voice.ogg"), \
+             patch("gateway.platforms.base.cache_image_from_bytes"), \
+             patch("gateway.platforms.base.cache_document_from_bytes"):
+            await self.adapter._handle_ws_event(self._make_event(["file2"]))
+
+        msg = self.adapter.handle_message.call_args[0][0]
+        assert msg.media_types == ["audio/ogg"]
+        assert msg.media_types[0].startswith("audio/")
+
+    @pytest.mark.asyncio
+    async def test_document_media_type_is_full_mime(self):
+        """A document attachment should produce 'application/pdf', not 'document'."""
+        file_info = {"name": "report.pdf", "mime_type": "application/pdf"}
+        self.adapter._api_get = AsyncMock(return_value=file_info)
+
+        mock_resp = AsyncMock()
+        mock_resp.status = 200
+        mock_resp.read = AsyncMock(return_value=b"PDF fake")
+        mock_resp.__aenter__ = AsyncMock(return_value=mock_resp)
+        mock_resp.__aexit__ = AsyncMock(return_value=False)
+        self.adapter._session = MagicMock()
+        self.adapter._session.get = MagicMock(return_value=mock_resp)
+
+        with patch("gateway.platforms.base.cache_document_from_bytes", return_value="/tmp/report.pdf"), \
+             patch("gateway.platforms.base.cache_image_from_bytes"):
+            await self.adapter._handle_ws_event(self._make_event(["file3"]))
+
+        msg = self.adapter.handle_message.call_args[0][0]
+        assert msg.media_types == ["application/pdf"]
+        assert not msg.media_types[0].startswith("image/")
+        assert not msg.media_types[0].startswith("audio/")
--- a/tests/gateway/test_telegram_conflict.py
+++ b/tests/gateway/test_telegram_conflict.py
@@ -47,8 +47,9 @@ async def test_connect_rejects_same_host_token_lock(monkeypatch):


@pytest.mark.asyncio
-async def test_polling_conflict_stops_polling_and_notifies_handler(monkeypatch):
-    adapter = TelegramAdapter(PlatformConfig(enabled=True, token="secret-token"))
+async def test_polling_conflict_retries_before_fatal(monkeypatch):
+    """A single 409 should trigger a retry, not an immediate fatal error."""
+    adapter = TelegramAdapter(PlatformConfig(enabled=True, token="***"))
    fatal_handler = AsyncMock()
    adapter.set_fatal_error_handler(fatal_handler)

@@ -69,6 +70,7 @@ async def test_polling_conflict_stops_polling_and_notifies_handler(monkeypatch):
    updater = SimpleNamespace(
        start_polling=AsyncMock(side_effect=fake_start_polling),
        stop=AsyncMock(),
+        running=True,
    )
    bot = SimpleNamespace(set_my_commands=AsyncMock())
    app = SimpleNamespace(
@@ -83,20 +85,102 @@ async def test_polling_conflict_stops_polling_and_notifies_handler(monkeypatch):
    builder.build.return_value = app
    monkeypatch.setattr("gateway.platforms.telegram.Application", SimpleNamespace(builder=MagicMock(return_value=builder)))

+    # Speed up retries for testing
+    monkeypatch.setattr("asyncio.sleep", AsyncMock())
+
    ok = await adapter.connect()

    assert ok is True
    assert callable(captured["error_callback"])

    conflict = type("Conflict", (Exception,), {})
-    captured["error_callback"](conflict("Conflict: terminated by other getUpdates request; make sure that only one bot instance is running"))

+    # First conflict: should retry, NOT be fatal
+    captured["error_callback"](conflict("Conflict: terminated by other getUpdates request"))
    await asyncio.sleep(0)
    await asyncio.sleep(0)
+    # Give the scheduled task a chance to run
+    for _ in range(10):
+        await asyncio.sleep(0)

-    assert adapter.fatal_error_code == "telegram_polling_conflict"
+    assert adapter.has_fatal_error is False, "First conflict should not be fatal"
+    assert adapter._polling_conflict_count == 0, "Count should reset after successful retry"
+
+
+@pytest.mark.asyncio
+async def test_polling_conflict_becomes_fatal_after_retries(monkeypatch):
+    """After exhausting retries, the conflict should become fatal."""
+    adapter = TelegramAdapter(PlatformConfig(enabled=True, token="***"))
+    fatal_handler = AsyncMock()
+    adapter.set_fatal_error_handler(fatal_handler)
+
+    monkeypatch.setattr(
+        "gateway.status.acquire_scoped_lock",
+        lambda scope, identity, metadata=None: (True, None),
+    )
+    monkeypatch.setattr(
+        "gateway.status.release_scoped_lock",
+        lambda scope, identity: None,
+    )
+
+    captured = {}
+
+    async def fake_start_polling(**kwargs):
+        captured["error_callback"] = kwargs["error_callback"]
+
+    # Make start_polling fail on retries to exhaust retries
+    call_count = {"n": 0}
+
+    async def failing_start_polling(**kwargs):
+        call_count["n"] += 1
+        if call_count["n"] == 1:
+            # First call (initial connect) succeeds
+            captured["error_callback"] = kwargs["error_callback"]
+        else:
+            # Retry calls fail
+            raise Exception("Connection refused")
+
+    updater = SimpleNamespace(
+        start_polling=AsyncMock(side_effect=failing_start_polling),
+        stop=AsyncMock(),
+        running=True,
+    )
+    bot = SimpleNamespace(set_my_commands=AsyncMock())
+    app = SimpleNamespace(
+        bot=bot,
+        updater=updater,
+        add_handler=MagicMock(),
+        initialize=AsyncMock(),
+        start=AsyncMock(),
+    )
+    builder = MagicMock()
+    builder.token.return_value = builder
+    builder.build.return_value = app
+    monkeypatch.setattr("gateway.platforms.telegram.Application", SimpleNamespace(builder=MagicMock(return_value=builder)))
+
+    # Speed up retries for testing
+    monkeypatch.setattr("asyncio.sleep", AsyncMock())
+
+    ok = await adapter.connect()
+    assert ok is True
+
+    conflict = type("Conflict", (Exception,), {})
+
+    # Directly call _handle_polling_conflict to avoid event-loop scheduling
+    # complexity.  Each call simulates one 409 from Telegram.
+    for i in range(4):
+        await adapter._handle_polling_conflict(
+            conflict("Conflict: terminated by other getUpdates request")
+        )
+
+    # After 3 failed retries (count 1-3 each enter the retry branch but
+    # start_polling raises), the 4th conflict pushes count to 4 which
+    # exceeds MAX_CONFLICT_RETRIES (3), entering the fatal branch.
+    assert adapter.fatal_error_code == "telegram_polling_conflict", (
+        f"Expected fatal after 4 conflicts, got code={adapter.fatal_error_code}, "
+        f"count={adapter._polling_conflict_count}"
+    )
    assert adapter.has_fatal_error is True
-    updater.stop.assert_awaited()
    fatal_handler.assert_awaited_once()


--- a/tests/gateway/test_voice_command.py
+++ b/tests/gateway/test_voice_command.py
@@ -2467,7 +2467,8 @@ class TestVoiceTTSPlayback:
        runner.adapters = {}
        return runner

-    def _call_should_reply(self, runner, voice_mode, msg_type, response="Hello", agent_msgs=None):
+    def _call_should_reply(self, runner, voice_mode, msg_type, response="Hello",
+                           agent_msgs=None, already_sent=False):
        from gateway.platforms.base import MessageType, MessageEvent, SessionSource
        from gateway.config import Platform
        runner._voice_mode["ch1"] = voice_mode
@@ -2476,28 +2477,32 @@ class TestVoiceTTSPlayback:
            user_id="1", user_name="test", chat_type="channel",
        )
        event = MessageEvent(source=source, text="test", message_type=msg_type)
-        return runner._should_send_voice_reply(event, response, agent_msgs or [])
+        return runner._should_send_voice_reply(
+            event, response, agent_msgs or [], already_sent=already_sent,
+        )
+
+    # -- Streaming OFF (existing behavior, must not change) --

    def test_voice_input_runner_skips(self):
-        """Voice input: runner skips — base adapter handles via play_tts."""
+        """Streaming OFF + voice input: runner skips — base adapter handles."""
        from gateway.platforms.base import MessageType
        runner = self._make_runner()
-        assert self._call_should_reply(runner, "all", MessageType.VOICE) is False
+        assert self._call_should_reply(runner, "all", MessageType.VOICE, already_sent=False) is False

    def test_text_input_voice_all_runner_fires(self):
-        """Text input + voice_mode=all: runner generates TTS."""
+        """Streaming OFF + text input + voice_mode=all: runner generates TTS."""
        from gateway.platforms.base import MessageType
        runner = self._make_runner()
-        assert self._call_should_reply(runner, "all", MessageType.TEXT) is True
+        assert self._call_should_reply(runner, "all", MessageType.TEXT, already_sent=False) is True

    def test_text_input_voice_off_no_tts(self):
-        """Text input + voice_mode=off: no TTS."""
+        """Streaming OFF + text input + voice_mode=off: no TTS."""
        from gateway.platforms.base import MessageType
        runner = self._make_runner()
        assert self._call_should_reply(runner, "off", MessageType.TEXT) is False

    def test_text_input_voice_only_no_tts(self):
-        """Text input + voice_mode=voice_only: no TTS for text."""
+        """Streaming OFF + text input + voice_mode=voice_only: no TTS for text."""
        from gateway.platforms.base import MessageType
        runner = self._make_runner()
        assert self._call_should_reply(runner, "voice_only", MessageType.TEXT) is False
@@ -2523,6 +2528,43 @@ class TestVoiceTTSPlayback:
        ]}]
        assert self._call_should_reply(runner, "all", MessageType.TEXT, agent_msgs=agent_msgs) is False

+    # -- Streaming ON (already_sent=True) --
+
+    def test_streaming_on_voice_input_runner_fires(self):
+        """Streaming ON + voice input: runner handles TTS (base adapter has no text)."""
+        from gateway.platforms.base import MessageType
+        runner = self._make_runner()
+        assert self._call_should_reply(runner, "all", MessageType.VOICE, already_sent=True) is True
+
+    def test_streaming_on_text_input_runner_fires(self):
+        """Streaming ON + text input: runner handles TTS (same as before)."""
+        from gateway.platforms.base import MessageType
+        runner = self._make_runner()
+        assert self._call_should_reply(runner, "all", MessageType.TEXT, already_sent=True) is True
+
+    def test_streaming_on_voice_off_no_tts(self):
+        """Streaming ON + voice_mode=off: no TTS regardless of streaming."""
+        from gateway.platforms.base import MessageType
+        runner = self._make_runner()
+        assert self._call_should_reply(runner, "off", MessageType.VOICE, already_sent=True) is False
+
+    def test_streaming_on_empty_response_no_tts(self):
+        """Streaming ON + empty response: no TTS."""
+        from gateway.platforms.base import MessageType
+        runner = self._make_runner()
+        assert self._call_should_reply(runner, "all", MessageType.VOICE, response="", already_sent=True) is False
+
+    def test_streaming_on_agent_tts_dedup(self):
+        """Streaming ON + agent called TTS: runner skips (dedup still works)."""
+        from gateway.platforms.base import MessageType
+        runner = self._make_runner()
+        agent_msgs = [{"role": "assistant", "tool_calls": [
+            {"id": "1", "type": "function", "function": {"name": "text_to_speech", "arguments": "{}"}}
+        ]}]
+        assert self._call_should_reply(
+            runner, "all", MessageType.VOICE, agent_msgs=agent_msgs, already_sent=True,
+        ) is False
+

 class TestUDPKeepalive:
    """UDP keepalive prevents Discord from dropping the voice session."""
--- a/tests/test_cli_extension_hooks.py
+++ b/tests/test_cli_extension_hooks.py
@@ -0,0 +1,138 @@
+"""Tests for protected HermesCLI TUI extension hooks.
+
+Verifies that wrapper CLIs can extend the TUI via:
+  - _get_extra_tui_widgets()
+  - _register_extra_tui_keybindings()
+  - _build_tui_layout_children()
+without overriding run().
+"""
+
+from __future__ import annotations
+
+import importlib
+import sys
+from unittest.mock import MagicMock, patch
+
+from prompt_toolkit.key_binding import KeyBindings
+
+
+def _make_cli(**kwargs):
+    """Create a HermesCLI with prompt_toolkit stubs (same pattern as test_cli_init)."""
+    _clean_config = {
+        "model": {
+            "default": "anthropic/claude-opus-4.6",
+            "base_url": "https://openrouter.ai/api/v1",
+            "provider": "auto",
+        },
+        "display": {"compact": False, "tool_progress": "all"},
+        "agent": {},
+        "terminal": {"env_type": "local"},
+    }
+    clean_env = {"LLM_MODEL": "", "HERMES_MAX_ITERATIONS": ""}
+    prompt_toolkit_stubs = {
+        "prompt_toolkit": MagicMock(),
+        "prompt_toolkit.history": MagicMock(),
+        "prompt_toolkit.styles": MagicMock(),
+        "prompt_toolkit.patch_stdout": MagicMock(),
+        "prompt_toolkit.application": MagicMock(),
+        "prompt_toolkit.layout": MagicMock(),
+        "prompt_toolkit.layout.processors": MagicMock(),
+        "prompt_toolkit.filters": MagicMock(),
+        "prompt_toolkit.layout.dimension": MagicMock(),
+        "prompt_toolkit.layout.menus": MagicMock(),
+        "prompt_toolkit.widgets": MagicMock(),
+        "prompt_toolkit.key_binding": MagicMock(),
+        "prompt_toolkit.completion": MagicMock(),
+        "prompt_toolkit.formatted_text": MagicMock(),
+        "prompt_toolkit.auto_suggest": MagicMock(),
+    }
+    with patch.dict(sys.modules, prompt_toolkit_stubs), patch.dict(
+        "os.environ", clean_env, clear=False
+    ):
+        import cli as _cli_mod
+
+        _cli_mod = importlib.reload(_cli_mod)
+        with patch.object(_cli_mod, "get_tool_definitions", return_value=[]), patch.dict(
+            _cli_mod.__dict__, {"CLI_CONFIG": _clean_config}
+        ):
+            return _cli_mod.HermesCLI(**kwargs)
+
+
+class TestExtensionHookDefaults:
+    def test_extra_tui_widgets_default_empty(self):
+        cli = _make_cli()
+        assert cli._get_extra_tui_widgets() == []
+
+    def test_register_extra_tui_keybindings_default_noop(self):
+        cli = _make_cli()
+        kb = KeyBindings()
+        result = cli._register_extra_tui_keybindings(kb, input_area=None)
+        assert result is None
+        assert kb.bindings == []
+
+    def test_build_tui_layout_children_returns_all_widgets_in_order(self):
+        cli = _make_cli()
+        children = cli._build_tui_layout_children(
+            sudo_widget="sudo",
+            secret_widget="secret",
+            approval_widget="approval",
+            clarify_widget="clarify",
+            spinner_widget="spinner",
+            spacer="spacer",
+            status_bar="status",
+            input_rule_top="top-rule",
+            image_bar="image-bar",
+            input_area="input-area",
+            input_rule_bot="bottom-rule",
+            voice_status_bar="voice-status",
+            completions_menu="completions-menu",
+        )
+        # First element is Window(height=0), rest are the named widgets
+        assert children[1:] == [
+            "sudo", "secret", "approval", "clarify", "spinner",
+            "spacer", "status", "top-rule", "image-bar", "input-area",
+            "bottom-rule", "voice-status", "completions-menu",
+        ]
+
+
+class TestExtensionHookSubclass:
+    def test_extra_widgets_inserted_before_status_bar(self):
+        cli = _make_cli()
+        # Monkey-patch to simulate subclass override
+        cli._get_extra_tui_widgets = lambda: ["radio-menu", "mini-player"]
+
+        children = cli._build_tui_layout_children(
+            sudo_widget="sudo",
+            secret_widget="secret",
+            approval_widget="approval",
+            clarify_widget="clarify",
+            spinner_widget="spinner",
+            spacer="spacer",
+            status_bar="status",
+            input_rule_top="top-rule",
+            image_bar="image-bar",
+            input_area="input-area",
+            input_rule_bot="bottom-rule",
+            voice_status_bar="voice-status",
+            completions_menu="completions-menu",
+        )
+        # Extra widgets should appear between spacer and status bar
+        spacer_idx = children.index("spacer")
+        status_idx = children.index("status")
+        assert children[spacer_idx + 1] == "radio-menu"
+        assert children[spacer_idx + 2] == "mini-player"
+        assert children[spacer_idx + 3] == "status"
+        assert status_idx == spacer_idx + 3
+
+    def test_extra_keybindings_can_add_bindings(self):
+        cli = _make_cli()
+        kb = KeyBindings()
+
+        def _custom_hook(kb, *, input_area):
+            @kb.add("f2")
+            def _toggle(event):
+                return None
+
+        cli._register_extra_tui_keybindings = _custom_hook
+        cli._register_extra_tui_keybindings(kb, input_area=None)
+        assert len(kb.bindings) == 1
--- a/tests/test_plugins.py
+++ b/tests/test_plugins.py
@@ -67,6 +67,7 @@ class TestPluginDiscovery:
        project_dir = tmp_path / "project"
        project_dir.mkdir()
        monkeypatch.chdir(project_dir)
+        monkeypatch.setenv("HERMES_ENABLE_PROJECT_PLUGINS", "true")
        plugins_dir = project_dir / ".hermes" / "plugins"
        _make_plugin_dir(plugins_dir, "proj_plugin")

@@ -76,6 +77,19 @@ class TestPluginDiscovery:
        assert "proj_plugin" in mgr._plugins
        assert mgr._plugins["proj_plugin"].enabled

+    def test_discover_project_plugins_skipped_by_default(self, tmp_path, monkeypatch):
+        """Project plugins are not discovered unless explicitly enabled."""
+        project_dir = tmp_path / "project"
+        project_dir.mkdir()
+        monkeypatch.chdir(project_dir)
+        plugins_dir = project_dir / ".hermes" / "plugins"
+        _make_plugin_dir(plugins_dir, "proj_plugin")
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert "proj_plugin" not in mgr._plugins
+
    def test_discover_is_idempotent(self, tmp_path, monkeypatch):
        """Calling discover_and_load() twice does not duplicate plugins."""
        plugins_dir = tmp_path / "hermes_test" / "plugins"
--- a/toolsets.py
+++ b/toolsets.py
@@ -355,24 +355,27 @@ def resolve_toolset(name: str, visited: Set[str] = None) -> List[str]:
            all_tools.update(resolved)
        return list(all_tools)

-    # Check for cycles
+    # Check for cycles / already-resolved (diamond deps).
+    # Silently return [] — either this is a diamond (not a bug, tools already
+    # collected via another path) or a genuine cycle (safe to skip).
    if name in visited:
-        print(f"⚠️  Circular dependency detected in toolset '{name}'")
        return []
-    
+
    visited.add(name)
-    
+
    # Get toolset definition
    toolset = TOOLSETS.get(name)
    if not toolset:
        return []
-    
+
    # Collect direct tools
    tools = set(toolset.get("tools", []))
-    
-    # Recursively resolve included toolsets
+
+    # Recursively resolve included toolsets, sharing the visited set across
+    # sibling includes so diamond dependencies are only resolved once and
+    # cycle warnings don't fire multiple times for the same cycle.
    for included_name in toolset.get("includes", []):
-        included_tools = resolve_toolset(included_name, visited.copy())
+        included_tools = resolve_toolset(included_name, visited)
        tools.update(included_tools)
    
    return list(tools)
--- a/website/docs/developer-guide/extending-the-cli.md
+++ b/website/docs/developer-guide/extending-the-cli.md
@@ -0,0 +1,196 @@
+---
+sidebar_position: 8
+title: "Extending the CLI"
+description: "Build wrapper CLIs that extend the Hermes TUI with custom widgets, keybindings, and layout changes"
+---
+
+# Extending the CLI
+
+Hermes exposes protected extension hooks on `HermesCLI` so wrapper CLIs can add widgets, keybindings, and layout customizations without overriding the 1000+ line `run()` method. This keeps your extension decoupled from internal changes.
+
+## Extension points
+
+There are five extension seams available:
+
+| Hook | Purpose | Override when... |
+|------|---------|------------------|
+| `_get_extra_tui_widgets()` | Inject widgets into the layout | You need a persistent UI element (panel, status line, mini-player) |
+| `_register_extra_tui_keybindings(kb, *, input_area)` | Add keyboard shortcuts | You need hotkeys (toggle panels, transport controls, modal shortcuts) |
+| `_build_tui_layout_children(**widgets)` | Full control over widget ordering | You need to reorder or wrap existing widgets (rare) |
+| `process_command()` | Add custom slash commands | You need `/mycommand` handling (pre-existing hook) |
+| `_build_tui_style_dict()` | Custom prompt_toolkit styles | You need custom colors or styling (pre-existing hook) |
+
+The first three are new protected hooks. The last two already existed.
+
+## Quick start: a wrapper CLI
+
+```python
+#!/usr/bin/env python3
+"""my_cli.py — Example wrapper CLI that extends Hermes."""
+
+from cli import HermesCLI
+from prompt_toolkit.layout import FormattedTextControl, Window
+from prompt_toolkit.filters import Condition
+
+
+class MyCLI(HermesCLI):
+
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+        self._panel_visible = False
+
+    def _get_extra_tui_widgets(self):
+        """Add a toggleable info panel above the status bar."""
+        cli_ref = self
+        return [
+            Window(
+                FormattedTextControl(lambda: "📊 My custom panel content"),
+                height=1,
+                filter=Condition(lambda: cli_ref._panel_visible),
+            ),
+        ]
+
+    def _register_extra_tui_keybindings(self, kb, *, input_area):
+        """F2 toggles the custom panel."""
+        cli_ref = self
+
+        @kb.add("f2")
+        def _toggle_panel(event):
+            cli_ref._panel_visible = not cli_ref._panel_visible
+
+    def process_command(self, cmd: str) -> bool:
+        """Add a /panel slash command."""
+        if cmd.strip().lower() == "/panel":
+            self._panel_visible = not self._panel_visible
+            state = "visible" if self._panel_visible else "hidden"
+            print(f"Panel is now {state}")
+            return True
+        return super().process_command(cmd)
+
+
+if __name__ == "__main__":
+    cli = MyCLI()
+    cli.run()
+```
+
+Run it:
+
+```bash
+cd ~/.hermes/hermes-agent
+source .venv/bin/activate
+python my_cli.py
+```
+
+## Hook reference
+
+### `_get_extra_tui_widgets()`
+
+Returns a list of prompt_toolkit widgets to insert into the TUI layout. Widgets appear **between the spacer and the status bar** — above the input area but below the main output.
+
+```python
+def _get_extra_tui_widgets(self) -> list:
+    return []  # default: no extra widgets
+```
+
+Each widget should be a prompt_toolkit container (e.g., `Window`, `ConditionalContainer`, `HSplit`). Use `ConditionalContainer` or `filter=Condition(...)` to make widgets toggleable.
+
+```python
+from prompt_toolkit.layout import ConditionalContainer, Window, FormattedTextControl
+from prompt_toolkit.filters import Condition
+
+def _get_extra_tui_widgets(self):
+    return [
+        ConditionalContainer(
+            Window(FormattedTextControl("Status: connected"), height=1),
+            filter=Condition(lambda: self._show_status),
+        ),
+    ]
+```
+
+### `_register_extra_tui_keybindings(kb, *, input_area)`
+
+Called after Hermes registers its own keybindings and before the layout is built. Add your keybindings to `kb`.
+
+```python
+def _register_extra_tui_keybindings(self, kb, *, input_area):
+    pass  # default: no extra keybindings
+```
+
+Parameters:
+- **`kb`** — The `KeyBindings` instance for the prompt_toolkit application
+- **`input_area`** — The main `TextArea` widget, if you need to read or manipulate user input
+
+```python
+def _register_extra_tui_keybindings(self, kb, *, input_area):
+    cli_ref = self
+
+    @kb.add("f3")
+    def _clear_input(event):
+        input_area.text = ""
+
+    @kb.add("f4")
+    def _insert_template(event):
+        input_area.text = "/search "
+```
+
+**Avoid conflicts** with built-in keybindings: `Enter` (submit), `Escape Enter` (newline), `Ctrl-C` (interrupt), `Ctrl-D` (exit), `Tab` (auto-suggest accept). Function keys F2+ and Ctrl-combinations are generally safe.
+
+### `_build_tui_layout_children(**widgets)`
+
+Override this only when you need full control over widget ordering. Most extensions should use `_get_extra_tui_widgets()` instead.
+
+```python
+def _build_tui_layout_children(self, *, sudo_widget, secret_widget,
+    approval_widget, clarify_widget, spinner_widget, spacer,
+    status_bar, input_rule_top, image_bar, input_area,
+    input_rule_bot, voice_status_bar, completions_menu) -> list:
+```
+
+The default implementation returns:
+
+```python
+[
+    Window(height=0),       # anchor
+    sudo_widget,            # sudo password prompt (conditional)
+    secret_widget,          # secret input prompt (conditional)
+    approval_widget,        # dangerous command approval (conditional)
+    clarify_widget,         # clarify question UI (conditional)
+    spinner_widget,         # thinking spinner (conditional)
+    spacer,                 # fills remaining vertical space
+    *self._get_extra_tui_widgets(),  # YOUR WIDGETS GO HERE
+    status_bar,             # model/token/context status line
+    input_rule_top,         # ─── border above input
+    image_bar,              # attached images indicator
+    input_area,             # user text input
+    input_rule_bot,         # ─── border below input
+    voice_status_bar,       # voice mode status (conditional)
+    completions_menu,       # autocomplete dropdown
+]
+```
+
+## Layout diagram
+
+```
+┌─────────────────────────────────────────┐
+│ (output scrolls here)                   │
+│                                         │
+│                          spacer ────────│
+│ ★ Your extra widgets appear here ★      │
+├─────────────────────────────────────────┤
+│ ⚕ claude-sonnet-4 · 42% · 2m    status │
+├─────────────────────────────────────────┤
+│ 📎 2 images                    image bar│
+│ ❯ your input here           input area  │
+├─────────────────────────────────────────┤
+│ 🎤 Voice mode: listening   voice status │
+│ ▸ completions...         autocomplete   │
+└─────────────────────────────────────────┘
+```
+
+## Tips
+
+- **Invalidate the display** after state changes: call `self._invalidate()` to trigger a prompt_toolkit redraw.
+- **Access agent state**: `self.agent`, `self.model`, `self.conversation_history` are all available.
+- **Custom styles**: Override `_build_tui_style_dict()` and add entries for your custom style classes.
+- **Slash commands**: Override `process_command()`, handle your commands, and call `super().process_command(cmd)` for everything else.
+- **Don't override `run()`** unless absolutely necessary — the extension hooks exist specifically to avoid that coupling.
--- a/website/docs/reference/environment-variables.md
+++ b/website/docs/reference/environment-variables.md
@@ -232,6 +232,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
 | `HERMES_QUIET` | Suppress non-essential output (`true`/`false`) |
 | `HERMES_API_TIMEOUT` | LLM API call timeout in seconds (default: `900`) |
 | `HERMES_EXEC_ASK` | Enable execution approval prompts in gateway mode (`true`/`false`) |
+| `HERMES_ENABLE_PROJECT_PLUGINS` | Enable auto-discovery of repo-local plugins from `./.hermes/plugins/` (`true`/`false`, default: `false`) |
 | `HERMES_BACKGROUND_NOTIFICATIONS` | Background process notification mode in gateway: `all` (default), `result`, `error`, `off` |
 | `HERMES_EPHEMERAL_SYSTEM_PROMPT` | Ephemeral system prompt injected at API-call time (never persisted to sessions) |

--- a/website/docs/user-guide/features/plugins.md
+++ b/website/docs/user-guide/features/plugins.md
@@ -22,6 +22,8 @@ Drop a directory into `~/.hermes/plugins/` with a `plugin.yaml` and Python code:

 Start Hermes — your tools appear alongside built-in tools. The model can call them immediately.

+Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable them only for trusted repositories by setting `HERMES_ENABLE_PROJECT_PLUGINS=true` before starting Hermes.
+
 ## What plugins can do

 | Capability | How |
@@ -38,7 +40,7 @@ Start Hermes — your tools appear alongside built-in tools. The model can call
 | Source | Path | Use case |
 |--------|------|----------|
 | User | `~/.hermes/plugins/` | Personal plugins |
-| Project | `.hermes/plugins/` | Project-specific plugins |
+| Project | `.hermes/plugins/` | Project-specific plugins (requires `HERMES_ENABLE_PROJECT_PLUGINS=true`) |
 | pip | `hermes_agent.plugins` entry_points | Distributed packages |

 ## Available hooks
--- a/website/sidebars.ts
+++ b/website/sidebars.ts
@@ -129,6 +129,7 @@ const sidebars: SidebarsConfig = {
        'developer-guide/environments',
        'developer-guide/adding-tools',
        'developer-guide/creating-skills',
+        'developer-guide/extending-the-cli',
        'developer-guide/contributing',
      ],
    },
Author	SHA1	Message	Date
Teknium	c4e787d47b	feat: enable streaming by default in CLI Streaming provides a better UX — tokens appear as they arrive instead of waiting for the full response. show_reasoning remains false so thinking blocks are not streamed to the user.	2026-03-21 09:49:47 -07:00
Teknium	d70e07fc45	refactor(cli): add protected TUI extension hooks for wrapper CLIs Based on PR #1749 by @erosika (reimplemented on current main). Extracts three protected methods from run() so wrapper CLIs can extend the TUI without overriding the entire method: - _get_extra_tui_widgets(): inject widgets between spacer and status bar - _register_extra_tui_keybindings(kb, input_area): add keybindings - _build_tui_layout_children(**widgets): full control over ordering Default implementations reproduce existing layout exactly. The inline HSplit in run() now delegates to _build_tui_layout_children(). 5 tests covering defaults, widget insertion position, and keybinding registration.	2026-03-21 09:42:07 -07:00
Teknium	07112e4e98	fix(mattermost): use MIME types for media attachments (#2329 ) fix(mattermost): use MIME types for media attachments	2026-03-21 09:31:53 -07:00
Himess	bc15f6cca3	fix(mattermost): use MIME types for media attachments Bare strings like "image", "audio", "document" were appended to media_types, but downstream run.py checks mtype.startswith("image/") and mtype.startswith("audio/"), which never matched. This caused all Mattermost file attachments to be silently dropped from vision/STT processing. Use the actual MIME type from file_info instead.	2026-03-21 09:31:15 -07:00
Teknium	3921fb973c	fix(gateway): load platforms section from config.yaml for webhook routes (#2328 ) fix(gateway): load platforms section from config.yaml for webhook routes	2026-03-21 09:27:40 -07:00
Teknium	6408b4ad53	Merge pull request #2327 from NousResearch/hermes/hermes-5d6932ba fix: prevent systemd restart storm on gateway connection failure	2026-03-21 09:26:57 -07:00
Teknium	326b146d68	fix: prevent systemd restart storm on gateway connection failure Cherry-picked from PR #2319 by @itenev. When the gateway fails to connect (e.g. PrivilegedIntentsRequired, missing token), systemd's default RestartSec=10 with no start rate limit causes rapid reconnect storms flooding logs and triggering platform-side rate limits. - StartLimitIntervalSec=600 + StartLimitBurst=5 in [Unit] (max 5 restarts per 10 min) - RestartSec: 10 → 30 - Applied to both templates in gateway.py and scripts/hermes-gateway	2026-03-21 09:26:39 -07:00
dieutx	1830db0476	fix(gateway): load platforms section from config.yaml into gateway config The gateway config loader read config.yaml but never merged its `platforms` key into the runtime config dict. This meant that platform-specific settings defined under `platforms.<name>.extra` (e.g. webhook routes) were silently ignored unless the user also duplicated them in the legacy gateway.json file. Merge `yaml_cfg["platforms"]` into `gw_data["platforms"]` with a shallow deep-merge of the `extra` dict so that gateway.json defaults are preserved while config.yaml values take precedence. Closes #2305	2026-03-21 09:26:24 -07:00
Teknium	3ba6043c62	feat(compressor): major context compaction improvements (#2323 ) feat(compressor): major context compaction improvements — structured summaries, iterative updates, token-budget tail protection	2026-03-21 08:51:42 -07:00
Teknium	e75f58420c	feat(compressor): major context compaction improvements Six improvements to reduce information loss during context compression, informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot: 1. Structured summary template — sections for Goal, Progress (Done/ In Progress/Blocked), Key Decisions, Relevant Files, Next Steps, and Critical Context. Forces the summarizer to preserve each category instead of writing a vague paragraph. 2. Iterative summary updates — on re-compression, the prompt says 'PRESERVE existing info, ADD new progress, UPDATE done/in-progress status.' Previous summary is stored and fed back to the summarizer so accumulated context survives across multiple compactions. 3. Token-budget tail protection — instead of fixed protect_last_n=4, walks backward keeping ~20K tokens of recent context. Adapts to message density: sessions with big tool results protect fewer messages, short exchanges protect more. Falls back to protect_last_n for small conversations. 4. Tool output pruning (pre-pass) — before the expensive LLM summary, replaces old tool result contents with a placeholder. This is free (no LLM call) and can save 30%+ of context by itself. 5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20% of compressed content tokens (clamped to 2000-8000). A 50-turn conversation gets more summary space than a 10-turn one. 6. Richer summarizer input — tool calls now include arguments (up to 500 chars) and tool results keep up to 3000 chars (was 1500). The summarizer sees 'terminal(git status) → M src/config.py' instead of just '[Tool calls: terminal]'.	2026-03-21 08:14:14 -07:00
Teknium	28bb0e770f	fix(voice): enable TTS voice reply when streaming is active (#2322 ) When streaming is enabled, the base adapter receives None from _handle_message (already_sent=True) and cannot run auto-TTS for voice input. The runner was unconditionally skipping voice input TTS assuming the base adapter would handle it. Now the runner takes over TTS responsibility when streaming has already delivered the text response, so voice channel playback works with both streaming on and off. Streaming off behavior is unchanged (default already_sent=False preserves the original code path exactly). Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>	2026-03-21 08:08:37 -07:00
Teknium	06f4df52f1	fix(install): add zprofile fallback and create zshrc on fresh macOS installs (#2320 ) On macOS, zsh users may not have ~/.zshrc if they haven't customized their shell yet. The installer would silently fail to add ~/.local/bin to PATH, causing 'hermes: command not found' after installation. - Check ~/.zprofile as fallback for zsh users (macOS login shell config) - Create ~/.zshrc if neither config file exists Cherry-picked from PR #2315 by erhnysr. Co-authored-by: erhnysr <erhnysr@users.noreply.github.com>	2026-03-21 07:30:43 -07:00
Teknium	a03cbcd5f9	Merge pull request #2317 from NousResearch/hermes/hermes-5d6932ba fix(cron): close abandoned coroutine when asyncio.run() raises RuntimeError	2026-03-21 07:21:18 -07:00
Teknium	df67ae730b	fix(cron): close abandoned coroutine when asyncio.run() raises RuntimeError Cherry-picked from PR #2290 by @Mibayy. Closes #2138. When asyncio.run() raises RuntimeError (running loop exists), the coroutine was created but never awaited, producing a RuntimeWarning on GC. Extract coro before try, call coro.close() in the except branch before falling back to ThreadPoolExecutor.	2026-03-21 07:20:58 -07:00
Teknium	9305164bf3	fix: add None-entry guard to tool_calls loops in run_agent, batch_runner, and mini_swe_runner (#2316 ) Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>	2026-03-21 07:20:41 -07:00
Teknium	453f4c5175	Merge pull request #2312 from NousResearch/hermes/hermes-31d7db3b fix(gateway): retry Telegram 409 polling conflicts before giving up	2026-03-21 07:19:43 -07:00
Teknium	37a9979459	fix(cron): stop injecting cron outputs into gateway session history (#2313 ) Cron deliveries were mirrored into the target gateway session as assistant-role messages, causing consecutive assistant messages that violate message alternation (issue #2221). Instead of fixing the role, remove the mirror injection entirely. Cron outputs already live in their own cron session and don't belong in the interactive conversation history. Delivered messages are now wrapped with a header (task name) and a footer noting the agent cannot see or respond to the message, so users have clear context about what they're reading. Closes #2221	2026-03-21 07:18:36 -07:00
Teknium	713f2f73da	fix(agent): inject model identity for Alibaba Coding Plan (#2314 ) fix(agent): inject model identity for Alibaba Coding Plan	2026-03-21 07:11:51 -07:00
Teknium	237499d102	Merge pull request #2311 from NousResearch/hermes/hermes-5d6932ba fix(toolsets): pass visited set by reference to prevent diamond dependency duplication	2026-03-21 07:11:27 -07:00
Teknium	3f811f52fd	fix(toolsets): pass visited set by reference to prevent diamond dependency duplication Cherry-picked from PR #2292 by @Mibayy. Closes #2134. resolve_toolset() called visited.copy() per sibling include, breaking dedup for diamond dependencies (D resolved twice via B and C paths) and causing duplicate cycle warnings. Fix: pass visited directly so siblings share the same set. The .copy() for the all/* alias at the top level is kept so each top-level toolset gets an independent pass. Removes the print() cycle warning since hitting a visited name now usually means diamond (not a bug).	2026-03-21 07:11:09 -07:00
ygd58	2ea8054304	fix(agent): inject model identity for Alibaba Coding Plan to work around API returning wrong model name	2026-03-21 07:11:08 -07:00
Teknium	488a30e879	fix(gateway): retry Telegram 409 polling conflicts before giving up A single Telegram 409 Conflict from getUpdates permanently killed Telegram polling with no recovery possible (retryable=False on first occurrence). This is too aggressive for production use with process supervisors. Transient 409s are expected during: - --replace handoffs where the old long-poll session lingers on Telegram servers for a few seconds after SIGTERM - systemd Restart=on-failure respawns that overlap with the dying instance cleanup Now _handle_polling_conflict() retries up to 3 times with a 10-second delay between attempts. The 30-second total retry window lets stale server-side sessions expire. If all retries fail, the error is still marked as permanently fatal — preserving the original protection against genuine dual-instance conflicts. Tests updated: split the single conflict test into two — one verifying retry on transient conflict, one verifying fatal after exhausted retries. Closes #2296	2026-03-21 07:11:06 -07:00
Teknium	bc3f425212	Merge pull request #2309 from NousResearch/hermes/hermes-5d6932ba fix(cli): correct truncated AUXILIARY_WEB_EXTRACT_API_KEY env var name	2026-03-21 07:09:47 -07:00
Teknium	fd1d6c03cb	fix(cli): correct truncated AUXILIARY_WEB_EXTRACT_API_KEY env var name Cherry-picked from PR #2295 by @dlkakbs. The web_extract auxiliary client api_key env var was literally stored as 'AUXILI..._KEY' (dots in the source) instead of the full name. Users configuring an auxiliary web_extract model with an API key would have auth failures because the key was written to a non-existent var.	2026-03-21 07:09:28 -07:00
Teknium	58b52dfb2f	Merge pull request #2303 from NousResearch/hermes/hermes-31d7db3b fix: remove synthetic error message injection, fix session resume after repeated failures	2026-03-21 07:03:54 -07:00
Teknium	651e92fbbf	fix: use git pull --ff-only in update/install to avoid divergent branch error (#2274 ) fix: use git pull --rebase in update/install to avoid divergent branch error	2026-03-21 06:33:22 -07:00
Teknium	779619f742	fix: remove synthetic error message injection, fix session resume after repeated failures Two changes to the error handler in the agent loop: 1. Remove the 'if not pending_handled' block that injected fake [System error during processing: ...] messages into conversation history. These polluted history, burned tokens on retries, and could violate role alternation by injecting as role=user. The tool_calls error-result path (role=tool) is preserved. 2. Append the error final_response as an assistant message when hitting the iteration limit, so session resume doesn't produce consecutive user messages.	2026-03-21 06:33:05 -07:00
Teknium	96a5e9fc11	feat(agent): add summary of successful tool actions in review agent Enhanced the review agent to scan and summarize successful tool actions, providing users with a compact overview of updates made during the review process. This includes actions related to memory and user profiles, improving user feedback and interaction clarity.	2026-03-21 06:31:59 -07:00
Teknium	eb537b5db4	fix(cli): prevent multiple reasoning boxes from rendering Added a check to suppress further reasoning rendering once the response box is open, preventing potential overlap of reasoning boxes during late thinking blocks. This enhances the user experience by maintaining a clean output in the CLI.	2026-03-21 06:28:47 -07:00
Teknium	2da79b13df	feat: priority-based context file selection + CLAUDE.md support (#2301 ) Previously, all project context files (AGENTS.md, .cursorrules, .hermes.md) were loaded and concatenated into the system prompt. This bloated the prompt with potentially redundant or conflicting instructions. Now only ONE project context type is loaded, using priority order: 1. .hermes.md / HERMES.md (walk to git root) 2. AGENTS.md / agents.md (recursive directory walk) 3. CLAUDE.md / claude.md (cwd only, NEW) 4. .cursorrules / .cursor/rules/*.mdc (cwd only) SOUL.md from HERMES_HOME remains independent and always loads. Also adds CLAUDE.md as a recognized context file format, matching the convention popularized by Claude Code. Refactored the monolithic function into four focused helpers: _load_hermes_md, _load_agents_md, _load_claude_md, _load_cursorrules. Tests: replaced 1 coexistence test with 10 new tests covering priority ordering, CLAUDE.md loading, case sensitivity, injection blocking.	2026-03-21 06:26:20 -07:00
Test	870ebb8850	fix: use git pull --ff-only in update/install to avoid divergent branch error Fresh installs without pull.rebase configured hit a git error when running hermes update because git doesn't know how to reconcile divergent branches. --ff-only is the right strategy: it works for the normal case (local branch is behind remote) and fails cleanly if the user somehow has local commits, rather than silently rebasing them.	2026-03-20 22:28:55 -07:00
Test	10d719ac1b	fix(security): require opt-in for project plugin discovery	2026-03-20 20:50:30 -07:00