fix: MiniMax/Alibaba incorrectly detected as Anthropic OAuth, causing mcp_ tool prefix

_is_oauth_token() returned True for any key not starting with 'sk-ant-api', which means MiniMax and Alibaba API keys were falsely treated as Anthropic OAuth tokens. This triggered the Claude Code compatibility path: - All tool names prefixed with mcp_ (e.g. mcp_terminal, mcp_web_search) - System prompt injected with 'You are Claude Code' identity - 'Hermes Agent' replaced with 'Claude Code' throughout Fix: Make _is_oauth_token() positively identify Anthropic OAuth tokens by their key format instead of using a broad catch-all: - sk-ant-* (but not sk-ant-api-*) -> setup tokens, managed keys - eyJ* -> JWTs from Anthropic OAuth flow - Everything else -> False (MiniMax, Alibaba, etc.) Reported by stefan171.
Merge pull request #7555 from SHL0MS/feat/creative-ideation-skill
2026-04-11 00:42:36 -07:00 · 2026-04-11 02:09:17 -04:00 · 2026-04-11 01:44:36 -04:00 · 2026-04-11 01:21:47 -04:00 · 2026-04-10 21:18:34 -07:00 · 2026-04-10 21:18:34 -07:00
109 changed files with 8301 additions and 2876 deletions
@@ -161,18 +161,27 @@ def _get_claude_code_version() -> str:


 def _is_oauth_token(key: str) -> bool:
-    """Check if the key is an OAuth/setup token (not a regular Console API key).
+    """Check if the key is an Anthropic OAuth/setup token.

-    Regular API keys start with 'sk-ant-api'. Everything else (setup-tokens
-    starting with 'sk-ant-oat', managed keys, JWTs, etc.) needs Bearer auth.
+    Positively identifies Anthropic OAuth tokens by their key format:
+    - ``sk-ant-`` prefix (but NOT ``sk-ant-api``) → setup tokens, managed keys
+    - ``eyJ`` prefix → JWTs from the Anthropic OAuth flow
+
+    Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match either pattern
+    and correctly return False.
    """
    if not key:
        return False
-    # Regular Console API keys use x-api-key header
+    # Regular Anthropic Console API keys — x-api-key auth, never OAuth
    if key.startswith("sk-ant-api"):
        return False
-    # Everything else (setup-tokens, managed keys, JWTs) uses Bearer auth
-    return True
+    # Anthropic-issued tokens (setup-tokens sk-ant-oat-*, managed keys)
+    if key.startswith("sk-ant-"):
+        return True
+    # JWTs from Anthropic OAuth flow
+    if key.startswith("eyJ"):
+        return True
+    return False


 def _normalize_base_url_text(base_url) -> str:
@@ -857,7 +857,7 @@ def _read_main_provider() -> str:
    return ""


-def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str]]:
+def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Resolve the active custom/main endpoint the same way the main CLI does.

    This covers both env-driven OPENAI_BASE_URL setups and config-saved custom
@@ -870,18 +870,29 @@ def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str]]:
        runtime = resolve_runtime_provider(requested="custom")
    except Exception as exc:
        logger.debug("Auxiliary client: custom runtime resolution failed: %s", exc)
-        return None, None
+        runtime = None
+
+    if not isinstance(runtime, dict):
+        openai_base = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
+        openai_key = os.getenv("OPENAI_API_KEY", "").strip()
+        if not openai_base:
+            return None, None, None
+        runtime = {
+            "base_url": openai_base,
+            "api_key": openai_key,
+        }

    custom_base = runtime.get("base_url")
    custom_key = runtime.get("api_key")
+    custom_mode = runtime.get("api_mode")
    if not isinstance(custom_base, str) or not custom_base.strip():
-        return None, None
+        return None, None, None

    custom_base = custom_base.strip().rstrip("/")
    if "openrouter.ai" in custom_base.lower():
        # requested='custom' falls back to OpenRouter when no custom endpoint is
        # configured. Treat that as "no custom endpoint" for auxiliary routing.
-        return None, None
+        return None, None, None

    # Local servers (Ollama, llama.cpp, vLLM, LM Studio) don't require auth.
    # Use a placeholder key — the OpenAI SDK requires a non-empty string but
@@ -890,20 +901,33 @@ def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str]]:
    if not isinstance(custom_key, str) or not custom_key.strip():
        custom_key = "no-key-required"

-    return custom_base, custom_key.strip()
+    if not isinstance(custom_mode, str) or not custom_mode.strip():
+        custom_mode = None
+
+    return custom_base, custom_key.strip(), custom_mode


 def _current_custom_base_url() -> str:
-    custom_base, _ = _resolve_custom_runtime()
+    custom_base, _, _ = _resolve_custom_runtime()
    return custom_base or ""


 def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
-    custom_base, custom_key = _resolve_custom_runtime()
+    runtime = _resolve_custom_runtime()
+    if len(runtime) == 2:
+        custom_base, custom_key = runtime
+        custom_mode = None
+    else:
+        custom_base, custom_key, custom_mode = runtime
    if not custom_base or not custom_key:
        return None, None
+    if custom_base.lower().startswith(_CODEX_AUX_BASE_URL.lower()):
+        return None, None
    model = _read_main_model() or "gpt-4o-mini"
-    logger.debug("Auxiliary client: custom endpoint (%s)", model)
+    logger.debug("Auxiliary client: custom endpoint (%s, api_mode=%s)", model, custom_mode or "chat_completions")
+    if custom_mode == "codex_responses":
+        real_client = OpenAI(api_key=custom_key, base_url=custom_base)
+        return CodexAuxiliaryClient(real_client, model), model
    return OpenAI(api_key=custom_key, base_url=custom_base), model


@@ -1401,6 +1425,23 @@ def resolve_provider_client(

        client = OpenAI(api_key=api_key, base_url=base_url,
                        **({"default_headers": headers} if headers else {}))
+
+        # Copilot GPT-5+ models (except gpt-5-mini) require the Responses
+        # API — they are not accessible via /chat/completions.  Wrap the
+        # plain client in CodexAuxiliaryClient so call_llm() transparently
+        # routes through responses.stream().
+        if provider == "copilot" and final_model and not raw_codex:
+            try:
+                from hermes_cli.models import _should_use_copilot_responses_api
+                if _should_use_copilot_responses_api(final_model):
+                    logger.debug(
+                        "resolve_provider_client: copilot model %s needs "
+                        "Responses API — wrapping with CodexAuxiliaryClient",
+                        final_model)
+                    client = CodexAuxiliaryClient(client, final_model)
+            except ImportError:
+                pass
+
        logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
        return (_to_async_client(client, final_model) if async_mode
                else (client, final_model))
@@ -18,6 +18,7 @@ import time
 from typing import Any, Dict, List, Optional

 from agent.auxiliary_client import call_llm
+from agent.context_engine import ContextEngine
 from agent.model_metadata import (
    get_model_context_length,
    estimate_messages_tokens_rough,
@@ -50,8 +51,8 @@ _CHARS_PER_TOKEN = 4
 _SUMMARY_FAILURE_COOLDOWN_SECONDS = 600


-class ContextCompressor:
-    """Compresses conversation context when approaching the model's context limit.
+class ContextCompressor(ContextEngine):
+    """Default context engine — compresses conversation context via lossy summarization.

    Algorithm:
      1. Prune old tool results (cheap, no LLM call)
@@ -61,6 +62,33 @@ class ContextCompressor:
      5. On subsequent compactions, iteratively update the previous summary
    """

+    @property
+    def name(self) -> str:
+        return "compressor"
+
+    def on_session_reset(self) -> None:
+        """Reset all per-session state for /new or /reset."""
+        super().on_session_reset()
+        self._context_probed = False
+        self._context_probe_persistable = False
+        self._previous_summary = None
+
+    def update_model(
+        self,
+        model: str,
+        context_length: int,
+        base_url: str = "",
+        api_key: str = "",
+        provider: str = "",
+    ) -> None:
+        """Update model info after a model switch or fallback activation."""
+        self.model = model
+        self.base_url = base_url
+        self.api_key = api_key
+        self.provider = provider
+        self.context_length = context_length
+        self.threshold_tokens = int(context_length * self.threshold_percent)
+
    def __init__(
        self,
        model: str,
@@ -0,0 +1,184 @@
+"""Abstract base class for pluggable context engines.
+
+A context engine controls how conversation context is managed when
+approaching the model's token limit. The built-in ContextCompressor
+is the default implementation. Third-party engines (e.g. LCM) can
+replace it via the plugin system or by being placed in the
+``plugins/context_engine/<name>/`` directory.
+
+Selection is config-driven: ``context.engine`` in config.yaml.
+Default is ``"compressor"`` (the built-in). Only one engine is active.
+
+The engine is responsible for:
+  - Deciding when compaction should fire
+  - Performing compaction (summarization, DAG construction, etc.)
+  - Optionally exposing tools the agent can call (e.g. lcm_grep)
+  - Tracking token usage from API responses
+
+Lifecycle:
+  1. Engine is instantiated and registered (plugin register() or default)
+  2. on_session_start() called when a conversation begins
+  3. update_from_response() called after each API response with usage data
+  4. should_compress() checked after each turn
+  5. compress() called when should_compress() returns True
+  6. on_session_end() called at real session boundaries (CLI exit, /reset,
+     gateway session expiry) — NOT per-turn
+"""
+
+from abc import ABC, abstractmethod
+from typing import Any, Dict, List, Optional
+
+
+class ContextEngine(ABC):
+    """Base class all context engines must implement."""
+
+    # -- Identity ----------------------------------------------------------
+
+    @property
+    @abstractmethod
+    def name(self) -> str:
+        """Short identifier (e.g. 'compressor', 'lcm')."""
+
+    # -- Token state (read by run_agent.py for display/logging) ------------
+    #
+    # Engines MUST maintain these. run_agent.py reads them directly.
+
+    last_prompt_tokens: int = 0
+    last_completion_tokens: int = 0
+    last_total_tokens: int = 0
+    threshold_tokens: int = 0
+    context_length: int = 0
+    compression_count: int = 0
+
+    # -- Compaction parameters (read by run_agent.py for preflight) --------
+    #
+    # These control the preflight compression check.  Subclasses may
+    # override via __init__ or property; defaults are sensible for most
+    # engines.
+
+    threshold_percent: float = 0.75
+    protect_first_n: int = 3
+    protect_last_n: int = 6
+
+    # -- Core interface ----------------------------------------------------
+
+    @abstractmethod
+    def update_from_response(self, usage: Dict[str, Any]) -> None:
+        """Update tracked token usage from an API response.
+
+        Called after every LLM call with the usage dict from the response.
+        """
+
+    @abstractmethod
+    def should_compress(self, prompt_tokens: int = None) -> bool:
+        """Return True if compaction should fire this turn."""
+
+    @abstractmethod
+    def compress(
+        self,
+        messages: List[Dict[str, Any]],
+        current_tokens: int = None,
+    ) -> List[Dict[str, Any]]:
+        """Compact the message list and return the new message list.
+
+        This is the main entry point. The engine receives the full message
+        list and returns a (possibly shorter) list that fits within the
+        context budget. The implementation is free to summarize, build a
+        DAG, or do anything else — as long as the returned list is a valid
+        OpenAI-format message sequence.
+        """
+
+    # -- Optional: pre-flight check ----------------------------------------
+
+    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
+        """Quick rough check before the API call (no real token count yet).
+
+        Default returns False (skip pre-flight). Override if your engine
+        can do a cheap estimate.
+        """
+        return False
+
+    # -- Optional: session lifecycle ---------------------------------------
+
+    def on_session_start(self, session_id: str, **kwargs) -> None:
+        """Called when a new conversation session begins.
+
+        Use this to load persisted state (DAG, store) for the session.
+        kwargs may include hermes_home, platform, model, etc.
+        """
+
+    def on_session_end(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
+        """Called at real session boundaries (CLI exit, /reset, gateway expiry).
+
+        Use this to flush state, close DB connections, etc.
+        NOT called per-turn — only when the session truly ends.
+        """
+
+    def on_session_reset(self) -> None:
+        """Called on /new or /reset. Reset per-session state.
+
+        Default resets compression_count and token tracking.
+        """
+        self.last_prompt_tokens = 0
+        self.last_completion_tokens = 0
+        self.last_total_tokens = 0
+        self.compression_count = 0
+
+    # -- Optional: tools ---------------------------------------------------
+
+    def get_tool_schemas(self) -> List[Dict[str, Any]]:
+        """Return tool schemas this engine provides to the agent.
+
+        Default returns empty list (no tools). LCM would return schemas
+        for lcm_grep, lcm_describe, lcm_expand here.
+        """
+        return []
+
+    def handle_tool_call(self, name: str, args: Dict[str, Any], **kwargs) -> str:
+        """Handle a tool call from the agent.
+
+        Only called for tool names returned by get_tool_schemas().
+        Must return a JSON string.
+
+        kwargs may include:
+          messages: the current in-memory message list (for live ingestion)
+        """
+        import json
+        return json.dumps({"error": f"Unknown context engine tool: {name}"})
+
+    # -- Optional: status / display ----------------------------------------
+
+    def get_status(self) -> Dict[str, Any]:
+        """Return status dict for display/logging.
+
+        Default returns the standard fields run_agent.py expects.
+        """
+        return {
+            "last_prompt_tokens": self.last_prompt_tokens,
+            "threshold_tokens": self.threshold_tokens,
+            "context_length": self.context_length,
+            "usage_percent": (
+                min(100, self.last_prompt_tokens / self.context_length * 100)
+                if self.context_length else 0
+            ),
+            "compression_count": self.compression_count,
+        }
+
+    # -- Optional: model switch support ------------------------------------
+
+    def update_model(
+        self,
+        model: str,
+        context_length: int,
+        base_url: str = "",
+        api_key: str = "",
+        provider: str = "",
+    ) -> None:
+        """Called when the user switches models or on fallback activation.
+
+        Default updates context_length and recalculates threshold_tokens
+        from threshold_percent. Override if your engine needs more
+        (e.g. recalculate DAG budgets, switch summary models).
+        """
+        self.context_length = context_length
+        self.threshold_tokens = int(context_length * self.threshold_percent)
@@ -0,0 +1,49 @@
+"""User-facing summaries for manual compression commands."""
+
+from __future__ import annotations
+
+from typing import Any, Sequence
+
+
+def summarize_manual_compression(
+    before_messages: Sequence[dict[str, Any]],
+    after_messages: Sequence[dict[str, Any]],
+    before_tokens: int,
+    after_tokens: int,
+) -> dict[str, Any]:
+    """Return consistent user-facing feedback for manual compression."""
+    before_count = len(before_messages)
+    after_count = len(after_messages)
+    noop = list(after_messages) == list(before_messages)
+
+    if noop:
+        headline = f"No changes from compression: {before_count} messages"
+        if after_tokens == before_tokens:
+            token_line = (
+                f"Rough transcript estimate: ~{before_tokens:,} tokens (unchanged)"
+            )
+        else:
+            token_line = (
+                f"Rough transcript estimate: ~{before_tokens:,} → "
+                f"~{after_tokens:,} tokens"
+            )
+    else:
+        headline = f"Compressed: {before_count} → {after_count} messages"
+        token_line = (
+            f"Rough transcript estimate: ~{before_tokens:,} → "
+            f"~{after_tokens:,} tokens"
+        )
+
+    note = None
+    if not noop and after_count < before_count and after_tokens > before_tokens:
+        note = (
+            "Note: fewer messages can still raise this rough transcript estimate "
+            "when compression rewrites the transcript into denser summaries."
+        )
+
+    return {
+        "noop": noop,
+        "headline": headline,
+        "token_line": token_line,
+        "note": note,
+    }
@@ -487,7 +487,7 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
    (True, {}, "") to err on the side of showing the skill.
    """
    try:
-        raw = skill_file.read_text(encoding="utf-8")[:2000]
+        raw = skill_file.read_text(encoding="utf-8")
        frontmatter, _ = parse_frontmatter(raw)

        if not skill_matches_platform(frontmatter):
@@ -495,7 +495,7 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:

        return True, frontmatter, extract_skill_description(frontmatter)
    except Exception as e:
-        logger.debug("Failed to parse skill file %s: %s", skill_file, e)
+        logger.warning("Failed to parse skill file %s: %s", skill_file, e)
        return True, {}, ""


@@ -558,9 +558,10 @@ def build_skills_system_prompt(
    # ── Layer 1: in-process LRU cache ─────────────────────────────────
    # Include the resolved platform so per-platform disabled-skill lists
    # produce distinct cache entries (gateway serves multiple platforms).
+    from gateway.session_context import get_session_env
    _platform_hint = (
        os.environ.get("HERMES_PLATFORM")
-        or os.environ.get("HERMES_SESSION_PLATFORM")
+        or get_session_env("HERMES_SESSION_PLATFORM")
        or ""
    )
    cache_key = (
@@ -145,10 +145,11 @@ def get_disabled_skill_names(platform: str | None = None) -> Set[str]:
    if not isinstance(skills_cfg, dict):
        return set()

+    from gateway.session_context import get_session_env
    resolved_platform = (
        platform
        or os.getenv("HERMES_PLATFORM")
-        or os.getenv("HERMES_SESSION_PLATFORM")
+        or get_session_env("HERMES_SESSION_PLATFORM")
    )
    if resolved_platform:
        platform_disabled = (skills_cfg.get("platform_disabled") or {}).get(
@@ -480,6 +480,12 @@ agent:
  # Fires once per run when inactivity reaches this threshold (seconds).
  # Set to 0 to disable the warning.
  # gateway_timeout_warning: 900
+
+  # Graceful drain timeout for gateway stop/restart (seconds).
+  # The gateway stops accepting new work, waits for in-flight agents to
+  # finish, then interrupts anything still running after this timeout.
+  # 0 = no drain, interrupt immediately.
+  # restart_drain_timeout: 60
  
  # Enable verbose logging
  verbose: false
@@ -5835,21 +5835,29 @@ class HermesCLI:
        original_count = len(self.conversation_history)
        try:
            from agent.model_metadata import estimate_messages_tokens_rough
-            approx_tokens = estimate_messages_tokens_rough(self.conversation_history)
+            from agent.manual_compression_feedback import summarize_manual_compression
+            original_history = list(self.conversation_history)
+            approx_tokens = estimate_messages_tokens_rough(original_history)
            print(f"🗜️  Compressing {original_count} messages (~{approx_tokens:,} tokens)...")

-            compressed, _new_system = self.agent._compress_context(
-                self.conversation_history,
+            compressed, _ = self.agent._compress_context(
+                original_history,
                self.agent._cached_system_prompt or "",
                approx_tokens=approx_tokens,
            )
            self.conversation_history = compressed
-            new_count = len(self.conversation_history)
            new_tokens = estimate_messages_tokens_rough(self.conversation_history)
-            print(
-                f"  ✅ Compressed: {original_count} → {new_count} messages "
-                f"(~{approx_tokens:,} → ~{new_tokens:,} tokens)"
+            summary = summarize_manual_compression(
+                original_history,
+                self.conversation_history,
+                approx_tokens,
+                new_tokens,
            )
+            icon = "🗜️" if summary["noop"] else "✅"
+            print(f"  {icon} {summary['headline']}")
+            print(f"     {summary['token_line']}")
+            if summary["note"]:
+                print(f"     {summary['note']}")

        except Exception as e:
            print(f"  ❌ Compression failed: {e}")
@@ -31,7 +31,7 @@ except ImportError:
 # Configuration
 # =============================================================================

-HERMES_DIR = get_hermes_home()
+HERMES_DIR = get_hermes_home().resolve()
 CRON_DIR = HERMES_DIR / "cron"
 JOBS_FILE = CRON_DIR / "jobs.json"
 OUTPUT_DIR = CRON_DIR / "output"
@@ -338,10 +338,12 @@ def load_jobs() -> List[Dict[str, Any]]:
                    save_jobs(jobs)
                    logger.warning("Auto-repaired jobs.json (had invalid control characters)")
                return jobs
-        except Exception:
-            return []
-    except IOError:
-        return []
+        except Exception as e:
+            logger.error("Failed to auto-repair jobs.json: %s", e)
+            raise RuntimeError(f"Cron database corrupted and unrepairable: {e}") from e
+    except IOError as e:
+        logger.error("IOError reading jobs.json: %s", e)
+        raise RuntimeError(f"Failed to read cron database: {e}") from e


 def save_jobs(jobs: List[Dict[str, Any]]):
@@ -452,6 +454,7 @@ def create_job(
        "last_run_at": None,
        "last_status": None,
        "last_error": None,
+        "last_delivery_error": None,
        # Delivery configuration
        "deliver": deliver,
        "origin": origin,  # Tracks where job was created for "origin" delivery
@@ -620,8 +623,8 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,

            save_jobs(jobs)
            return
-    
-    save_jobs(jobs)
+
+    logger.warning("mark_job_run: job_id %s not found, skipping save", job_id)


 def advance_next_run(job_id: str) -> bool:
@@ -769,7 +769,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            _cron_pool.shutdown(wait=False, cancel_futures=True)
            raise
        finally:
-            _cron_pool.shutdown(wait=False)
+            _cron_pool.shutdown(wait=False, cancel_futures=True)

        if _inactivity_timeout:
            # Build diagnostic summary from the agent's activity tracker.
@@ -12,7 +12,7 @@ INSTALL_DIR="/opt/hermes"
 # The "home/" subdirectory is a per-profile HOME for subprocesses (git,
 # ssh, gh, npm …).  Without it those tools write to /root which is
 # ephemeral and shared across profiles.  See issue #4426.
-mkdir -p "$HERMES_HOME"/{cron,sessions,logs,hooks,memories,skills,home}
+mkdir -p "$HERMES_HOME"/{cron,sessions,logs,hooks,memories,skills,skins,plans,workspace,home}

 # .env
 if [ ! -f "$HERMES_HOME/.env" ]; then
@@ -76,10 +76,15 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
        except Exception as e:
            logger.warning("Channel directory: failed to build %s: %s", platform.value, e)

-    # Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
-    for plat_name in ("telegram", "whatsapp", "signal", "weixin", "email", "sms", "bluebubbles"):
-        if plat_name not in platforms:
-            platforms[plat_name] = _build_from_sessions(plat_name)
+    # Platforms that don't support direct channel enumeration get session-based
+    # discovery automatically.  Skip infrastructure entries that aren't messaging
+    # platforms — everything else falls through to _build_from_sessions().
+    _SKIP_SESSION_DISCOVERY = frozenset({"local", "api_server", "webhook"})
+    for plat in Platform:
+        plat_name = plat.value
+        if plat_name in _SKIP_SESSION_DISCOVERY or plat_name in platforms:
+            continue
+        platforms[plat_name] = _build_from_sessions(plat_name)

    directory = {
        "updated_at": datetime.now().isoformat(),
@@ -642,6 +642,8 @@ def load_gateway_config() -> GatewayConfig:
                    os.environ["MATRIX_FREE_RESPONSE_ROOMS"] = str(frc)
                if "auto_thread" in matrix_cfg and not os.getenv("MATRIX_AUTO_THREAD"):
                    os.environ["MATRIX_AUTO_THREAD"] = str(matrix_cfg["auto_thread"]).lower()
+                if "dm_mention_threads" in matrix_cfg and not os.getenv("MATRIX_DM_MENTION_THREADS"):
+                    os.environ["MATRIX_DM_MENTION_THREADS"] = str(matrix_cfg["dm_mention_threads"]).lower()

    except Exception as e:
        logger.warning(
@@ -25,6 +25,7 @@ import hmac
 import json
 import logging
 import os
+import socket as _socket
 import re
 import sqlite3
 import time
@@ -42,6 +43,7 @@ from gateway.config import Platform, PlatformConfig
 from gateway.platforms.base import (
    BasePlatformAdapter,
    SendResult,
+    is_network_accessible,
 )

 logger = logging.getLogger(__name__)
@@ -406,7 +408,8 @@ class APIServerAdapter(BasePlatformAdapter):
        Validate Bearer token from Authorization header.

        Returns None if auth is OK, or a 401 web.Response on failure.
-        If no API key is configured, all requests are allowed.
+        If no API key is configured, all requests are allowed (only when API
+        server is local).
        """
        if not self._api_key:
            return None  # No key configured — allow all (local-only use)
@@ -641,15 +644,35 @@ class APIServerAdapter(BasePlatformAdapter):
                    _stream_q.put(delta)

            def _on_tool_progress(event_type, name, preview, args, **kwargs):
-                """Inject tool progress into the SSE stream for Open WebUI."""
+                """Send tool progress as a separate SSE event.
+
+                Previously, progress markers like ``⏰ list`` were injected
+                directly into ``delta.content``.  OpenAI-compatible frontends
+                (Open WebUI, LobeChat, …) store ``delta.content`` verbatim as
+                the assistant message and send it back on subsequent requests.
+                After enough turns the model learns to *emit* the markers as
+                plain text instead of issuing real tool calls — silently
+                hallucinating tool results.  See #6972.
+
+                The fix: push a tagged tuple ``("__tool_progress__", payload)``
+                onto the stream queue.  The SSE writer emits it as a custom
+                ``event: hermes.tool.progress`` line that compliant frontends
+                can render for UX but will *not* persist into conversation
+                history.  Clients that don't understand the custom event type
+                silently ignore it per the SSE specification.
+                """
                if event_type != "tool.started":
-                    return  # Only show tool start events in chat stream
+                    return
                if name.startswith("_"):
-                    return  # Skip internal events (_thinking)
+                    return
                from agent.display import get_tool_emoji
                emoji = get_tool_emoji(name)
                label = preview or name
-                _stream_q.put(f"\n`{emoji} {label}`\n")
+                _stream_q.put(("__tool_progress__", {
+                    "tool": name,
+                    "emoji": emoji,
+                    "label": label,
+                }))

            # Start agent in background.  agent_ref is a mutable container
            # so the SSE writer can interrupt the agent on client disconnect.
@@ -760,6 +783,29 @@ class APIServerAdapter(BasePlatformAdapter):
            }
            await response.write(f"data: {json.dumps(role_chunk)}\n\n".encode())

+            # Helper — route a queue item to the correct SSE event.
+            async def _emit(item):
+                """Write a single queue item to the SSE stream.
+
+                Plain strings are sent as normal ``delta.content`` chunks.
+                Tagged tuples ``("__tool_progress__", payload)`` are sent
+                as a custom ``event: hermes.tool.progress`` SSE event so
+                frontends can display them without storing the markers in
+                conversation history.  See #6972.
+                """
+                if isinstance(item, tuple) and len(item) == 2 and item[0] == "__tool_progress__":
+                    event_data = json.dumps(item[1])
+                    await response.write(
+                        f"event: hermes.tool.progress\ndata: {event_data}\n\n".encode()
+                    )
+                else:
+                    content_chunk = {
+                        "id": completion_id, "object": "chat.completion.chunk",
+                        "created": created, "model": model,
+                        "choices": [{"index": 0, "delta": {"content": item}, "finish_reason": None}],
+                    }
+                    await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
+
            # Stream content chunks as they arrive from the agent
            loop = asyncio.get_event_loop()
            while True:
@@ -773,12 +819,7 @@ class APIServerAdapter(BasePlatformAdapter):
                                delta = stream_q.get_nowait()
                                if delta is None:
                                    break
-                                content_chunk = {
-                                    "id": completion_id, "object": "chat.completion.chunk",
-                                    "created": created, "model": model,
-                                    "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
-                                }
-                                await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
+                                await _emit(delta)
                            except _q.Empty:
                                break
                        break
@@ -787,12 +828,7 @@ class APIServerAdapter(BasePlatformAdapter):
                if delta is None:  # End of stream sentinel
                    break

-                content_chunk = {
-                    "id": completion_id, "object": "chat.completion.chunk",
-                    "created": created, "model": model,
-                    "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
-                }
-                await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
+                await _emit(delta)

            # Get usage from completed agent
            usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
@@ -1713,8 +1749,16 @@ class APIServerAdapter(BasePlatformAdapter):
            if hasattr(sweep_task, "add_done_callback"):
                sweep_task.add_done_callback(self._background_tasks.discard)

+            # Refuse to start network-accessible without authentication
+            if is_network_accessible(self._host) and not self._api_key:
+                logger.error(
+                    "[%s] Refusing to start: binding to %s requires API_SERVER_KEY. "
+                    "Set API_SERVER_KEY or use the default 127.0.0.1.",
+                    self.name, self._host,
+                )
+                return False
+
            # Port conflict detection — fail fast if port is already in use
-            import socket as _socket
            try:
                with _socket.socket(_socket.AF_INET, _socket.SOCK_STREAM) as _s:
                    _s.settimeout(1)
@@ -6,10 +6,12 @@ and implement the required methods.
 """

 import asyncio
+import ipaddress
 import logging
 import os
 import random
 import re
+import socket as _socket
 import subprocess
 import sys
 import uuid
@@ -19,6 +21,41 @@ from urllib.parse import urlsplit
 logger = logging.getLogger(__name__)


+def is_network_accessible(host: str) -> bool:
+    """Return True if *host* would expose the server beyond loopback.
+
+    Loopback addresses (127.0.0.1, ::1, IPv4-mapped ::ffff:127.0.0.1)
+    are local-only.  Unspecified addresses (0.0.0.0, ::) bind all
+    interfaces.  Hostnames are resolved; DNS failure fails closed.
+    """
+    try:
+        addr = ipaddress.ip_address(host)
+        if addr.is_loopback:
+            return False
+        # ::ffff:127.0.0.1 — Python reports is_loopback=False for mapped
+        # addresses, so check the underlying IPv4 explicitly.
+        if getattr(addr, "ipv4_mapped", None) and addr.ipv4_mapped.is_loopback:
+            return False
+        return True
+    except ValueError:
+        # when host variable is a hostname, we should try to resolve below
+        pass
+
+    try:
+        resolved = _socket.getaddrinfo(
+            host, None, _socket.AF_UNSPEC, _socket.SOCK_STREAM,
+        )
+        # if the hostname resolves into at least one non-loopback address,
+        # then we consider it to be network accessible
+        for _family, _type, _proto, _canonname, sockaddr in resolved:
+            addr = ipaddress.ip_address(sockaddr[0])
+            if not addr.is_loopback:
+                return True
+        return False
+    except (_socket.gaierror, OSError):
+        return True
+
+
 def _detect_macos_system_proxy() -> str | None:
    """Read the macOS system HTTP(S) proxy via ``scutil --proxy``.

@@ -636,6 +673,32 @@ class SendResult:
    retryable: bool = False  # True for transient connection errors — base will retry automatically


+def merge_pending_message_event(
+    pending_messages: Dict[str, MessageEvent],
+    session_key: str,
+    event: MessageEvent,
+) -> None:
+    """Store or merge a pending event for a session.
+
+    Photo bursts/albums often arrive as multiple near-simultaneous PHOTO
+    events. Merge those into the existing queued event so the next turn sees
+    the whole burst, while non-photo follow-ups still replace the pending
+    event normally.
+    """
+    existing = pending_messages.get(session_key)
+    if (
+        existing
+        and getattr(existing, "message_type", None) == MessageType.PHOTO
+        and event.message_type == MessageType.PHOTO
+    ):
+        existing.media_urls.extend(event.media_urls)
+        existing.media_types.extend(event.media_types)
+        if event.text:
+            existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
+        return
+    pending_messages[session_key] = event
+
+
 # Error substrings that indicate a transient *connection* failure worth retrying.
 # "timeout" / "timed out" / "readtimeout" / "writetimeout" are intentionally
 # excluded: a read/write timeout on a non-idempotent call (e.g. send_message)
@@ -690,6 +753,7 @@ class BasePlatformAdapter(ABC):
        # working on a task after --replace or manual restarts.
        self._background_tasks: set[asyncio.Task] = set()
        self._expected_cancelled_tasks: set[asyncio.Task] = set()
+        self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
        # Chats where auto-TTS on voice input is disabled (set by /voice off)
        self._auto_tts_disabled_chats: set = set()
        # Chats where typing indicator is paused (e.g. during approval waits).
@@ -778,6 +842,10 @@ class BasePlatformAdapter(ABC):
        an optional response string.
        """
        self._message_handler = handler
+
+    def set_busy_session_handler(self, handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]]) -> None:
+        """Set an optional handler for messages arriving during active sessions."""
+        self._busy_session_handler = handler
    
    def set_session_store(self, session_store: Any) -> None:
        """
@@ -1359,7 +1427,7 @@ class BasePlatformAdapter(ABC):
            # session lifecycle and its cleanup races with the running task
            # (see PR #4926).
            cmd = event.get_command()
-            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background"):
+            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background", "restart"):
                logger.debug(
                    "[%s] Command '/%s' bypassing active-session guard for %s",
                    self.name, cmd, session_key,
@@ -1378,19 +1446,19 @@ class BasePlatformAdapter(ABC):
                    logger.error("[%s] Command '/%s' dispatch failed: %s", self.name, cmd, e, exc_info=True)
                return

+            if self._busy_session_handler is not None:
+                try:
+                    if await self._busy_session_handler(event, session_key):
+                        return
+                except Exception as e:
+                    logger.error("[%s] Busy-session handler failed: %s", self.name, e, exc_info=True)
+
            # Special case: photo bursts/albums frequently arrive as multiple near-
            # simultaneous messages. Queue them without interrupting the active run,
            # then process them immediately after the current task finishes.
            if event.message_type == MessageType.PHOTO:
                logger.debug("[%s] Queuing photo follow-up for session %s without interrupt", self.name, session_key)
-                existing = self._pending_messages.get(session_key)
-                if existing and existing.message_type == MessageType.PHOTO:
-                    existing.media_urls.extend(event.media_urls)
-                    existing.media_types.extend(event.media_types)
-                    if event.text:
-                        existing.text = self._merge_caption(existing.text, event.text)
-                else:
-                    self._pending_messages[session_key] = event
+                merge_pending_message_event(self._pending_messages, session_key, event)
                return  # Don't interrupt now - will run after current task completes

            # Default behavior for non-photo follow-ups: interrupt the running agent
@@ -1190,6 +1190,8 @@ class FeishuAdapter(BasePlatformAdapter):
                lambda data: self._on_reaction_event("im.message.reaction.deleted_v1", data)
            )
            .register_p2_card_action_trigger(self._on_card_action_trigger)
+            .register_p2_im_chat_member_bot_added_v1(self._on_bot_added_to_chat)
+            .register_p2_im_chat_member_bot_deleted_v1(self._on_bot_removed_from_chat)
            .build()
        )

@@ -0,0 +1,20 @@
+"""Shared gateway restart constants and parsing helpers."""
+
+from hermes_cli.config import DEFAULT_CONFIG
+
+# EX_TEMPFAIL from sysexits.h — used to ask the service manager to restart
+# the gateway after a graceful drain/reload path completes.
+GATEWAY_SERVICE_RESTART_EXIT_CODE = 75
+
+DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT = float(
+    DEFAULT_CONFIG["agent"]["restart_drain_timeout"]
+)
+
+
+def parse_restart_drain_timeout(raw: object) -> float:
+    """Parse a configured drain timeout, falling back to the shared default."""
+    try:
+        value = float(raw) if str(raw or "").strip() else DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    except (TypeError, ValueError):
+        return DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    return max(0.0, value)
@@ -186,6 +186,12 @@ if _config_path.exists():
                os.environ["HERMES_AGENT_TIMEOUT"] = str(_agent_cfg["gateway_timeout"])
            if "gateway_timeout_warning" in _agent_cfg and "HERMES_AGENT_TIMEOUT_WARNING" not in os.environ:
                os.environ["HERMES_AGENT_TIMEOUT_WARNING"] = str(_agent_cfg["gateway_timeout_warning"])
+            if "restart_drain_timeout" in _agent_cfg and "HERMES_RESTART_DRAIN_TIMEOUT" not in os.environ:
+                os.environ["HERMES_RESTART_DRAIN_TIMEOUT"] = str(_agent_cfg["restart_drain_timeout"])
+        _display_cfg = _cfg.get("display", {})
+        if _display_cfg and isinstance(_display_cfg, dict):
+            if "busy_input_mode" in _display_cfg and "HERMES_GATEWAY_BUSY_INPUT_MODE" not in os.environ:
+                os.environ["HERMES_GATEWAY_BUSY_INPUT_MODE"] = str(_display_cfg["busy_input_mode"])
        # Timezone: bridge config.yaml → HERMES_TIMEZONE env var.
        # HERMES_TIMEZONE from .env takes precedence (already in os.environ).
        _tz_cfg = _cfg.get("timezone", "")
@@ -235,7 +241,17 @@ from gateway.session import (
    build_session_key,
 )
 from gateway.delivery import DeliveryRouter
-from gateway.platforms.base import BasePlatformAdapter, MessageEvent, MessageType
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    merge_pending_message_event,
+)
+from gateway.restart import (
+    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
+    GATEWAY_SERVICE_RESTART_EXIT_CODE,
+    parse_restart_drain_timeout,
+)


 def _normalize_whatsapp_identifier(value: str) -> str:
@@ -471,6 +487,16 @@ class GatewayRunner:
    # Class-level defaults so partial construction in tests doesn't
    # blow up on attribute access.
    _running_agents_ts: Dict[str, float] = {}
+    _busy_input_mode: str = "interrupt"
+    _restart_drain_timeout: float = DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    _exit_code: Optional[int] = None
+    _draining: bool = False
+    _restart_requested: bool = False
+    _restart_task_started: bool = False
+    _restart_detached: bool = False
+    _restart_via_service: bool = False
+    _stop_task: Optional[asyncio.Task] = None
+    _session_model_overrides: Dict[str, Dict[str, str]] = {}
    
    def __init__(self, config: Optional[GatewayConfig] = None):
        self.config = config or load_gateway_config()
@@ -483,6 +509,8 @@ class GatewayRunner:
        self._reasoning_config = self._load_reasoning_config()
        self._service_tier = self._load_service_tier()
        self._show_reasoning = self._load_show_reasoning()
+        self._busy_input_mode = self._load_busy_input_mode()
+        self._restart_drain_timeout = self._load_restart_drain_timeout()
        self._provider_routing = self._load_provider_routing()
        self._fallback_model = self._load_fallback_model()
        self._smart_model_routing = self._load_smart_model_routing()
@@ -499,6 +527,13 @@ class GatewayRunner:
        self._exit_cleanly = False
        self._exit_with_failure = False
        self._exit_reason: Optional[str] = None
+        self._exit_code: Optional[int] = None
+        self._draining = False
+        self._restart_requested = False
+        self._restart_task_started = False
+        self._restart_detached = False
+        self._restart_via_service = False
+        self._stop_task: Optional[asyncio.Task] = None
        
        # Track running agents per session for interrupt support
        # Key: session_key, Value: AIAgent instance
@@ -759,6 +794,10 @@ class GatewayRunner:
    def exit_reason(self) -> Optional[str]:
        return self._exit_reason

+    @property
+    def exit_code(self) -> Optional[int]:
+        return self._exit_code
+
    def _session_key_for_source(self, source: SessionSource) -> str:
        """Resolve the current session key for a source, honoring gateway config when available."""
        if hasattr(self, "session_store") and self.session_store is not None:
@@ -868,6 +907,30 @@ class GatewayRunner:
        self._exit_cleanly = True
        self._exit_reason = reason
        self._shutdown_event.set()
+
+    def _running_agent_count(self) -> int:
+        return len(self._running_agents)
+
+    def _status_action_label(self) -> str:
+        return "restart" if self._restart_requested else "shutdown"
+
+    def _status_action_gerund(self) -> str:
+        return "restarting" if self._restart_requested else "shutting down"
+
+    def _queue_during_drain_enabled(self) -> bool:
+        return self._restart_requested and self._busy_input_mode == "queue"
+
+    def _update_runtime_status(self, gateway_state: Optional[str] = None, exit_reason: Optional[str] = None) -> None:
+        try:
+            from gateway.status import write_runtime_status
+            write_runtime_status(
+                gateway_state=gateway_state,
+                exit_reason=exit_reason,
+                restart_requested=self._restart_requested,
+                active_agents=self._running_agent_count(),
+            )
+        except Exception:
+            pass
    
    @staticmethod
    def _load_prefill_messages() -> List[Dict[str, Any]]:
@@ -994,6 +1057,48 @@ class GatewayRunner:
            pass
        return False

+    @staticmethod
+    def _load_busy_input_mode() -> str:
+        """Load gateway drain-time busy-input behavior from config/env."""
+        mode = os.getenv("HERMES_GATEWAY_BUSY_INPUT_MODE", "").strip().lower()
+        if not mode:
+            try:
+                import yaml as _y
+                cfg_path = _hermes_home / "config.yaml"
+                if cfg_path.exists():
+                    with open(cfg_path, encoding="utf-8") as _f:
+                        cfg = _y.safe_load(_f) or {}
+                    mode = str(cfg.get("display", {}).get("busy_input_mode", "") or "").strip().lower()
+            except Exception:
+                pass
+        return "queue" if mode == "queue" else "interrupt"
+
+    @staticmethod
+    def _load_restart_drain_timeout() -> float:
+        """Load graceful gateway restart/stop drain timeout in seconds."""
+        raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
+        if not raw:
+            try:
+                import yaml as _y
+                cfg_path = _hermes_home / "config.yaml"
+                if cfg_path.exists():
+                    with open(cfg_path, encoding="utf-8") as _f:
+                        cfg = _y.safe_load(_f) or {}
+                    raw = str(cfg.get("agent", {}).get("restart_drain_timeout", "") or "").strip()
+            except Exception:
+                pass
+        value = parse_restart_drain_timeout(raw)
+        if raw and value == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT:
+            try:
+                float(raw)
+            except (TypeError, ValueError):
+                logger.warning(
+                    "Invalid restart_drain_timeout '%s', using default %.0fs",
+                    raw,
+                    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
+                )
+        return value
+
    @staticmethod
    def _load_background_notifications_mode() -> str:
        """Load background process notification mode from config or env var.
@@ -1078,6 +1183,155 @@ class GatewayRunner:
            pass
        return {}

+    def _snapshot_running_agents(self) -> Dict[str, Any]:
+        return {
+            session_key: agent
+            for session_key, agent in self._running_agents.items()
+            if agent is not _AGENT_PENDING_SENTINEL
+        }
+
+    def _queue_or_replace_pending_event(self, session_key: str, event: MessageEvent) -> None:
+        adapter = self.adapters.get(event.source.platform)
+        if not adapter:
+            return
+        merge_pending_message_event(adapter._pending_messages, session_key, event)
+
+    async def _handle_active_session_busy_message(self, event: MessageEvent, session_key: str) -> bool:
+        if not self._draining:
+            return False
+
+        adapter = self.adapters.get(event.source.platform)
+        if not adapter:
+            return True
+
+        thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
+        if self._queue_during_drain_enabled():
+            self._queue_or_replace_pending_event(session_key, event)
+            message = f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
+        else:
+            message = f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
+
+        await adapter._send_with_retry(
+            chat_id=event.source.chat_id,
+            content=message,
+            reply_to=event.message_id,
+            metadata=thread_meta,
+        )
+        return True
+
+    async def _drain_active_agents(self, timeout: float) -> tuple[Dict[str, Any], bool]:
+        snapshot = self._snapshot_running_agents()
+        last_active_count = self._running_agent_count()
+        last_status_at = 0.0
+
+        def _maybe_update_status(force: bool = False) -> None:
+            nonlocal last_active_count, last_status_at
+            now = asyncio.get_running_loop().time()
+            active_count = self._running_agent_count()
+            if force or active_count != last_active_count or (now - last_status_at) >= 1.0:
+                self._update_runtime_status("draining")
+                last_active_count = active_count
+                last_status_at = now
+
+        if not self._running_agents:
+            _maybe_update_status(force=True)
+            return snapshot, False
+
+        _maybe_update_status(force=True)
+        if timeout <= 0:
+            return snapshot, True
+
+        deadline = asyncio.get_running_loop().time() + timeout
+        while self._running_agents and asyncio.get_running_loop().time() < deadline:
+            _maybe_update_status()
+            await asyncio.sleep(0.1)
+        timed_out = bool(self._running_agents)
+        _maybe_update_status(force=True)
+        return snapshot, timed_out
+
+    def _interrupt_running_agents(self, reason: str) -> None:
+        for session_key, agent in list(self._running_agents.items()):
+            if agent is _AGENT_PENDING_SENTINEL:
+                continue
+            try:
+                agent.interrupt(reason)
+                logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
+            except Exception as e:
+                logger.debug("Failed interrupting agent during shutdown: %s", e)
+
+    def _finalize_shutdown_agents(self, active_agents: Dict[str, Any]) -> None:
+        for agent in active_agents.values():
+            try:
+                from hermes_cli.plugins import invoke_hook as _invoke_hook
+                _invoke_hook(
+                    "on_session_finalize",
+                    session_id=getattr(agent, "session_id", None),
+                    platform="gateway",
+                )
+            except Exception:
+                pass
+            try:
+                if hasattr(agent, "shutdown_memory_provider"):
+                    agent.shutdown_memory_provider()
+            except Exception:
+                pass
+            # Close tool resources (terminal sandboxes, browser daemons,
+            # background processes, httpx clients) to prevent zombie
+            # process accumulation.
+            try:
+                if hasattr(agent, 'close'):
+                    agent.close()
+            except Exception:
+                pass
+
+    async def _launch_detached_restart_command(self) -> None:
+        import shutil
+        import subprocess
+
+        hermes_cmd = _resolve_hermes_bin()
+        if not hermes_cmd:
+            logger.error("Could not locate hermes binary for detached /restart")
+            return
+
+        current_pid = os.getpid()
+        cmd = " ".join(shlex.quote(part) for part in hermes_cmd)
+        shell_cmd = (
+            f"while kill -0 {current_pid} 2>/dev/null; do sleep 0.2; done; "
+            f"{cmd} gateway restart"
+        )
+        setsid_bin = shutil.which("setsid")
+        if setsid_bin:
+            subprocess.Popen(
+                [setsid_bin, "bash", "-lc", shell_cmd],
+                stdout=subprocess.DEVNULL,
+                stderr=subprocess.DEVNULL,
+                start_new_session=True,
+            )
+        else:
+            subprocess.Popen(
+                ["bash", "-lc", shell_cmd],
+                stdout=subprocess.DEVNULL,
+                stderr=subprocess.DEVNULL,
+                start_new_session=True,
+            )
+
+    def request_restart(self, *, detached: bool = False, via_service: bool = False) -> bool:
+        if self._restart_task_started:
+            return False
+        self._restart_requested = True
+        self._restart_detached = detached
+        self._restart_via_service = via_service
+        self._restart_task_started = True
+
+        async def _run_restart() -> None:
+            await asyncio.sleep(0.05)
+            await self.stop(restart=True, detached_restart=detached, service_restart=via_service)
+
+        task = asyncio.create_task(_run_restart())
+        self._background_tasks.add(task)
+        task.add_done_callback(self._background_tasks.discard)
+        return True
+
    async def start(self) -> bool:
        """
        Start the gateway and all configured platform adapters.
@@ -1165,6 +1419,7 @@ class GatewayRunner:
            adapter.set_message_handler(self._handle_message)
            adapter.set_fatal_error_handler(self._handle_adapter_fatal_error)
            adapter.set_session_store(self.session_store)
+            adapter.set_busy_session_handler(self._handle_active_session_busy_message)
            
            # Try to connect
            logger.info("Connecting to %s...", platform.value)
@@ -1240,11 +1495,7 @@ class GatewayRunner:
        self.delivery_router.adapters = self.adapters
        
        self._running = True
-        try:
-            from gateway.status import write_runtime_status
-            write_runtime_status(gateway_state="running", exit_reason=None)
-        except Exception:
-            pass
+        self._update_runtime_status("running")
        
        # Emit gateway:startup hook
        hook_count = len(self.hooks.loaded_hooks)
@@ -1348,12 +1599,28 @@ class GatewayRunner:
                for key, entry in _expired_entries:
                    try:
                        await self._async_flush_memories(entry.session_id)
-                        # Shut down memory provider on the cached agent
-                        cached_agent = self._running_agents.get(key)
-                        if cached_agent and cached_agent is not _AGENT_PENDING_SENTINEL:
+                        # Shut down memory provider and close tool resources
+                        # on the cached agent.  Idle agents live in
+                        # _agent_cache (not _running_agents), so look there.
+                        _cached_agent = None
+                        _cache_lock = getattr(self, "_agent_cache_lock", None)
+                        if _cache_lock is not None:
+                            with _cache_lock:
+                                _cached = self._agent_cache.get(key)
+                                _cached_agent = _cached[0] if isinstance(_cached, tuple) else _cached if _cached else None
+                        # Fall back to _running_agents in case the agent is
+                        # still mid-turn when the expiry fires.
+                        if _cached_agent is None:
+                            _cached_agent = self._running_agents.get(key)
+                        if _cached_agent and _cached_agent is not _AGENT_PENDING_SENTINEL:
                            try:
-                                if hasattr(cached_agent, 'shutdown_memory_provider'):
-                                    cached_agent.shutdown_memory_provider()
+                                if hasattr(_cached_agent, 'shutdown_memory_provider'):
+                                    _cached_agent.shutdown_memory_provider()
+                            except Exception:
+                                pass
+                            try:
+                                if hasattr(_cached_agent, 'close'):
+                                    _cached_agent.close()
                            except Exception:
                                pass
                        # Mark as flushed and persist to disk so the flag
@@ -1463,6 +1730,7 @@ class GatewayRunner:
                    adapter.set_message_handler(self._handle_message)
                    adapter.set_fatal_error_handler(self._handle_adapter_fatal_error)
                    adapter.set_session_store(self.session_store)
+                    adapter.set_busy_session_handler(self._handle_active_session_busy_message)

                    success = await adapter.connect()
                    if success:
@@ -1509,64 +1777,108 @@ class GatewayRunner:
                    return
                await asyncio.sleep(1)

-    async def stop(self) -> None:
+    async def stop(
+        self,
+        *,
+        restart: bool = False,
+        detached_restart: bool = False,
+        service_restart: bool = False,
+    ) -> None:
        """Stop the gateway and disconnect all adapters."""
-        logger.info("Stopping gateway...")
-        self._running = False
+        if restart:
+            self._restart_requested = True
+            self._restart_detached = detached_restart
+            self._restart_via_service = service_restart
+        if self._stop_task is not None:
+            await self._stop_task
+            return

-        for session_key, agent in list(self._running_agents.items()):
-            if agent is _AGENT_PENDING_SENTINEL:
-                continue
+        async def _stop_impl() -> None:
+            logger.info(
+                "Stopping gateway%s...",
+                " for restart" if self._restart_requested else "",
+            )
+            self._running = False
+            self._draining = True
+
+            timeout = self._restart_drain_timeout
+            active_agents, timed_out = await self._drain_active_agents(timeout)
+            if timed_out:
+                logger.warning(
+                    "Gateway drain timed out after %.1fs with %d active agent(s); interrupting remaining work.",
+                    timeout,
+                    self._running_agent_count(),
+                )
+                self._interrupt_running_agents(
+                    "Gateway restarting" if self._restart_requested else "Gateway shutting down"
+                )
+                interrupt_deadline = asyncio.get_running_loop().time() + 5.0
+                while self._running_agents and asyncio.get_running_loop().time() < interrupt_deadline:
+                    self._update_runtime_status("draining")
+                    await asyncio.sleep(0.1)
+
+            if self._restart_requested and self._restart_detached:
+                try:
+                    await self._launch_detached_restart_command()
+                except Exception as e:
+                    logger.error("Failed to launch detached gateway restart: %s", e)
+
+            self._finalize_shutdown_agents(active_agents)
+
+            for platform, adapter in list(self.adapters.items()):
+                try:
+                    await adapter.cancel_background_tasks()
+                except Exception as e:
+                    logger.debug("✗ %s background-task cancel error: %s", platform.value, e)
+                try:
+                    await adapter.disconnect()
+                    logger.info("✓ %s disconnected", platform.value)
+                except Exception as e:
+                    logger.error("✗ %s disconnect error: %s", platform.value, e)
+
+            for _task in list(self._background_tasks):
+                if _task is self._stop_task:
+                    continue
+                _task.cancel()
+            self._background_tasks.clear()
+
+            self.adapters.clear()
+            self._running_agents.clear()
+            self._pending_messages.clear()
+            self._pending_approvals.clear()
+            self._shutdown_event.set()
+
+            # Global cleanup: kill any remaining tool subprocesses not tied
+            # to a specific agent (catch-all for zombie prevention).
            try:
-                agent.interrupt("Gateway shutting down")
-                logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
-            except Exception as e:
-                logger.debug("Failed interrupting agent during shutdown: %s", e)
-            # Fire plugin on_session_finalize hook before memory shutdown
-            try:
-                from hermes_cli.plugins import invoke_hook as _invoke_hook
-                _invoke_hook("on_session_finalize",
-                             session_id=getattr(agent, 'session_id', None),
-                             platform="gateway")
+                from tools.process_registry import process_registry
+                process_registry.kill_all()
            except Exception:
                pass
-            # Shut down memory provider at actual session boundary
            try:
-                if hasattr(agent, 'shutdown_memory_provider'):
-                    agent.shutdown_memory_provider()
+                from tools.terminal_tool import cleanup_all_environments
+                cleanup_all_environments()
+            except Exception:
+                pass
+            try:
+                from tools.browser_tool import cleanup_all_browsers
+                cleanup_all_browsers()
            except Exception:
                pass

-        for platform, adapter in list(self.adapters.items()):
-            try:
-                await adapter.cancel_background_tasks()
-            except Exception as e:
-                logger.debug("✗ %s background-task cancel error: %s", platform.value, e)
-            try:
-                await adapter.disconnect()
-                logger.info("✓ %s disconnected", platform.value)
-            except Exception as e:
-                logger.error("✗ %s disconnect error: %s", platform.value, e)
+            from gateway.status import remove_pid_file
+            remove_pid_file()

-        # Cancel any pending background tasks
-        for _task in list(self._background_tasks):
-            _task.cancel()
-        self._background_tasks.clear()
+            if self._restart_requested and self._restart_via_service:
+                self._exit_code = GATEWAY_SERVICE_RESTART_EXIT_CODE
+                self._exit_reason = self._exit_reason or "Gateway restart requested"

-        self.adapters.clear()
-        self._running_agents.clear()
-        self._pending_messages.clear()
-        self._pending_approvals.clear()
-        self._shutdown_event.set()
-        
-        from gateway.status import remove_pid_file, write_runtime_status
-        remove_pid_file()
-        try:
-            write_runtime_status(gateway_state="stopped", exit_reason=self._exit_reason)
-        except Exception:
-            pass
-        
-        logger.info("Gateway stopped")
+            self._draining = False
+            self._update_runtime_status("stopped", self._exit_reason)
+            logger.info("Gateway stopped")
+
+        self._stop_task = asyncio.create_task(_stop_impl())
+        await self._stop_task
    
    async def wait_for_shutdown(self) -> None:
        """Wait for shutdown signal."""
@@ -1682,7 +1994,7 @@ class GatewayRunner:
        elif platform == Platform.MATRIX:
            from gateway.platforms.matrix import MatrixAdapter, check_matrix_requirements
            if not check_matrix_requirements():
-                logger.warning("Matrix: matrix-nio not installed or credentials not set. Run: pip install 'matrix-nio[e2e]'")
+                logger.warning("Matrix: mautrix not installed or credentials not set. Run: pip install 'mautrix[encryption]'")
                return None
            return MatrixAdapter(config)

@@ -1972,6 +2284,9 @@ class GatewayRunner:
            _evt_cmd = event.get_command()
            _cmd_def_inner = _resolve_cmd_inner(_evt_cmd) if _evt_cmd else None

+            if _cmd_def_inner and _cmd_def_inner.name == "restart":
+                return await self._handle_restart_command(event)
+
            # /stop must hard-kill the session when an agent is running.
            # A soft interrupt (agent.interrupt()) doesn't help when the agent
            # is truly hung — the executor thread is blocked and never checks
@@ -2052,18 +2367,7 @@ class GatewayRunner:
                logger.debug("PRIORITY photo follow-up for session %s — queueing without interrupt", _quick_key[:20])
                adapter = self.adapters.get(source.platform)
                if adapter:
-                    # Reuse adapter queue semantics so photo bursts merge cleanly.
-                    if _quick_key in adapter._pending_messages:
-                        existing = adapter._pending_messages[_quick_key]
-                        if getattr(existing, "message_type", None) == MessageType.PHOTO:
-                            existing.media_urls.extend(event.media_urls)
-                            existing.media_types.extend(event.media_types)
-                            if event.text:
-                                existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
-                        else:
-                            adapter._pending_messages[_quick_key] = event
-                    else:
-                        adapter._pending_messages[_quick_key] = event
+                    merge_pending_message_event(adapter._pending_messages, _quick_key, event)
                return None

            running_agent = self._running_agents.get(_quick_key)
@@ -2081,6 +2385,14 @@ class GatewayRunner:
                if adapter:
                    adapter._pending_messages[_quick_key] = event
                return None
+            if self._draining:
+                if self._queue_during_drain_enabled():
+                    self._queue_or_replace_pending_event(_quick_key, event)
+                return (
+                    f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
+                    if self._queue_during_drain_enabled()
+                    else f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
+                )
            logger.debug("PRIORITY interrupt for session %s", _quick_key[:20])
            running_agent.interrupt(event.text)
            if _quick_key in self._pending_messages:
@@ -2122,6 +2434,9 @@ class GatewayRunner:

        if canonical == "status":
            return await self._handle_status_command(event)
+
+        if canonical == "restart":
+            return await self._handle_restart_command(event)
        
        if canonical == "stop":
            return await self._handle_stop_command(event)
@@ -2220,6 +2535,9 @@ class GatewayRunner:
        if canonical == "voice":
            return await self._handle_voice_command(event)

+        if self._draining:
+            return f"⏳ Gateway is {self._status_action_gerund()} and is not accepting new work right now."
+
        # User-defined quick commands (bypass agent loop, no LLM call)
        if command:
            if isinstance(self.config, dict):
@@ -2400,8 +2718,8 @@ class GatewayRunner:
        # Build session context
        context = build_session_context(source, self.config, session_entry)
        
-        # Set environment variables for tools
-        self._set_session_env(context)
+        # Set session context variables for tools (task-local, concurrency-safe)
+        _session_env_tokens = self._set_session_env(context)
        
        # Read privacy.redact_pii from config (re-read per message)
        _redact_pii = False
@@ -3234,8 +3552,8 @@ class GatewayRunner:
                "Try again or use /reset to start a fresh session."
            )
        finally:
-            # Clear session env
-            self._clear_session_env()
+            # Restore session context variables to their pre-handler state
+            self._clear_session_env(_session_env_tokens)
    
    def _format_session_info(self) -> str:
        """Resolve current model config and return a formatted info block.
@@ -3335,8 +3653,22 @@ class GatewayRunner:
                _flush_task.add_done_callback(self._background_tasks.discard)
        except Exception as e:
            logger.debug("Gateway memory flush on reset failed: %s", e)
+        # Close tool resources on the old agent (terminal sandboxes, browser
+        # daemons, background processes) before evicting from cache.
+        # Guard with getattr because test fixtures may skip __init__.
+        _cache_lock = getattr(self, "_agent_cache_lock", None)
+        if _cache_lock is not None:
+            with _cache_lock:
+                _cached = self._agent_cache.get(session_key)
+                _old_agent = _cached[0] if isinstance(_cached, tuple) else _cached if _cached else None
+            if _old_agent is not None:
+                try:
+                    if hasattr(_old_agent, "close"):
+                        _old_agent.close()
+                except Exception:
+                    pass
        self._evict_cached_agent(session_key)
-        
+
        try:
            from tools.env_passthrough import clear_env_passthrough
            clear_env_passthrough()
@@ -3500,7 +3832,21 @@ class GatewayRunner:
            return "⚡ Force-stopped. The session is unlocked — you can send a new message."
        else:
            return "No active task to stop."
-    
+
+    async def _handle_restart_command(self, event: MessageEvent) -> str:
+        """Handle /restart command - drain active work, then restart the gateway."""
+        if self._restart_requested or self._draining:
+            count = self._running_agent_count()
+            if count:
+                return f"⏳ Draining {count} active agent(s) before restart..."
+            return "⏳ Gateway restart already in progress..."
+
+        active_agents = self._running_agent_count()
+        self.request_restart(detached=True, via_service=False)
+        if active_agents:
+            return f"⏳ Draining {active_agents} active agent(s) before restart..."
+        return "♻ Restarting gateway..."
+
    async def _handle_help_command(self, event: MessageEvent) -> str:
        """Handle /help command - list available commands."""
        from hermes_cli.commands import gateway_help_lines
@@ -3623,7 +3969,7 @@ class GatewayRunner:
        # Check for session override
        source = event.source
        session_key = self._session_key_for_source(source)
-        override = getattr(self, "_session_model_overrides", {}).get(session_key, {})
+        override = self._session_model_overrides.get(session_key, {})
        if override:
            current_model = override.get("model", current_model)
            current_provider = override.get("provider", current_provider)
@@ -3705,8 +4051,6 @@ class GatewayRunner:
                            f"via {result.provider_label or result.target_provider}. "
                            f"Adjust your self-identification accordingly.]"
                        )
-                        if not hasattr(_self, "_session_model_overrides"):
-                            _self._session_model_overrides = {}
                        _self._session_model_overrides[_session_key] = {
                            "model": result.new_model,
                            "provider": result.target_provider,
@@ -3820,8 +4164,6 @@ class GatewayRunner:
        )

        # Store session override so next agent creation uses the new model
-        if not hasattr(self, "_session_model_overrides"):
-            self._session_model_overrides = {}
        self._session_model_overrides[session_key] = {
            "model": result.new_model,
            "provider": result.target_provider,
@@ -5140,6 +5482,7 @@ class GatewayRunner:

        try:
            from run_agent import AIAgent
+            from agent.manual_compression_feedback import summarize_manual_compression
            from agent.model_metadata import estimate_messages_tokens_rough

            runtime_kwargs = _resolve_runtime_agent_kwargs()
@@ -5167,6 +5510,13 @@ class GatewayRunner:
            )
            tmp_agent._print_fn = lambda *a, **kw: None

+            compressor = tmp_agent.context_compressor
+            compress_start = compressor.protect_first_n
+            compress_start = compressor._align_boundary_forward(msgs, compress_start)
+            compress_end = compressor._find_tail_cut_by_tokens(msgs, compress_start)
+            if compress_start >= compress_end:
+                return "Nothing to compress yet (the transcript is still all protected context)."
+
            loop = asyncio.get_event_loop()
            compressed, _ = await loop.run_in_executor(
                None,
@@ -5187,13 +5537,17 @@ class GatewayRunner:
            self.session_store.update_session(
                session_entry.session_key, last_prompt_tokens=0
            )
-            new_count = len(compressed)
            new_tokens = estimate_messages_tokens_rough(compressed)
-
-            return (
-                f"🗜️ Compressed: {original_count} → {new_count} messages\n"
-                f"~{approx_tokens:,} → ~{new_tokens:,} tokens"
+            summary = summarize_manual_compression(
+                msgs,
+                compressed,
+                approx_tokens,
+                new_tokens,
            )
+            lines = [f"🗜️ {summary['headline']}", summary["token_line"]]
+            if summary["note"]:
+                lines.append(summary["note"])
+            return "\n".join(lines)
        except Exception as e:
            logger.warning("Manual compress failed: %s", e)
            return f"Compression failed: {e}"
@@ -6120,20 +6474,27 @@ class GatewayRunner:

        return True

-    def _set_session_env(self, context: SessionContext) -> None:
-        """Set environment variables for the current session."""
-        os.environ["HERMES_SESSION_PLATFORM"] = context.source.platform.value
-        os.environ["HERMES_SESSION_CHAT_ID"] = context.source.chat_id
-        if context.source.chat_name:
-            os.environ["HERMES_SESSION_CHAT_NAME"] = context.source.chat_name
-        if context.source.thread_id:
-            os.environ["HERMES_SESSION_THREAD_ID"] = str(context.source.thread_id)
-    
-    def _clear_session_env(self) -> None:
-        """Clear session environment variables."""
-        for var in ["HERMES_SESSION_PLATFORM", "HERMES_SESSION_CHAT_ID", "HERMES_SESSION_CHAT_NAME", "HERMES_SESSION_THREAD_ID"]:
-            if var in os.environ:
-                del os.environ[var]
+    def _set_session_env(self, context: SessionContext) -> list:
+        """Set session context variables for the current async task.
+
+        Uses ``contextvars`` instead of ``os.environ`` so that concurrent
+        gateway messages cannot overwrite each other's session state.
+
+        Returns a list of reset tokens; pass them to ``_clear_session_env``
+        in a ``finally`` block.
+        """
+        from gateway.session_context import set_session_vars
+        return set_session_vars(
+            platform=context.source.platform.value,
+            chat_id=context.source.chat_id,
+            chat_name=context.source.chat_name or "",
+            thread_id=str(context.source.thread_id) if context.source.thread_id else "",
+        )
+
+    def _clear_session_env(self, tokens: list) -> None:
+        """Restore session context variables to their pre-handler values."""
+        from gateway.session_context import clear_session_vars
+        clear_session_vars(tokens)
    
    async def _enrich_message_with_vision(
        self,
@@ -7300,6 +7661,8 @@ class GatewayRunner:
                await asyncio.sleep(0.05)
            if session_key:
                self._running_agents[session_key] = agent_holder[0]
+                if self._draining:
+                    self._update_runtime_status("draining")
        
        tracking_task = asyncio.create_task(track_agent())
        
@@ -7499,12 +7862,19 @@ class GatewayRunner:
            # Track fallback model state: if the agent switched to a
            # fallback model during this run, persist it so /model shows
            # the actually-active model instead of the config default.
+            # Skip eviction when the run failed — evicting a failed agent
+            # forces MCP reinit on the next message for no benefit (the
+            # same error will recur).  This was the root cause of #7130:
+            # a bad model ID triggered fallback → eviction → recreation →
+            # MCP reinit → same 400 → loop, burning 91% CPU for hours.
            _agent = agent_holder[0]
-            if _agent is not None and hasattr(_agent, 'model'):
+            _result_for_fb = result_holder[0]
+            _run_failed = _result_for_fb.get("failed") if _result_for_fb else False
+            if _agent is not None and hasattr(_agent, 'model') and not _run_failed:
                _cfg_model = _resolve_gateway_model()
                if _agent.model != _cfg_model and not self._is_intentional_model_switch(session_key, _agent.model):
-                    # Fallback activated — evict cached agent so the next
-                    # message starts fresh and retries the primary model.
+                    # Fallback activated on a successful run — evict cached
+                    # agent so the next message retries the primary model.
                    self._evict_cached_agent(session_key)

            # Check if we were interrupted OR have a queued message (/queue).
@@ -7545,6 +7915,14 @@ class GatewayRunner:
                    except Exception:
                        pass

+            if self._draining and pending:
+                logger.info(
+                    "Discarding pending follow-up for session %s during gateway %s",
+                    session_key[:20] if session_key else "?",
+                    self._status_action_label(),
+                )
+                pending = None
+
            if pending:
                logger.debug("Processing pending message: '%s...'", pending[:40])
                
@@ -7621,6 +7999,8 @@ class GatewayRunner:
                del self._running_agents[session_key]
            if session_key:
                self._running_agents_ts.pop(session_key, None)
+            if self._draining:
+                self._update_runtime_status("draining")
            
            # Wait for cancelled tasks
            for task in [progress_task, interrupt_monitor, tracking_task, _notify_task]:
@@ -7818,13 +8198,21 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
    runner = GatewayRunner(config)
    
    # Set up signal handlers
-    def signal_handler():
+    def shutdown_signal_handler():
        asyncio.create_task(runner.stop())
+
+    def restart_signal_handler():
+        runner.request_restart(detached=False, via_service=True)
    
    loop = asyncio.get_event_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
-            loop.add_signal_handler(sig, signal_handler)
+            loop.add_signal_handler(sig, shutdown_signal_handler)
+        except NotImplementedError:
+            pass
+    if hasattr(signal, "SIGUSR1"):
+        try:
+            loop.add_signal_handler(signal.SIGUSR1, restart_signal_handler)
        except NotImplementedError:
            pass
    
@@ -7874,6 +8262,9 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
    except Exception:
        pass

+    if runner.exit_code is not None:
+        raise SystemExit(runner.exit_code)
+
    return True


@@ -0,0 +1,113 @@
+"""
+Session-scoped context variables for the Hermes gateway.
+
+Replaces the previous ``os.environ``-based session state
+(``HERMES_SESSION_PLATFORM``, ``HERMES_SESSION_CHAT_ID``, etc.) with
+Python's ``contextvars.ContextVar``.
+
+**Why this matters**
+
+The gateway processes messages concurrently via ``asyncio``.  When two
+messages arrive at the same time the old code did:
+
+    os.environ["HERMES_SESSION_THREAD_ID"] = str(context.source.thread_id)
+
+Because ``os.environ`` is *process-global*, Message A's value was
+silently overwritten by Message B before Message A's agent finished
+running.  Background-task notifications and tool calls therefore routed
+to the wrong thread.
+
+``contextvars.ContextVar`` values are *task-local*: each ``asyncio``
+task (and any ``run_in_executor`` thread it spawns) gets its own copy,
+so concurrent messages never interfere.
+
+**Backward compatibility**
+
+The public helper ``get_session_env(name, default="")`` mirrors the old
+``os.getenv("HERMES_SESSION_*", ...)`` calls.  Existing tool code only
+needs to replace the import + call site:
+
+    # before
+    import os
+    platform = os.getenv("HERMES_SESSION_PLATFORM", "")
+
+    # after
+    from gateway.session_context import get_session_env
+    platform = get_session_env("HERMES_SESSION_PLATFORM", "")
+"""
+
+from contextvars import ContextVar
+
+# ---------------------------------------------------------------------------
+# Per-task session variables
+# ---------------------------------------------------------------------------
+
+_SESSION_PLATFORM: ContextVar[str] = ContextVar("HERMES_SESSION_PLATFORM", default="")
+_SESSION_CHAT_ID: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_ID", default="")
+_SESSION_CHAT_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_NAME", default="")
+_SESSION_THREAD_ID: ContextVar[str] = ContextVar("HERMES_SESSION_THREAD_ID", default="")
+
+_VAR_MAP = {
+    "HERMES_SESSION_PLATFORM": _SESSION_PLATFORM,
+    "HERMES_SESSION_CHAT_ID": _SESSION_CHAT_ID,
+    "HERMES_SESSION_CHAT_NAME": _SESSION_CHAT_NAME,
+    "HERMES_SESSION_THREAD_ID": _SESSION_THREAD_ID,
+}
+
+
+def set_session_vars(
+    platform: str = "",
+    chat_id: str = "",
+    chat_name: str = "",
+    thread_id: str = "",
+) -> list:
+    """Set all session context variables and return reset tokens.
+
+    Call ``clear_session_vars(tokens)`` in a ``finally`` block to restore
+    the previous values when the handler exits.
+
+    Returns a list of ``Token`` objects (one per variable) that can be
+    passed to ``clear_session_vars``.
+    """
+    tokens = [
+        _SESSION_PLATFORM.set(platform),
+        _SESSION_CHAT_ID.set(chat_id),
+        _SESSION_CHAT_NAME.set(chat_name),
+        _SESSION_THREAD_ID.set(thread_id),
+    ]
+    return tokens
+
+
+def clear_session_vars(tokens: list) -> None:
+    """Restore session context variables to their pre-handler values."""
+    if not tokens:
+        return
+    vars_in_order = [
+        _SESSION_PLATFORM,
+        _SESSION_CHAT_ID,
+        _SESSION_CHAT_NAME,
+        _SESSION_THREAD_ID,
+    ]
+    for var, token in zip(vars_in_order, tokens):
+        var.reset(token)
+
+
+def get_session_env(name: str, default: str = "") -> str:
+    """Read a session context variable by its legacy ``HERMES_SESSION_*`` name.
+
+    Drop-in replacement for ``os.getenv("HERMES_SESSION_*", default)``.
+
+    Resolution order:
+    1. Context variable (set by the gateway for concurrency-safe access)
+    2. ``os.environ`` (used by CLI, cron scheduler, and tests)
+    3. *default*
+    """
+    import os
+
+    var = _VAR_MAP.get(name)
+    if var is not None:
+        value = var.get()
+        if value:
+            return value
+    # Fall back to os.environ for CLI, cron, and test compatibility
+    return os.getenv(name, default)
@@ -158,6 +158,8 @@ def _build_runtime_status_record() -> dict[str, Any]:
    payload.update({
        "gateway_state": "starting",
        "exit_reason": None,
+        "restart_requested": False,
+        "active_agents": 0,
        "platforms": {},
        "updated_at": _utc_now_iso(),
    })
@@ -218,6 +220,8 @@ def write_runtime_status(
    *,
    gateway_state: Optional[str] = None,
    exit_reason: Optional[str] = None,
+    restart_requested: Optional[bool] = None,
+    active_agents: Optional[int] = None,
    platform: Optional[str] = None,
    platform_state: Optional[str] = None,
    error_code: Optional[str] = None,
@@ -236,6 +240,10 @@ def write_runtime_status(
        payload["gateway_state"] = gateway_state
    if exit_reason is not None:
        payload["exit_reason"] = exit_reason
+    if restart_requested is not None:
+        payload["restart_requested"] = bool(restart_requested)
+    if active_agents is not None:
+        payload["active_agents"] = max(0, int(active_agents))

    if platform is not None:
        platform_payload = payload["platforms"].get(platform, {})
@@ -19,10 +19,9 @@ import subprocess
 import sys
 from pathlib import Path

-logger = logging.getLogger(__name__)
+from hermes_constants import is_wsl as _is_wsl

-# Cache WSL detection (checked once per process)
-_wsl_detected: bool | None = None
+logger = logging.getLogger(__name__)


 def save_clipboard_image(dest: Path) -> bool:
@@ -217,19 +216,6 @@ def _windows_save(dest: Path) -> bool:

 # ── Linux ────────────────────────────────────────────────────────────────

-def _is_wsl() -> bool:
-    """Detect if running inside WSL (1 or 2)."""
-    global _wsl_detected
-    if _wsl_detected is not None:
-        return _wsl_detected
-    try:
-        with open("/proc/version", "r") as f:
-            _wsl_detected = "microsoft" in f.read().lower()
-    except Exception:
-        _wsl_detected = False
-    return _wsl_detected
-
-
 def _linux_save(dest: Path) -> bool:
    """Try clipboard backends in priority order: WSL → Wayland → X11."""
    if _is_wsl():
@@ -140,6 +140,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("commands", "Browse all commands and skills (paginated)", "Info",
               gateway_only=True, args_hint="[page]"),
    CommandDef("help", "Show available commands", "Info"),
+    CommandDef("restart", "Gracefully restart the gateway after draining active runs", "Session",
+               gateway_only=True),
    CommandDef("usage", "Show token usage and rate limits for the current session", "Info"),
    CommandDef("insights", "Show usage insights and analytics", "Info",
               args_hint="[days]"),
@@ -269,6 +269,11 @@ DEFAULT_CONFIG = {
        # tools or receiving API responses.  Only fires when the agent has
        # been completely idle for this duration.  0 = unlimited.
        "gateway_timeout": 1800,
+        # Graceful drain timeout for gateway stop/restart (seconds).
+        # The gateway stops accepting new work, waits for running agents
+        # to finish, then interrupts any remaining runs after the timeout.
+        # 0 = no drain, interrupt immediately.
+        "restart_drain_timeout": 60,
        "service_tier": "",
        # Tool-use enforcement: injects system prompt guidance that tells the
        # model to actually call tools instead of describing intended actions.
@@ -504,6 +509,16 @@ DEFAULT_CONFIG = {
        "max_ms": 2500,
    },
    
+    # Context engine -- controls how the context window is managed when
+    # approaching the model's token limit.
+    # "compressor" = built-in lossy summarization (default).
+    # Set to a plugin name to activate an alternative engine (e.g. "lcm"
+    # for Lossless Context Management).  The engine must be installed as
+    # a plugin in plugins/context_engine/<name>/ or ~/.hermes/plugins/.
+    "context": {
+        "engine": "compressor",
+    },
+
    # Persistent memory -- bounded curated memory injected into system prompt
    "memory": {
        "memory_enabled": True,
@@ -528,6 +543,8 @@ DEFAULT_CONFIG = {
        "api_key": "",     # API key for delegation.base_url (falls back to OPENAI_API_KEY)
        "max_iterations": 50,  # per-subagent iteration cap (each subagent gets its own budget,
                               # independent of the parent's max_iterations)
+        "reasoning_effort": "",  # reasoning effort for subagents: "xhigh", "high", "medium",
+                                 # "low", "minimal", "none" (empty = inherit parent's level)
    },

    # Ephemeral prefill messages file — JSON list of {role, content} dicts
@@ -1209,8 +1226,8 @@ OPTIONAL_ENV_VARS = {
        "advanced": True,
    },
    "API_SERVER_KEY": {
-        "description": "Bearer token for API server authentication. If empty, all requests are allowed (local use only).",
-        "prompt": "API server auth key (optional)",
+        "description": "Bearer token for API server authentication. Required for non-loopback binding; server refuses to start without it. On loopback (127.0.0.1), all requests are allowed if empty.",
+        "prompt": "API server auth key (required for network access)",
        "url": None,
        "password": True,
        "category": "messaging",
@@ -1225,7 +1242,7 @@ OPTIONAL_ENV_VARS = {
        "advanced": True,
    },
    "API_SERVER_HOST": {
-        "description": "Host/bind address for the API server (default: 127.0.0.1). Use 0.0.0.0 for network access — requires API_SERVER_KEY for security.",
+        "description": "Host/bind address for the API server (default: 127.0.0.1). Use 0.0.0.0 for network access — server refuses to start without API_SERVER_KEY.",
        "prompt": "API server host",
        "url": None,
        "password": False,
@@ -1450,7 +1467,7 @@ _KNOWN_ROOT_KEYS = {
    "_config_version", "model", "providers", "fallback_model",
    "fallback_providers", "credential_pool_strategies", "toolsets",
    "agent", "terminal", "display", "compression", "delegation",
-    "auxiliary", "custom_providers", "memory", "gateway",
+    "auxiliary", "custom_providers", "context", "memory", "gateway",
 }

 # Valid fields inside a custom_providers list entry
@@ -2773,6 +2790,10 @@ def set_config_value(key: str, value: str):
        "terminal.timeout": "TERMINAL_TIMEOUT",
        "terminal.sandbox_dir": "TERMINAL_SANDBOX_DIR",
        "terminal.persistent_shell": "TERMINAL_PERSISTENT_SHELL",
+        "terminal.container_cpu": "TERMINAL_CONTAINER_CPU",
+        "terminal.container_memory": "TERMINAL_CONTAINER_MEMORY",
+        "terminal.container_disk": "TERMINAL_CONTAINER_DISK",
+        "terminal.container_persistent": "TERMINAL_CONTAINER_PERSISTENT",
    }
    if key in _config_to_env_sync:
        save_env_value(_config_to_env_sync[key], str(value))
@@ -160,6 +160,133 @@ def curses_checklist(
        return _numbered_fallback(title, items, selected, cancel_returns, status_fn)


+def curses_radiolist(
+    title: str,
+    items: List[str],
+    selected: int = 0,
+    *,
+    cancel_returns: int | None = None,
+) -> int:
+    """Curses single-select radio list. Returns the selected index.
+
+    Args:
+        title: Header line displayed above the list.
+        items: Display labels for each row.
+        selected: Index that starts selected (pre-selected).
+        cancel_returns: Returned on ESC/q. Defaults to the original *selected*.
+    """
+    if cancel_returns is None:
+        cancel_returns = selected
+
+    if not sys.stdin.isatty():
+        return cancel_returns
+
+    try:
+        import curses
+        result_holder: list = [None]
+
+        def _draw(stdscr):
+            curses.curs_set(0)
+            if curses.has_colors():
+                curses.start_color()
+                curses.use_default_colors()
+                curses.init_pair(1, curses.COLOR_GREEN, -1)
+                curses.init_pair(2, curses.COLOR_YELLOW, -1)
+            cursor = selected
+            scroll_offset = 0
+
+            while True:
+                stdscr.clear()
+                max_y, max_x = stdscr.getmaxyx()
+
+                # Header
+                try:
+                    hattr = curses.A_BOLD
+                    if curses.has_colors():
+                        hattr |= curses.color_pair(2)
+                    stdscr.addnstr(0, 0, title, max_x - 1, hattr)
+                    stdscr.addnstr(
+                        1, 0,
+                        "  \u2191\u2193 navigate  ENTER/SPACE select  ESC cancel",
+                        max_x - 1, curses.A_DIM,
+                    )
+                except curses.error:
+                    pass
+
+                # Scrollable item list
+                visible_rows = max_y - 4
+                if cursor < scroll_offset:
+                    scroll_offset = cursor
+                elif cursor >= scroll_offset + visible_rows:
+                    scroll_offset = cursor - visible_rows + 1
+
+                for draw_i, i in enumerate(
+                    range(scroll_offset, min(len(items), scroll_offset + visible_rows))
+                ):
+                    y = draw_i + 3
+                    if y >= max_y - 1:
+                        break
+                    radio = "\u25cf" if i == selected else "\u25cb"
+                    arrow = "\u2192" if i == cursor else " "
+                    line = f" {arrow} ({radio}) {items[i]}"
+                    attr = curses.A_NORMAL
+                    if i == cursor:
+                        attr = curses.A_BOLD
+                        if curses.has_colors():
+                            attr |= curses.color_pair(1)
+                    try:
+                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
+                    except curses.error:
+                        pass
+
+                stdscr.refresh()
+                key = stdscr.getch()
+
+                if key in (curses.KEY_UP, ord("k")):
+                    cursor = (cursor - 1) % len(items)
+                elif key in (curses.KEY_DOWN, ord("j")):
+                    cursor = (cursor + 1) % len(items)
+                elif key in (ord(" "), curses.KEY_ENTER, 10, 13):
+                    result_holder[0] = cursor
+                    return
+                elif key in (27, ord("q")):
+                    result_holder[0] = cancel_returns
+                    return
+
+        curses.wrapper(_draw)
+        flush_stdin()
+        return result_holder[0] if result_holder[0] is not None else cancel_returns
+
+    except Exception:
+        return _radio_numbered_fallback(title, items, selected, cancel_returns)
+
+
+def _radio_numbered_fallback(
+    title: str,
+    items: List[str],
+    selected: int,
+    cancel_returns: int,
+) -> int:
+    """Text-based numbered fallback for radio selection."""
+    print(color(f"\n  {title}", Colors.YELLOW))
+    print(color("  Select by number, Enter to confirm.\n", Colors.DIM))
+
+    for i, label in enumerate(items):
+        marker = color("(\u25cf)", Colors.GREEN) if i == selected else "(\u25cb)"
+        print(f"  {marker} {i + 1:>2}. {label}")
+    print()
+    try:
+        val = input(color(f"  Choice [default {selected + 1}]: ", Colors.DIM)).strip()
+        if not val:
+            return selected
+        idx = int(val) - 1
+        if 0 <= idx < len(items):
+            return idx
+        return selected
+    except (ValueError, KeyboardInterrupt, EOFError):
+        return cancel_returns
+
+
 def _numbered_fallback(
    title: str,
    items: List[str],
@@ -15,7 +15,19 @@ from pathlib import Path
 PROJECT_ROOT = Path(__file__).parent.parent.resolve()

 from gateway.status import terminate_pid
-from hermes_cli.config import get_env_value, get_hermes_home, save_env_value, is_managed, managed_error
+from gateway.restart import (
+    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
+    GATEWAY_SERVICE_RESTART_EXIT_CODE,
+    parse_restart_drain_timeout,
+)
+from hermes_cli.config import (
+    get_env_value,
+    get_hermes_home,
+    is_managed,
+    managed_error,
+    read_raw_config,
+    save_env_value,
+)
 # display_hermes_home is imported lazily at call sites to avoid ImportError
 # when hermes_constants is cached from a pre-update version during `hermes update`.
 from hermes_cli.setup import (
@@ -92,6 +104,59 @@ def _get_service_pids() -> set:
    return pids


+def _get_parent_pid(pid: int) -> int | None:
+    """Return the parent PID for ``pid``, or ``None`` when unavailable."""
+    if pid <= 1:
+        return None
+    try:
+        result = subprocess.run(
+            ["ps", "-o", "ppid=", "-p", str(pid)],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+    except (FileNotFoundError, subprocess.TimeoutExpired):
+        return None
+    if result.returncode != 0:
+        return None
+    raw = result.stdout.strip()
+    if not raw:
+        return None
+    try:
+        parent_pid = int(raw.splitlines()[-1].strip())
+    except ValueError:
+        return None
+    return parent_pid if parent_pid > 0 else None
+
+
+def _is_pid_ancestor_of_current_process(target_pid: int) -> bool:
+    """Return True when ``target_pid`` is this process or one of its ancestors."""
+    if target_pid <= 0:
+        return False
+
+    pid = os.getpid()
+    seen: set[int] = set()
+    while pid and pid not in seen:
+        if pid == target_pid:
+            return True
+        seen.add(pid)
+        pid = _get_parent_pid(pid) or 0
+    return False
+
+
+def _request_gateway_self_restart(pid: int) -> bool:
+    """Ask a running gateway ancestor to restart itself asynchronously."""
+    if not hasattr(signal, "SIGUSR1"):
+        return False
+    if not _is_pid_ancestor_of_current_process(pid):
+        return False
+    try:
+        os.kill(pid, signal.SIGUSR1)
+    except (ProcessLookupError, PermissionError, OSError):
+        return False
+    return True
+
+
 def find_gateway_pids(exclude_pids: set | None = None) -> list:
    """Find PIDs of running gateway processes.

@@ -226,11 +291,33 @@ def is_linux() -> bool:
    return sys.platform.startswith('linux')


-from hermes_constants import is_termux
+from hermes_constants import is_termux, is_wsl
+
+
+def _wsl_systemd_operational() -> bool:
+    """Check if systemd is actually running as PID 1 on WSL.
+
+    WSL2 with ``systemd=true`` in wsl.conf has working systemd.
+    WSL2 without it (or WSL1) does not — systemctl commands fail.
+    """
+    try:
+        result = subprocess.run(
+            ["systemctl", "is-system-running"],
+            capture_output=True, text=True, timeout=5,
+        )
+        # "running", "degraded", "starting" all mean systemd is PID 1
+        status = result.stdout.strip().lower()
+        return status in ("running", "degraded", "starting", "initializing")
+    except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
+        return False


 def supports_systemd_services() -> bool:
-    return is_linux() and not is_termux()
+    if not is_linux() or is_termux():
+        return False
+    if is_wsl():
+        return _wsl_systemd_operational()
+    return True


 def is_macos() -> bool:
@@ -665,6 +752,7 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)
            path_entries.append(resolved_node_dir)

    common_bin_paths = ["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]
+    restart_timeout = max(60, int(_get_restart_drain_timeout() or 0))

    if system:
        username, group_name, home_dir = _system_service_identity(run_as_user)
@@ -703,9 +791,11 @@ Environment="VIRTUAL_ENV={venv_dir}"
 Environment="HERMES_HOME={hermes_home}"
 Restart=on-failure
 RestartSec=30
+RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}
 KillMode=mixed
 KillSignal=SIGTERM
-TimeoutStopSec=60
+ExecReload=/bin/kill -USR1 $MAINPID
+TimeoutStopSec={restart_timeout}
 StandardOutput=journal
 StandardError=journal

@@ -733,9 +823,11 @@ Environment="VIRTUAL_ENV={venv_dir}"
 Environment="HERMES_HOME={hermes_home}"
 Restart=on-failure
 RestartSec=30
+RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}
 KillMode=mixed
 KillSignal=SIGTERM
-TimeoutStopSec=60
+ExecReload=/bin/kill -USR1 $MAINPID
+TimeoutStopSec={restart_timeout}
 StandardOutput=journal
 StandardError=journal

@@ -838,6 +930,20 @@ def _select_systemd_scope(system: bool = False) -> bool:
    return get_systemd_unit_path(system=True).exists() and not get_systemd_unit_path(system=False).exists()


+def _get_restart_drain_timeout() -> float:
+    """Return the configured gateway restart drain timeout in seconds."""
+    raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
+    if not raw:
+        cfg = read_raw_config()
+        agent_cfg = cfg.get("agent", {}) if isinstance(cfg, dict) else {}
+        raw = str(
+            agent_cfg.get(
+                "restart_drain_timeout", DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+            )
+        )
+    return parse_restart_drain_timeout(raw)
+
+
 def systemd_install(force: bool = False, system: bool = False, run_as_user: str | None = None):
    if system:
        _require_root_for_system_service("install")
@@ -923,7 +1029,13 @@ def systemd_restart(system: bool = False):
    if system:
        _require_root_for_system_service("restart")
    refresh_systemd_unit_if_needed(system=system)
-    subprocess.run(_systemctl_cmd(system) + ["restart", get_service_name()], check=True, timeout=90)
+    from gateway.status import get_running_pid
+
+    pid = get_running_pid()
+    if pid is not None and _request_gateway_self_restart(pid):
+        print(f"✓ {_service_scope_label(system).capitalize()} service restart requested")
+        return
+    subprocess.run(_systemctl_cmd(system) + ["reload-or-restart", get_service_name()], check=True, timeout=90)
    print(f"✓ {_service_scope_label(system).capitalize()} service restarted")


@@ -1211,7 +1323,7 @@ def launchd_stop():
    _wait_for_gateway_exit(timeout=10.0, force_after=5.0)
    print("✓ Service stopped")

-def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
+def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float | None = 5.0) -> bool:
    """Wait for the gateway process (by saved PID) to exit.

    Uses the PID from the gateway.pid file — not launchd labels — so this
@@ -1226,21 +1338,21 @@ def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
    from gateway.status import get_running_pid

    deadline = time.monotonic() + timeout
-    force_deadline = time.monotonic() + force_after
+    force_deadline = (time.monotonic() + force_after) if force_after is not None else None
    force_sent = False

    while time.monotonic() < deadline:
        pid = get_running_pid()
        if pid is None:
-            return  # Process exited cleanly.
+            return True  # Process exited cleanly.

-        if not force_sent and time.monotonic() >= force_deadline:
+        if force_after is not None and not force_sent and time.monotonic() >= force_deadline:
            # Grace period expired — force-kill the specific PID.
            try:
                terminate_pid(pid, force=True)
                print(f"⚠ Gateway PID {pid} did not exit gracefully; sent SIGKILL")
            except (ProcessLookupError, PermissionError, OSError):
-                return  # Already gone or we can't touch it.
+                return True  # Already gone or we can't touch it.
            force_sent = True

        time.sleep(0.3)
@@ -1249,15 +1361,30 @@ def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
    remaining_pid = get_running_pid()
    if remaining_pid is not None:
        print(f"⚠ Gateway PID {remaining_pid} still running after {timeout}s — restart may fail")
+        return False
+    return True


 def launchd_restart():
    label = get_launchd_label()
    target = f"{_launchd_domain()}/{label}"
-    # Use kickstart -k so launchd performs an atomic kill+restart.
-    # A two-step stop/start from inside the gateway's own process tree
-    # would kill the shell before the start command is reached.
+    drain_timeout = _get_restart_drain_timeout()
+    from gateway.status import get_running_pid
+
    try:
+        pid = get_running_pid()
+        if pid is not None and _request_gateway_self_restart(pid):
+            print("✓ Service restart requested")
+            return
+        if pid is not None:
+            try:
+                terminate_pid(pid, force=False)
+            except (ProcessLookupError, PermissionError, OSError):
+                pid = None
+            if pid is not None:
+                exited = _wait_for_gateway_exit(timeout=drain_timeout, force_after=None)
+                if not exited:
+                    print(f"⚠ Gateway drain timed out after {drain_timeout:.0f}s — forcing launchd restart")
        subprocess.run(["launchctl", "kickstart", "-k", target], check=True, timeout=90)
        print("✓ Service restarted")
    except subprocess.CalledProcessError as e:
@@ -1442,7 +1569,7 @@ _PLATFORMS = [
            "   Or via API: curl -X POST https://your-server/_matrix/client/v3/login \\",
            "     -d '{\"type\":\"m.login.password\",\"user\":\"@bot:server\",\"password\":\"...\"}'",
            "4. Alternatively, provide user ID + password and Hermes will log in directly",
-            "5. For E2EE: set MATRIX_ENCRYPTION=true (requires pip install 'matrix-nio[e2e]')",
+            "5. For E2EE: set MATRIX_ENCRYPTION=true (requires pip install 'mautrix[encryption]')",
            "6. To find your user ID: it's @username:your-server (shown in Element profile)",
        ],
        "vars": [
@@ -1728,6 +1855,8 @@ def _runtime_health_lines() -> list[str]:
    lines: list[str] = []
    gateway_state = state.get("gateway_state")
    exit_reason = state.get("exit_reason")
+    active_agents = state.get("active_agents")
+    restart_requested = state.get("restart_requested")
    platforms = state.get("platforms", {}) or {}

    for platform, pdata in platforms.items():
@@ -1737,6 +1866,10 @@ def _runtime_health_lines() -> list[str]:

    if gateway_state == "startup_failed" and exit_reason:
        lines.append(f"⚠ Last startup issue: {exit_reason}")
+    elif gateway_state == "draining":
+        action = "restart" if restart_requested else "shutdown"
+        count = int(active_agents or 0)
+        lines.append(f"⏳ Gateway draining for {action} ({count} active agent(s))")
    elif gateway_state == "stopped" and exit_reason:
        lines.append(f"⚠ Last shutdown reason: {exit_reason}")

@@ -2244,7 +2377,8 @@ def gateway_setup():
            print()
            if supports_systemd_services() or is_macos():
                platform_name = "systemd" if supports_systemd_services() else "launchd"
-                if prompt_yes_no(f"  Install the gateway as a {platform_name} service? (runs in background, starts on boot)", True):
+                wsl_note = " (note: services may not survive WSL restarts)" if is_wsl() else ""
+                if prompt_yes_no(f"  Install the gateway as a {platform_name} service?{wsl_note} (runs in background, starts on boot)", True):
                    try:
                        installed_scope = None
                        did_install = False
@@ -2269,16 +2403,21 @@ def gateway_setup():
                    print_info("  You can install later: hermes gateway install")
                    if supports_systemd_services():
                        print_info("  Or as a boot-time service: sudo hermes gateway install --system")
-                    print_info("  Or run in foreground:  hermes gateway")
+                    print_info("  Or run in foreground:  hermes gateway run")
+            elif is_wsl():
+                print_info("  WSL detected but systemd is not running.")
+                print_info("  Run in foreground: hermes gateway run")
+                print_info("  For persistence:   tmux new -s hermes 'hermes gateway run'")
+                print_info("  To enable systemd: add systemd=true to /etc/wsl.conf, then 'wsl --shutdown'")
            else:
                if is_termux():
                    from hermes_constants import display_hermes_home as _dhh
                    print_info("  Termux does not use systemd/launchd services.")
-                    print_info("  Run in foreground: hermes gateway")
-                    print_info(f"  Or start it manually in the background (best effort): nohup hermes gateway >{_dhh()}/logs/gateway.log 2>&1 &")
+                    print_info("  Run in foreground: hermes gateway run")
+                    print_info(f"  Or start it manually in the background (best effort): nohup hermes gateway run >{_dhh()}/logs/gateway.log 2>&1 &")
                else:
                    print_info("  Service install not supported on this platform.")
-                    print_info("  Run in foreground: hermes gateway")
+                    print_info("  Run in foreground: hermes gateway run")
    else:
        print()
        print_info("No platforms configured. Run 'hermes gateway setup' when ready.")
@@ -2319,9 +2458,23 @@ def gateway_command(args):
            print("Run manually: hermes gateway")
            sys.exit(1)
        if supports_systemd_services():
+            if is_wsl():
+                print_warning("WSL detected — systemd services may not survive WSL restarts.")
+                print_info("  Consider running in foreground instead: hermes gateway run")
+                print_info("  Or use tmux/screen for persistence: tmux new -s hermes 'hermes gateway run'")
+                print()
            systemd_install(force=force, system=system, run_as_user=run_as_user)
        elif is_macos():
            launchd_install(force)
+        elif is_wsl():
+            print("WSL detected but systemd is not running.")
+            print("Either enable systemd (add systemd=true to /etc/wsl.conf and restart WSL)")
+            print("or run the gateway in foreground mode:")
+            print()
+            print("  hermes gateway run                              # direct foreground")
+            print("  tmux new -s hermes 'hermes gateway run'         # persistent via tmux")
+            print("  nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 &  # background")
+            sys.exit(1)
        else:
            print("Service installation not supported on this platform.")
            print("Run manually: hermes gateway run")
@@ -2354,6 +2507,16 @@ def gateway_command(args):
            systemd_start(system=system)
        elif is_macos():
            launchd_start()
+        elif is_wsl():
+            print("WSL detected but systemd is not available.")
+            print("Run the gateway in foreground mode instead:")
+            print()
+            print("  hermes gateway run                              # direct foreground")
+            print("  tmux new -s hermes 'hermes gateway run'         # persistent via tmux")
+            print("  nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 &  # background")
+            print()
+            print("To enable systemd: add systemd=true to /etc/wsl.conf and run 'wsl --shutdown' from PowerShell.")
+            sys.exit(1)
        else:
            print("Not supported on this platform.")
            sys.exit(1)
@@ -2488,6 +2651,10 @@ def gateway_command(args):
                if is_termux():
                    print("Termux note:")
                    print("  Android may stop background jobs when Termux is suspended")
+                elif is_wsl():
+                    print("WSL note:")
+                    print("  The gateway is running in foreground/manual mode (recommended for WSL).")
+                    print("  Use tmux or screen for persistence across terminal closes.")
                else:
                    print("To install as a service:")
                    print("  hermes gateway install")
@@ -2502,9 +2669,12 @@ def gateway_command(args):
                        print(f"  {line}")
                print()
                print("To start:")
-                print("  hermes gateway          # Run in foreground")
+                print("  hermes gateway run      # Run in foreground")
                if is_termux():
-                    print("  nohup hermes gateway > ~/.hermes/logs/gateway.log 2>&1 &  # Best-effort background start")
+                    print("  nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 &  # Best-effort background start")
+                elif is_wsl():
+                    print("  tmux new -s hermes 'hermes gateway run'         # persistent via tmux")
+                    print("  nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 &  # background")
                else:
                    print("  hermes gateway install  # Install as user service")
                    print("  sudo hermes gateway install --system  # Install as boot-time system service")
@@ -4447,7 +4447,7 @@ For more help on a command:
    gateway_subparsers = gateway_parser.add_subparsers(dest="gateway_command")
    
    # gateway run (default)
-    gateway_run = gateway_subparsers.add_parser("run", help="Run gateway in foreground")
+    gateway_run = gateway_subparsers.add_parser("run", help="Run gateway in foreground (recommended for WSL, Docker, Termux)")
    gateway_run.add_argument("-v", "--verbose", action="count", default=0,
                             help="Increase stderr log verbosity (-v=INFO, -vv=DEBUG)")
    gateway_run.add_argument("-q", "--quiet", action="store_true",
@@ -4456,7 +4456,7 @@ For more help on a command:
                             help="Replace any existing gateway instance (useful for systemd)")
    
    # gateway start
-    gateway_start = gateway_subparsers.add_parser("start", help="Start gateway service")
+    gateway_start = gateway_subparsers.add_parser("start", help="Start the installed systemd/launchd background service")
    gateway_start.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
    
    # gateway stop
@@ -4474,7 +4474,7 @@ For more help on a command:
    gateway_status.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
    
    # gateway install
-    gateway_install = gateway_subparsers.add_parser("install", help="Install gateway as service")
+    gateway_install = gateway_subparsers.add_parser("install", help="Install gateway as a systemd/launchd background service")
    gateway_install.add_argument("--force", action="store_true", help="Force reinstall")
    gateway_install.add_argument("--system", action="store_true", help="Install as a Linux system-level service (starts at boot)")
    gateway_install.add_argument("--run-as-user", dest="run_as_user", help="User account the Linux system service should run as")
@@ -201,8 +201,7 @@ class PluginContext:

        The *setup_fn* receives an argparse subparser and should add any
        arguments/sub-subparsers.  If *handler_fn* is provided it is set
-        as the default dispatch function via ``set_defaults(func=...)``.
-        """
+        as the default dispatch function via ``set_defaults(func=...)``."""
        self._manager._cli_commands[name] = {
            "name": name,
            "help": help,
@@ -213,6 +212,38 @@ class PluginContext:
        }
        logger.debug("Plugin %s registered CLI command: %s", self.manifest.name, name)

+    # -- context engine registration -----------------------------------------
+
+    def register_context_engine(self, engine) -> None:
+        """Register a context engine to replace the built-in ContextCompressor.
+
+        Only one context engine plugin is allowed. If a second plugin tries
+        to register one, it is rejected with a warning.
+
+        The engine must be an instance of ``agent.context_engine.ContextEngine``.
+        """
+        if self._manager._context_engine is not None:
+            logger.warning(
+                "Plugin '%s' tried to register a context engine, but one is "
+                "already registered. Only one context engine plugin is allowed.",
+                self.manifest.name,
+            )
+            return
+        # Defer the import to avoid circular deps at module level
+        from agent.context_engine import ContextEngine
+        if not isinstance(engine, ContextEngine):
+            logger.warning(
+                "Plugin '%s' tried to register a context engine that does not "
+                "inherit from ContextEngine. Ignoring.",
+                self.manifest.name,
+            )
+            return
+        self._manager._context_engine = engine
+        logger.info(
+            "Plugin '%s' registered context engine: %s",
+            self.manifest.name, engine.name,
+        )
+
    # -- hook registration --------------------------------------------------

    def register_hook(self, hook_name: str, callback: Callable) -> None:
@@ -245,6 +276,7 @@ class PluginManager:
        self._hooks: Dict[str, List[Callable]] = {}
        self._plugin_tool_names: Set[str] = set()
        self._cli_commands: Dict[str, dict] = {}
+        self._context_engine = None  # Set by a plugin via register_context_engine()
        self._discovered: bool = False
        self._cli_ref = None  # Set by CLI after plugin discovery

@@ -566,6 +598,11 @@ def get_plugin_cli_commands() -> Dict[str, dict]:
    return dict(get_plugin_manager()._cli_commands)


+def get_plugin_context_engine():
+    """Return the plugin-registered context engine, or None."""
+    return get_plugin_manager()._context_engine
+
+
 def get_plugin_toolsets() -> List[tuple]:
    """Return plugin toolsets as ``(key, label, description)`` tuples.

@@ -531,7 +531,7 @@ def cmd_disable(name: str) -> None:

    disabled.add(name)
    _save_disabled_set(disabled)
-    console.print(f"[yellow]⊘[/yellow] Plugin [bold]{name}[/bold] disabled. Takes effect on next session.")
+    console.print(f"[yellow]\u2298[/yellow] Plugin [bold]{name}[/bold] disabled. Takes effect on next session.")


 def cmd_list() -> None:
@@ -594,8 +594,152 @@ def cmd_list() -> None:
    console.print("[dim]Enable/disable:[/dim] hermes plugins enable/disable <name>")


+# ---------------------------------------------------------------------------
+# Provider plugin discovery helpers
+# ---------------------------------------------------------------------------
+
+
+def _discover_memory_providers() -> list[tuple[str, str]]:
+    """Return [(name, description), ...] for available memory providers."""
+    try:
+        from plugins.memory import discover_memory_providers
+        return [(name, desc) for name, desc, _avail in discover_memory_providers()]
+    except Exception:
+        return []
+
+
+def _discover_context_engines() -> list[tuple[str, str]]:
+    """Return [(name, description), ...] for available context engines."""
+    try:
+        from plugins.context_engine import discover_context_engines
+        return [(name, desc) for name, desc, _avail in discover_context_engines()]
+    except Exception:
+        return []
+
+
+def _get_current_memory_provider() -> str:
+    """Return the current memory.provider from config (empty = built-in)."""
+    try:
+        from hermes_cli.config import load_config
+        config = load_config()
+        return config.get("memory", {}).get("provider", "") or ""
+    except Exception:
+        return ""
+
+
+def _get_current_context_engine() -> str:
+    """Return the current context.engine from config."""
+    try:
+        from hermes_cli.config import load_config
+        config = load_config()
+        return config.get("context", {}).get("engine", "compressor") or "compressor"
+    except Exception:
+        return "compressor"
+
+
+def _save_memory_provider(name: str) -> None:
+    """Persist memory.provider to config.yaml."""
+    from hermes_cli.config import load_config, save_config
+    config = load_config()
+    if "memory" not in config:
+        config["memory"] = {}
+    config["memory"]["provider"] = name
+    save_config(config)
+
+
+def _save_context_engine(name: str) -> None:
+    """Persist context.engine to config.yaml."""
+    from hermes_cli.config import load_config, save_config
+    config = load_config()
+    if "context" not in config:
+        config["context"] = {}
+    config["context"]["engine"] = name
+    save_config(config)
+
+
+def _configure_memory_provider() -> bool:
+    """Launch a radio picker for memory providers. Returns True if changed."""
+    from hermes_cli.curses_ui import curses_radiolist
+
+    current = _get_current_memory_provider()
+    providers = _discover_memory_providers()
+
+    # Build items: "built-in" first, then discovered providers
+    items = ["built-in (default)"]
+    names = [""]  # empty string = built-in
+    selected = 0
+
+    for name, desc in providers:
+        names.append(name)
+        label = f"{name} \u2014 {desc}" if desc else name
+        items.append(label)
+        if name == current:
+            selected = len(items) - 1
+
+    # If current provider isn't in discovered list, add it
+    if current and current not in names:
+        names.append(current)
+        items.append(f"{current} (not found)")
+        selected = len(items) - 1
+
+    choice = curses_radiolist(
+        title="Memory Provider (select one)",
+        items=items,
+        selected=selected,
+    )
+
+    new_provider = names[choice]
+    if new_provider != current:
+        _save_memory_provider(new_provider)
+        return True
+    return False
+
+
+def _configure_context_engine() -> bool:
+    """Launch a radio picker for context engines. Returns True if changed."""
+    from hermes_cli.curses_ui import curses_radiolist
+
+    current = _get_current_context_engine()
+    engines = _discover_context_engines()
+
+    # Build items: "compressor" first (built-in), then discovered engines
+    items = ["compressor (default)"]
+    names = ["compressor"]
+    selected = 0
+
+    for name, desc in engines:
+        names.append(name)
+        label = f"{name} \u2014 {desc}" if desc else name
+        items.append(label)
+        if name == current:
+            selected = len(items) - 1
+
+    # If current engine isn't in discovered list and isn't compressor, add it
+    if current != "compressor" and current not in names:
+        names.append(current)
+        items.append(f"{current} (not found)")
+        selected = len(items) - 1
+
+    choice = curses_radiolist(
+        title="Context Engine (select one)",
+        items=items,
+        selected=selected,
+    )
+
+    new_engine = names[choice]
+    if new_engine != current:
+        _save_context_engine(new_engine)
+        return True
+    return False
+
+
+# ---------------------------------------------------------------------------
+# Composite plugins UI
+# ---------------------------------------------------------------------------
+
+
 def cmd_toggle() -> None:
-    """Interactive curses checklist to enable/disable installed plugins."""
+    """Interactive composite UI — general plugins + provider plugin categories."""
    from rich.console import Console

    try:
@@ -606,18 +750,13 @@ def cmd_toggle() -> None:
    console = Console()
    plugins_dir = _plugins_dir()

+    # -- General plugins discovery --
    dirs = sorted(d for d in plugins_dir.iterdir() if d.is_dir())
-    if not dirs:
-        console.print("[dim]No plugins installed.[/dim]")
-        console.print("[dim]Install with:[/dim] hermes plugins install owner/repo")
-        return
-
    disabled = _get_disabled_set()

-    # Build items list: "name — description" for display
-    names = []
-    labels = []
-    selected = set()
+    plugin_names = []
+    plugin_labels = []
+    plugin_selected = set()

    for i, d in enumerate(dirs):
        manifest_file = d / "plugin.yaml"
@@ -633,36 +772,335 @@ def cmd_toggle() -> None:
            except Exception:
                pass

-        names.append(name)
-        label = f"{name} — {description}" if description else name
-        labels.append(label)
+        plugin_names.append(name)
+        label = f"{name} \u2014 {description}" if description else name
+        plugin_labels.append(label)

        if name not in disabled and d.name not in disabled:
-            selected.add(i)
+            plugin_selected.add(i)

-    from hermes_cli.curses_ui import curses_checklist
+    # -- Provider categories --
+    current_memory = _get_current_memory_provider() or "built-in"
+    current_context = _get_current_context_engine()
+    categories = [
+        ("Memory Provider", current_memory, _configure_memory_provider),
+        ("Context Engine", current_context, _configure_context_engine),
+    ]

-    result = curses_checklist(
-        title="Plugins — toggle enabled/disabled",
-        items=labels,
-        selected=selected,
-    )
+    has_plugins = bool(plugin_names)
+    has_categories = bool(categories)

-    # Compute new disabled set from deselected items
+    if not has_plugins and not has_categories:
+        console.print("[dim]No plugins installed and no provider categories available.[/dim]")
+        console.print("[dim]Install with:[/dim] hermes plugins install owner/repo")
+        return
+
+    # Non-TTY fallback
+    if not sys.stdin.isatty():
+        console.print("[dim]Interactive mode requires a terminal.[/dim]")
+        return
+
+    # Launch the composite curses UI
+    try:
+        import curses
+        _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
+                          disabled, categories, console)
+    except ImportError:
+        _run_composite_fallback(plugin_names, plugin_labels, plugin_selected,
+                                disabled, categories, console)
+
+
+def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
+                      disabled, categories, console):
+    """Custom curses screen with checkboxes + category action rows."""
+    from hermes_cli.curses_ui import flush_stdin
+
+    chosen = set(plugin_selected)
+    n_plugins = len(plugin_names)
+    # Total rows: plugins + separator + categories
+    # separator is not navigable
+    n_categories = len(categories)
+    total_items = n_plugins + n_categories  # navigable items
+
+    result_holder = {"plugins_changed": False, "providers_changed": False}
+
+    def _draw(stdscr):
+        curses.curs_set(0)
+        if curses.has_colors():
+            curses.start_color()
+            curses.use_default_colors()
+            curses.init_pair(1, curses.COLOR_GREEN, -1)
+            curses.init_pair(2, curses.COLOR_YELLOW, -1)
+            curses.init_pair(3, curses.COLOR_CYAN, -1)
+            curses.init_pair(4, 8, -1)  # dim gray
+        cursor = 0
+        scroll_offset = 0
+
+        while True:
+            stdscr.clear()
+            max_y, max_x = stdscr.getmaxyx()
+
+            # Header
+            try:
+                hattr = curses.A_BOLD
+                if curses.has_colors():
+                    hattr |= curses.color_pair(2)
+                stdscr.addnstr(0, 0, "Plugins", max_x - 1, hattr)
+                stdscr.addnstr(
+                    1, 0,
+                    "  \u2191\u2193 navigate  SPACE toggle  ENTER configure/confirm  ESC done",
+                    max_x - 1, curses.A_DIM,
+                )
+            except curses.error:
+                pass
+
+            # Build display rows
+            # Row layout:
+            #   [plugins section header] (not navigable, skipped in scroll math)
+            #   plugin checkboxes (navigable, indices 0..n_plugins-1)
+            #   [separator] (not navigable)
+            #   [categories section header] (not navigable)
+            #   category action rows (navigable, indices n_plugins..total_items-1)
+
+            visible_rows = max_y - 4
+            if cursor < scroll_offset:
+                scroll_offset = cursor
+            elif cursor >= scroll_offset + visible_rows:
+                scroll_offset = cursor - visible_rows + 1
+
+            y = 3  # start drawing after header
+
+            # Determine which items are visible based on scroll
+            # We need to map logical cursor positions to screen rows
+            # accounting for non-navigable separator/headers
+
+            draw_row = 0  # tracks navigable item index
+
+            # --- General Plugins section ---
+            if n_plugins > 0:
+                # Section header
+                if y < max_y - 1:
+                    try:
+                        sattr = curses.A_BOLD
+                        if curses.has_colors():
+                            sattr |= curses.color_pair(2)
+                        stdscr.addnstr(y, 0, "  General Plugins", max_x - 1, sattr)
+                    except curses.error:
+                        pass
+                    y += 1
+
+                for i in range(n_plugins):
+                    if y >= max_y - 1:
+                        break
+                    check = "\u2713" if i in chosen else " "
+                    arrow = "\u2192" if i == cursor else " "
+                    line = f" {arrow} [{check}] {plugin_labels[i]}"
+                    attr = curses.A_NORMAL
+                    if i == cursor:
+                        attr = curses.A_BOLD
+                        if curses.has_colors():
+                            attr |= curses.color_pair(1)
+                    try:
+                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
+                    except curses.error:
+                        pass
+                    y += 1
+
+            # --- Separator ---
+            if y < max_y - 1:
+                y += 1  # blank line
+
+            # --- Provider Plugins section ---
+            if n_categories > 0 and y < max_y - 1:
+                try:
+                    sattr = curses.A_BOLD
+                    if curses.has_colors():
+                        sattr |= curses.color_pair(2)
+                    stdscr.addnstr(y, 0, "  Provider Plugins", max_x - 1, sattr)
+                except curses.error:
+                    pass
+                y += 1
+
+                for ci, (cat_name, cat_current, _cat_fn) in enumerate(categories):
+                    if y >= max_y - 1:
+                        break
+                    cat_idx = n_plugins + ci
+                    arrow = "\u2192" if cat_idx == cursor else " "
+                    line = f" {arrow}   {cat_name:<24} \u25b8 {cat_current}"
+                    attr = curses.A_NORMAL
+                    if cat_idx == cursor:
+                        attr = curses.A_BOLD
+                        if curses.has_colors():
+                            attr |= curses.color_pair(3)
+                    try:
+                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
+                    except curses.error:
+                        pass
+                    y += 1
+
+            stdscr.refresh()
+            key = stdscr.getch()
+
+            if key in (curses.KEY_UP, ord("k")):
+                if total_items > 0:
+                    cursor = (cursor - 1) % total_items
+            elif key in (curses.KEY_DOWN, ord("j")):
+                if total_items > 0:
+                    cursor = (cursor + 1) % total_items
+            elif key == ord(" "):
+                if cursor < n_plugins:
+                    # Toggle general plugin
+                    chosen.symmetric_difference_update({cursor})
+                else:
+                    # Provider category — launch sub-screen
+                    ci = cursor - n_plugins
+                    if 0 <= ci < n_categories:
+                        curses.endwin()
+                        _cat_name, _cat_cur, cat_fn = categories[ci]
+                        changed = cat_fn()
+                        if changed:
+                            result_holder["providers_changed"] = True
+                            # Refresh current values
+                            categories[ci] = (
+                                _cat_name,
+                                _get_current_memory_provider() or "built-in" if ci == 0
+                                else _get_current_context_engine(),
+                                cat_fn,
+                            )
+                        # Re-enter curses
+                        stdscr = curses.initscr()
+                        curses.noecho()
+                        curses.cbreak()
+                        stdscr.keypad(True)
+                        if curses.has_colors():
+                            curses.start_color()
+                            curses.use_default_colors()
+                            curses.init_pair(1, curses.COLOR_GREEN, -1)
+                            curses.init_pair(2, curses.COLOR_YELLOW, -1)
+                            curses.init_pair(3, curses.COLOR_CYAN, -1)
+                            curses.init_pair(4, 8, -1)
+                        curses.curs_set(0)
+            elif key in (curses.KEY_ENTER, 10, 13):
+                if cursor < n_plugins:
+                    # ENTER on a plugin checkbox — confirm and exit
+                    result_holder["plugins_changed"] = True
+                    return
+                else:
+                    # ENTER on a category — same as SPACE, launch sub-screen
+                    ci = cursor - n_plugins
+                    if 0 <= ci < n_categories:
+                        curses.endwin()
+                        _cat_name, _cat_cur, cat_fn = categories[ci]
+                        changed = cat_fn()
+                        if changed:
+                            result_holder["providers_changed"] = True
+                            categories[ci] = (
+                                _cat_name,
+                                _get_current_memory_provider() or "built-in" if ci == 0
+                                else _get_current_context_engine(),
+                                cat_fn,
+                            )
+                        stdscr = curses.initscr()
+                        curses.noecho()
+                        curses.cbreak()
+                        stdscr.keypad(True)
+                        if curses.has_colors():
+                            curses.start_color()
+                            curses.use_default_colors()
+                            curses.init_pair(1, curses.COLOR_GREEN, -1)
+                            curses.init_pair(2, curses.COLOR_YELLOW, -1)
+                            curses.init_pair(3, curses.COLOR_CYAN, -1)
+                            curses.init_pair(4, 8, -1)
+                        curses.curs_set(0)
+            elif key in (27, ord("q")):
+                # Save plugin changes on exit
+                result_holder["plugins_changed"] = True
+                return
+
+    curses.wrapper(_draw)
+    flush_stdin()
+
+    # Persist general plugin changes
    new_disabled = set()
-    for i, name in enumerate(names):
-        if i not in result:
+    for i, name in enumerate(plugin_names):
+        if i not in chosen:
            new_disabled.add(name)

    if new_disabled != disabled:
        _save_disabled_set(new_disabled)
-        enabled_count = len(names) - len(new_disabled)
+        enabled_count = len(plugin_names) - len(new_disabled)
        console.print(
-            f"\n[green]✓[/green] {enabled_count} enabled, {len(new_disabled)} disabled. "
-            f"Takes effect on next session."
+            f"\n[green]\u2713[/green] General plugins: {enabled_count} enabled, "
+            f"{len(new_disabled)} disabled."
        )
-    else:
-        console.print("\n[dim]No changes.[/dim]")
+    elif n_plugins > 0:
+        console.print("\n[dim]General plugins unchanged.[/dim]")
+
+    if result_holder["providers_changed"]:
+        new_memory = _get_current_memory_provider() or "built-in"
+        new_context = _get_current_context_engine()
+        console.print(
+            f"[green]\u2713[/green] Memory provider: [bold]{new_memory}[/bold]  "
+            f"Context engine: [bold]{new_context}[/bold]"
+        )
+
+    if n_plugins > 0 or result_holder["providers_changed"]:
+        console.print("[dim]Changes take effect on next session.[/dim]")
+    console.print()
+
+
+def _run_composite_fallback(plugin_names, plugin_labels, plugin_selected,
+                            disabled, categories, console):
+    """Text-based fallback for the composite plugins UI."""
+    from hermes_cli.colors import Colors, color
+
+    print(color("\n  Plugins", Colors.YELLOW))
+
+    # General plugins
+    if plugin_names:
+        chosen = set(plugin_selected)
+        print(color("\n  General Plugins", Colors.YELLOW))
+        print(color("  Toggle by number, Enter to confirm.\n", Colors.DIM))
+
+        while True:
+            for i, label in enumerate(plugin_labels):
+                marker = color("[\u2713]", Colors.GREEN) if i in chosen else "[ ]"
+                print(f"  {marker} {i + 1:>2}. {label}")
+            print()
+            try:
+                val = input(color("  Toggle # (or Enter to confirm): ", Colors.DIM)).strip()
+                if not val:
+                    break
+                idx = int(val) - 1
+                if 0 <= idx < len(plugin_names):
+                    chosen.symmetric_difference_update({idx})
+            except (ValueError, KeyboardInterrupt, EOFError):
+                return
+            print()
+
+        new_disabled = set()
+        for i, name in enumerate(plugin_names):
+            if i not in chosen:
+                new_disabled.add(name)
+        if new_disabled != disabled:
+            _save_disabled_set(new_disabled)
+
+    # Provider categories
+    if categories:
+        print(color("\n  Provider Plugins", Colors.YELLOW))
+        for ci, (cat_name, cat_current, cat_fn) in enumerate(categories):
+            print(f"  {ci + 1}. {cat_name} [{cat_current}]")
+        print()
+        try:
+            val = input(color("  Configure # (or Enter to skip): ", Colors.DIM)).strip()
+            if val:
+                ci = int(val) - 1
+                if 0 <= ci < len(categories):
+                    categories[ci][2]()  # call the configure function
+        except (ValueError, KeyboardInterrupt, EOFError):
+            pass
+
+    print()


 def plugins_command(args) -> None:
@@ -1925,9 +1925,9 @@ def _setup_matrix():
            save_env_value("MATRIX_ENCRYPTION", "true")
            print_success("E2EE enabled")

-        matrix_pkg = "matrix-nio[e2e]" if want_e2ee else "matrix-nio"
+        matrix_pkg = "mautrix[encryption]" if want_e2ee else "mautrix"
        try:
-            __import__("nio")
+            __import__("mautrix")
        except ImportError:
            print_info(f"Installing {matrix_pkg}...")
            import subprocess
@@ -168,6 +168,27 @@ def is_termux() -> bool:
    return bool(os.getenv("TERMUX_VERSION") or "com.termux/files/usr" in prefix)


+_wsl_detected: bool | None = None
+
+
+def is_wsl() -> bool:
+    """Return True when running inside WSL (Windows Subsystem for Linux).
+
+    Checks ``/proc/version`` for the ``microsoft`` marker that both WSL1
+    and WSL2 inject.  Result is cached for the process lifetime.
+    Import-safe — no heavy deps.
+    """
+    global _wsl_detected
+    if _wsl_detected is not None:
+        return _wsl_detected
+    try:
+        with open("/proc/version", "r") as f:
+            _wsl_detected = "microsoft" in f.read().lower()
+    except Exception:
+        _wsl_detected = False
+    return _wsl_detected
+
+
 OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
 OPENROUTER_MODELS_URL = f"{OPENROUTER_BASE_URL}/models"

@@ -0,0 +1,219 @@
+"""Context engine plugin discovery.
+
+Scans ``plugins/context_engine/<name>/`` directories for context engine
+plugins.  Each subdirectory must contain ``__init__.py`` with a class
+implementing the ContextEngine ABC.
+
+Context engines are separate from the general plugin system — they live
+in the repo and are always available without user installation.  Only ONE
+can be active at a time, selected via ``context.engine`` in config.yaml.
+The default engine is ``"compressor"`` (the built-in ContextCompressor).
+
+Usage:
+    from plugins.context_engine import discover_context_engines, load_context_engine
+
+    available = discover_context_engines()   # [(name, desc, available), ...]
+    engine = load_context_engine("lcm")      # ContextEngine instance
+"""
+
+from __future__ import annotations
+
+import importlib
+import importlib.util
+import logging
+import sys
+from pathlib import Path
+from typing import List, Optional, Tuple
+
+logger = logging.getLogger(__name__)
+
+_CONTEXT_ENGINE_PLUGINS_DIR = Path(__file__).parent
+
+
+def discover_context_engines() -> List[Tuple[str, str, bool]]:
+    """Scan plugins/context_engine/ for available engines.
+
+    Returns list of (name, description, is_available) tuples.
+    Does NOT import the engines — just reads plugin.yaml for metadata
+    and does a lightweight availability check.
+    """
+    results = []
+    if not _CONTEXT_ENGINE_PLUGINS_DIR.is_dir():
+        return results
+
+    for child in sorted(_CONTEXT_ENGINE_PLUGINS_DIR.iterdir()):
+        if not child.is_dir() or child.name.startswith(("_", ".")):
+            continue
+        init_file = child / "__init__.py"
+        if not init_file.exists():
+            continue
+
+        # Read description from plugin.yaml if available
+        desc = ""
+        yaml_file = child / "plugin.yaml"
+        if yaml_file.exists():
+            try:
+                import yaml
+                with open(yaml_file) as f:
+                    meta = yaml.safe_load(f) or {}
+                desc = meta.get("description", "")
+            except Exception:
+                pass
+
+        # Quick availability check — try loading and calling is_available()
+        available = True
+        try:
+            engine = _load_engine_from_dir(child)
+            if engine is None:
+                available = False
+            elif hasattr(engine, "is_available"):
+                available = engine.is_available()
+        except Exception:
+            available = False
+
+        results.append((child.name, desc, available))
+
+    return results
+
+
+def load_context_engine(name: str) -> Optional["ContextEngine"]:
+    """Load and return a ContextEngine instance by name.
+
+    Returns None if the engine is not found or fails to load.
+    """
+    engine_dir = _CONTEXT_ENGINE_PLUGINS_DIR / name
+    if not engine_dir.is_dir():
+        logger.debug("Context engine '%s' not found in %s", name, _CONTEXT_ENGINE_PLUGINS_DIR)
+        return None
+
+    try:
+        engine = _load_engine_from_dir(engine_dir)
+        if engine:
+            return engine
+        logger.warning("Context engine '%s' loaded but no engine instance found", name)
+        return None
+    except Exception as e:
+        logger.warning("Failed to load context engine '%s': %s", name, e)
+        return None
+
+
+def _load_engine_from_dir(engine_dir: Path) -> Optional["ContextEngine"]:
+    """Import an engine module and extract the ContextEngine instance.
+
+    The module must have either:
+    - A register(ctx) function (plugin-style) — we simulate a ctx
+    - A top-level class that extends ContextEngine — we instantiate it
+    """
+    name = engine_dir.name
+    module_name = f"plugins.context_engine.{name}"
+    init_file = engine_dir / "__init__.py"
+
+    if not init_file.exists():
+        return None
+
+    # Check if already loaded
+    if module_name in sys.modules:
+        mod = sys.modules[module_name]
+    else:
+        # Handle relative imports within the plugin
+        # First ensure the parent packages are registered
+        for parent in ("plugins", "plugins.context_engine"):
+            if parent not in sys.modules:
+                parent_path = Path(__file__).parent
+                if parent == "plugins":
+                    parent_path = parent_path.parent
+                parent_init = parent_path / "__init__.py"
+                if parent_init.exists():
+                    spec = importlib.util.spec_from_file_location(
+                        parent, str(parent_init),
+                        submodule_search_locations=[str(parent_path)]
+                    )
+                    if spec:
+                        parent_mod = importlib.util.module_from_spec(spec)
+                        sys.modules[parent] = parent_mod
+                        try:
+                            spec.loader.exec_module(parent_mod)
+                        except Exception:
+                            pass
+
+        # Now load the engine module
+        spec = importlib.util.spec_from_file_location(
+            module_name, str(init_file),
+            submodule_search_locations=[str(engine_dir)]
+        )
+        if not spec:
+            return None
+
+        mod = importlib.util.module_from_spec(spec)
+        sys.modules[module_name] = mod
+
+        # Register submodules so relative imports work
+        for sub_file in engine_dir.glob("*.py"):
+            if sub_file.name == "__init__.py":
+                continue
+            sub_name = sub_file.stem
+            full_sub_name = f"{module_name}.{sub_name}"
+            if full_sub_name not in sys.modules:
+                sub_spec = importlib.util.spec_from_file_location(
+                    full_sub_name, str(sub_file)
+                )
+                if sub_spec:
+                    sub_mod = importlib.util.module_from_spec(sub_spec)
+                    sys.modules[full_sub_name] = sub_mod
+                    try:
+                        sub_spec.loader.exec_module(sub_mod)
+                    except Exception as e:
+                        logger.debug("Failed to load submodule %s: %s", full_sub_name, e)
+
+        try:
+            spec.loader.exec_module(mod)
+        except Exception as e:
+            logger.debug("Failed to exec_module %s: %s", module_name, e)
+            sys.modules.pop(module_name, None)
+            return None
+
+    # Try register(ctx) pattern first (how plugins are written)
+    if hasattr(mod, "register"):
+        collector = _EngineCollector()
+        try:
+            mod.register(collector)
+            if collector.engine:
+                return collector.engine
+        except Exception as e:
+            logger.debug("register() failed for %s: %s", name, e)
+
+    # Fallback: find a ContextEngine subclass and instantiate it
+    from agent.context_engine import ContextEngine
+    for attr_name in dir(mod):
+        attr = getattr(mod, attr_name, None)
+        if (isinstance(attr, type) and issubclass(attr, ContextEngine)
+                and attr is not ContextEngine):
+            try:
+                return attr()
+            except Exception:
+                pass
+
+    return None
+
+
+class _EngineCollector:
+    """Fake plugin context that captures register_context_engine calls."""
+
+    def __init__(self):
+        self.engine = None
+
+    def register_context_engine(self, engine):
+        self.engine = engine
+
+    # No-op for other registration methods
+    def register_tool(self, *args, **kwargs):
+        pass
+
+    def register_hook(self, *args, **kwargs):
+        pass
+
+    def register_cli_command(self, *args, **kwargs):
+        pass
+
+    def register_memory_provider(self, *args, **kwargs):
+        pass
@@ -16,7 +16,7 @@ dependencies = [
  "anthropic>=0.39.0,<1",
  "python-dotenv>=1.2.1,<2",
  "fire>=0.7.1,<1",
-  "httpx>=0.28.1,<1",
+  "httpx[socks]>=0.28.1,<1",
  "rich>=14.3.3,<15",
  "tenacity>=9.1.4,<10",
  "pyyaml>=6.0.2,<7",
@@ -43,7 +43,7 @@ dev = ["debugpy>=1.8.0,<2", "pytest>=9.0.2,<10", "pytest-asyncio>=1.3.0,<2", "py
 messaging = ["python-telegram-bot[webhooks]>=22.6,<23", "discord.py[voice]>=2.7.1,<3", "aiohttp>=3.13.3,<4", "slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
 cron = ["croniter>=6.0.0,<7"]
 slack = ["slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
-matrix = ["matrix-nio[e2e]>=0.24.0,<1", "Markdown>=3.6,<4"]
+matrix = ["mautrix[encryption]>=0.20,<1", "Markdown>=3.6,<4"]
 cli = ["simple-term-menu>=1.0,<2"]
 tts-premium = ["elevenlabs>=1.0,<2"]
 voice = [
@@ -88,10 +88,10 @@ all = [
  "hermes-agent[modal]",
  "hermes-agent[daytona]",
  "hermes-agent[messaging]",
-  # matrix excluded: python-olm (required by matrix-nio[e2e]) is upstream-broken
-  # on modern macOS (archived libolm, C++ errors with Clang 21+). Including it
-  # here causes the entire [all] install to fail, dropping all other extras.
-  # Users who need Matrix can install manually: pip install 'hermes-agent[matrix]'
+  # matrix: python-olm (required by matrix-nio[e2e]) is upstream-broken on
+  # modern macOS (archived libolm, C++ errors with Clang 21+).  On Linux the
+  # [matrix] extra's own marker pulls in the [e2e] variant automatically.
+  "hermes-agent[matrix]; sys_platform == 'linux'",
  "hermes-agent[cron]",
  "hermes-agent[cli]",
  "hermes-agent[dev]",
@@ -1268,20 +1268,88 @@ class AIAgent:
                                        pass
                        break
        
-        self.context_compressor = ContextCompressor(
-            model=self.model,
-            threshold_percent=compression_threshold,
-            protect_first_n=3,
-            protect_last_n=compression_protect_last,
-            summary_target_ratio=compression_target_ratio,
-            summary_model_override=compression_summary_model,
-            quiet_mode=self.quiet_mode,
-            base_url=self.base_url,
-            api_key=getattr(self, "api_key", ""),
-            config_context_length=_config_context_length,
-            provider=self.provider,
-        )
+        # Select context engine: config-driven (like memory providers).
+        # 1. Check config.yaml context.engine setting
+        # 2. Check plugins/context_engine/<name>/ directory (repo-shipped)
+        # 3. Check general plugin system (user-installed plugins)
+        # 4. Fall back to built-in ContextCompressor
+        _selected_engine = None
+        _engine_name = "compressor"  # default
+        try:
+            _ctx_cfg = _agent_cfg.get("context", {}) if isinstance(_agent_cfg, dict) else {}
+            _engine_name = _ctx_cfg.get("engine", "compressor") or "compressor"
+        except Exception:
+            pass
+
+        if _engine_name != "compressor":
+            # Try loading from plugins/context_engine/<name>/
+            try:
+                from plugins.context_engine import load_context_engine
+                _selected_engine = load_context_engine(_engine_name)
+            except Exception as _ce_load_err:
+                logger.debug("Context engine load from plugins/context_engine/: %s", _ce_load_err)
+
+            # Try general plugin system as fallback
+            if _selected_engine is None:
+                try:
+                    from hermes_cli.plugins import get_plugin_context_engine
+                    _candidate = get_plugin_context_engine()
+                    if _candidate and _candidate.name == _engine_name:
+                        _selected_engine = _candidate
+                except Exception:
+                    pass
+
+            if _selected_engine is None:
+                logger.warning(
+                    "Context engine '%s' not found — falling back to built-in compressor",
+                    _engine_name,
+                )
+        # else: config says "compressor" — use built-in, don't auto-activate plugins
+
+        if _selected_engine is not None:
+            self.context_compressor = _selected_engine
+            if not self.quiet_mode:
+                logger.info("Using context engine: %s", _selected_engine.name)
+        else:
+            self.context_compressor = ContextCompressor(
+                model=self.model,
+                threshold_percent=compression_threshold,
+                protect_first_n=3,
+                protect_last_n=compression_protect_last,
+                summary_target_ratio=compression_target_ratio,
+                summary_model_override=compression_summary_model,
+                quiet_mode=self.quiet_mode,
+                base_url=self.base_url,
+                api_key=getattr(self, "api_key", ""),
+                config_context_length=_config_context_length,
+                provider=self.provider,
+            )
        self.compression_enabled = compression_enabled
+
+        # Inject context engine tool schemas (e.g. lcm_grep, lcm_describe, lcm_expand)
+        self._context_engine_tool_names: set = set()
+        if hasattr(self, "context_compressor") and self.context_compressor and self.tools is not None:
+            for _schema in self.context_compressor.get_tool_schemas():
+                _wrapped = {"type": "function", "function": _schema}
+                self.tools.append(_wrapped)
+                _tname = _schema.get("name", "")
+                if _tname:
+                    self.valid_tool_names.add(_tname)
+                    self._context_engine_tool_names.add(_tname)
+
+        # Notify context engine of session start
+        if hasattr(self, "context_compressor") and self.context_compressor:
+            try:
+                self.context_compressor.on_session_start(
+                    self.session_id,
+                    hermes_home=str(get_hermes_home()),
+                    platform=self.platform or "cli",
+                    model=self.model,
+                    context_length=getattr(self.context_compressor, "context_length", 0),
+                )
+            except Exception as _ce_err:
+                logger.debug("Context engine on_session_start: %s", _ce_err)
+
        self._subdirectory_hints = SubdirectoryHintTracker(
            working_dir=os.getenv("TERMINAL_CWD") or None,
        )
@@ -1347,11 +1415,13 @@ class AIAgent:
            "api_key": getattr(self, "api_key", ""),
            "client_kwargs": dict(self._client_kwargs),
            "use_prompt_caching": self._use_prompt_caching,
-            # Compressor state that _try_activate_fallback() overwrites
-            "compressor_model": _cc.model,
-            "compressor_base_url": _cc.base_url,
+            # Context engine state that _try_activate_fallback() overwrites.
+            # Use getattr for model/base_url/api_key/provider since plugin
+            # engines may not have these (they're ContextCompressor-specific).
+            "compressor_model": getattr(_cc, "model", self.model),
+            "compressor_base_url": getattr(_cc, "base_url", self.base_url),
            "compressor_api_key": getattr(_cc, "api_key", ""),
-            "compressor_provider": _cc.provider,
+            "compressor_provider": getattr(_cc, "provider", self.provider),
            "compressor_context_length": _cc.context_length,
            "compressor_threshold_tokens": _cc.threshold_tokens,
        }
@@ -1397,15 +1467,9 @@ class AIAgent:
        # Turn counter (added after reset_session_state was first written — #2635)
        self._user_turn_count = 0

-        # Context compressor internal counters (if present)
+        # Context engine reset (works for both built-in compressor and plugins)
        if hasattr(self, "context_compressor") and self.context_compressor:
-            self.context_compressor.last_prompt_tokens = 0
-            self.context_compressor.last_completion_tokens = 0
-            self.context_compressor.compression_count = 0
-            self.context_compressor._context_probed = False
-            self.context_compressor._context_probe_persistable = False
-            # Iterative summary from previous session must not bleed into new one (#2635)
-            self.context_compressor._previous_summary = None
+            self.context_compressor.on_session_reset()
    
    def switch_model(self, new_model, new_provider, api_key='', base_url='', api_mode=''):
        """Switch the model/provider in-place for a live agent.
@@ -1486,13 +1550,12 @@ class AIAgent:
                provider=self.provider,
                config_context_length=getattr(self, "_config_context_length", None),
            )
-            self.context_compressor.model = self.model
-            self.context_compressor.base_url = self.base_url
-            self.context_compressor.api_key = self.api_key
-            self.context_compressor.provider = self.provider
-            self.context_compressor.context_length = new_context_length
-            self.context_compressor.threshold_tokens = int(
-                new_context_length * self.context_compressor.threshold_percent
+            self.context_compressor.update_model(
+                model=self.model,
+                context_length=new_context_length,
+                base_url=self.base_url,
+                api_key=getattr(self, "api_key", ""),
+                provider=self.provider,
            )

        # ── Invalidate cached system prompt so it rebuilds next turn ──
@@ -1508,10 +1571,10 @@ class AIAgent:
            "api_key": getattr(self, "api_key", ""),
            "client_kwargs": dict(self._client_kwargs),
            "use_prompt_caching": self._use_prompt_caching,
-            "compressor_model": _cc.model if _cc else self.model,
-            "compressor_base_url": _cc.base_url if _cc else self.base_url,
+            "compressor_model": getattr(_cc, "model", self.model) if _cc else self.model,
+            "compressor_base_url": getattr(_cc, "base_url", self.base_url) if _cc else self.base_url,
            "compressor_api_key": getattr(_cc, "api_key", "") if _cc else "",
-            "compressor_provider": _cc.provider if _cc else self.provider,
+            "compressor_provider": getattr(_cc, "provider", self.provider) if _cc else self.provider,
            "compressor_context_length": _cc.context_length if _cc else 0,
            "compressor_threshold_tokens": _cc.threshold_tokens if _cc else 0,
        }
@@ -1977,19 +2040,14 @@ class AIAgent:
            except Exception as e:
                logger.debug("Background memory/skill review failed: %s", e)
            finally:
-                # Explicitly close the OpenAI/httpx client so GC doesn't
-                # try to clean it up on a dead asyncio event loop (which
-                # produces "Event loop is closed" errors in the terminal).
+                # Close all resources (httpx client, subprocesses, etc.) so
+                # GC doesn't try to clean them up on a dead asyncio event
+                # loop (which produces "Event loop is closed" errors).
                if review_agent is not None:
-                    client = getattr(review_agent, "client", None)
-                    if client is not None:
-                        try:
-                            review_agent._close_openai_client(
-                                client, reason="bg_review_done", shared=True
-                            )
-                            review_agent.client = None
-                        except Exception:
-                            pass
+                    try:
+                        review_agent.close()
+                    except Exception:
+                        pass

        t = threading.Thread(target=_run_review, daemon=True, name="bg-review")
        t.start()
@@ -2713,10 +2771,11 @@ class AIAgent:
        }

    def shutdown_memory_provider(self, messages: list = None) -> None:
-        """Shut down the memory provider — call at actual session boundaries.
+        """Shut down the memory provider and context engine — call at actual session boundaries.

        This calls on_session_end() then shutdown_all() on the memory
-        manager. NOT called per-turn — only at CLI exit, /reset, gateway
+        manager, and on_session_end() on the context engine.
+        NOT called per-turn — only at CLI exit, /reset, gateway
        session expiry, etc.
        """
        if self._memory_manager:
@@ -2728,7 +2787,74 @@ class AIAgent:
                self._memory_manager.shutdown_all()
            except Exception:
                pass
+        # Notify context engine of session end (flush DAG, close DBs, etc.)
+        if hasattr(self, "context_compressor") and self.context_compressor:
+            try:
+                self.context_compressor.on_session_end(
+                    self.session_id or "",
+                    messages or [],
+                )
+            except Exception:
+                pass
    
+    def close(self) -> None:
+        """Release all resources held by this agent instance.
+
+        Cleans up subprocess resources that would otherwise become orphans:
+        - Background processes tracked in ProcessRegistry
+        - Terminal sandbox environments
+        - Browser daemon sessions
+        - Active child agents (subagent delegation)
+        - OpenAI/httpx client connections
+
+        Safe to call multiple times (idempotent).  Each cleanup step is
+        independently guarded so a failure in one does not prevent the rest.
+        """
+        task_id = getattr(self, "session_id", None) or ""
+
+        # 1. Kill background processes for this task
+        try:
+            from tools.process_registry import process_registry
+            process_registry.kill_all(task_id=task_id)
+        except Exception:
+            pass
+
+        # 2. Clean terminal sandbox environments
+        try:
+            from tools.terminal_tool import cleanup_vm
+            cleanup_vm(task_id)
+        except Exception:
+            pass
+
+        # 3. Clean browser daemon sessions
+        try:
+            from tools.browser_tool import cleanup_browser
+            cleanup_browser(task_id)
+        except Exception:
+            pass
+
+        # 4. Close active child agents
+        try:
+            with self._active_children_lock:
+                children = list(self._active_children)
+                self._active_children.clear()
+            for child in children:
+                try:
+                    child.close()
+                except Exception:
+                    pass
+        except Exception:
+            pass
+
+        # 5. Close the OpenAI/httpx client
+        try:
+            client = getattr(self, "client", None)
+            if client is not None:
+                self._close_openai_client(client, reason="agent_close", shared=True)
+                self.client = None
+        except Exception:
+            pass
+
    def _hydrate_todo_store(self, history: List[Dict[str, Any]]) -> None:
        """
        Recover todo state from conversation history.
@@ -4299,7 +4425,7 @@ class AIAgent:
            self._anthropic_api_key = runtime_key
            self._anthropic_base_url = runtime_base
            self._anthropic_client = build_anthropic_client(runtime_key, runtime_base)
-            self._is_anthropic_oauth = _is_oauth_token(runtime_key) if self.provider == "anthropic" else False
+            self._is_anthropic_oauth = _is_oauth_token(runtime_key)
            self.api_key = runtime_key
            self.base_url = runtime_base
            return
@@ -5187,13 +5313,12 @@ class AIAgent:
                    self.model, base_url=self.base_url,
                    api_key=self.api_key, provider=self.provider,
                )
-                self.context_compressor.model = self.model
-                self.context_compressor.base_url = self.base_url
-                self.context_compressor.api_key = self.api_key
-                self.context_compressor.provider = self.provider
-                self.context_compressor.context_length = fb_context_length
-                self.context_compressor.threshold_tokens = int(
-                    fb_context_length * self.context_compressor.threshold_percent
+                self.context_compressor.update_model(
+                    model=self.model,
+                    context_length=fb_context_length,
+                    base_url=self.base_url,
+                    api_key=getattr(self, "api_key", ""),
+                    provider=self.provider,
                )

            self._emit_status(
@@ -5253,14 +5378,15 @@ class AIAgent:
                    shared=True,
                )

-            # ── Restore context compressor state ──
+            # ── Restore context engine state ──
            cc = self.context_compressor
-            cc.model = rt["compressor_model"]
-            cc.base_url = rt["compressor_base_url"]
-            cc.api_key = rt["compressor_api_key"]
-            cc.provider = rt["compressor_provider"]
-            cc.context_length = rt["compressor_context_length"]
-            cc.threshold_tokens = rt["compressor_threshold_tokens"]
+            cc.update_model(
+                model=rt["compressor_model"],
+                context_length=rt["compressor_context_length"],
+                base_url=rt["compressor_base_url"],
+                api_key=rt["compressor_api_key"],
+                provider=rt["compressor_provider"],
+            )

            # ── Reset fallback chain for the new turn ──
            self._fallback_activated = False
@@ -6825,6 +6951,29 @@ class AIAgent:
                        spinner.stop(cute_msg)
                    elif self._should_emit_quiet_tool_messages():
                        self._vprint(f"  {cute_msg}")
+            elif self._context_engine_tool_names and function_name in self._context_engine_tool_names:
+                # Context engine tools (lcm_grep, lcm_describe, lcm_expand, etc.)
+                spinner = None
+                if self.quiet_mode and not self.tool_progress_callback:
+                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
+                    emoji = _get_tool_emoji(function_name)
+                    preview = _build_tool_preview(function_name, function_args) or function_name
+                    spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots', print_fn=self._print_fn)
+                    spinner.start()
+                _ce_result = None
+                try:
+                    function_result = self.context_compressor.handle_tool_call(function_name, function_args, messages=messages)
+                    _ce_result = function_result
+                except Exception as tool_error:
+                    function_result = json.dumps({"error": f"Context engine tool '{function_name}' failed: {tool_error}"})
+                    logger.error("context_engine.handle_tool_call raised for %s: %s", function_name, tool_error, exc_info=True)
+                finally:
+                    tool_duration = time.time() - tool_start_time
+                    cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_ce_result)
+                    if spinner:
+                        spinner.stop(cute_msg)
+                    elif self.quiet_mode:
+                        self._vprint(f"  {cute_msg}")
            elif self._memory_manager and self._memory_manager.has_tool(function_name):
                # Memory provider tools (hindsight_retain, honcho_search, etc.)
                # These are not in the tool registry — route through MemoryManager.
@@ -7708,6 +7857,7 @@ class AIAgent:

            finish_reason = "stop"
            response = None  # Guard against UnboundLocalError if all retries fail
+            api_kwargs = None  # Guard against UnboundLocalError in except handler

            while retry_count < max_retries:
                try:
@@ -8138,7 +8288,7 @@ class AIAgent:
                        # Cache discovered context length after successful call.
                        # Only persist limits confirmed by the provider (parsed
                        # from the error message), not guessed probe tiers.
-                        if self.context_compressor._context_probed:
+                        if getattr(self.context_compressor, "_context_probed", False):
                            ctx = self.context_compressor.context_length
                            if getattr(self.context_compressor, "_context_probe_persistable", False):
                                save_context_length(self.model, self.base_url, ctx)
@@ -8477,16 +8627,22 @@ class AIAgent:
                        compressor = self.context_compressor
                        old_ctx = compressor.context_length
                        if old_ctx > _reduced_ctx:
-                            compressor.context_length = _reduced_ctx
-                            compressor.threshold_tokens = int(
-                                _reduced_ctx * compressor.threshold_percent
+                            compressor.update_model(
+                                model=self.model,
+                                context_length=_reduced_ctx,
+                                base_url=self.base_url,
+                                api_key=getattr(self, "api_key", ""),
+                                provider=self.provider,
                            )
-                            compressor._context_probed = True
-                            # Don't persist — this is a subscription-tier
-                            # limitation, not a model capability.  If the user
-                            # later enables extra usage the 1M limit should
-                            # come back automatically.
-                            compressor._context_probe_persistable = False
+                            # Context probing flags — only set on built-in
+                            # compressor (plugin engines manage their own).
+                            if hasattr(compressor, "_context_probed"):
+                                compressor._context_probed = True
+                                # Don't persist — this is a subscription-tier
+                                # limitation, not a model capability.  If the
+                                # user later enables extra usage the 1M limit
+                                # should come back automatically.
+                                compressor._context_probe_persistable = False
                            self._vprint(
                                f"{self.log_prefix}⚠️  Anthropic long-context tier "
                                f"requires extra usage — reducing context: "
@@ -8650,17 +8806,25 @@ class AIAgent:
                            new_ctx = get_next_probe_tier(old_ctx)

                        if new_ctx and new_ctx < old_ctx:
-                            compressor.context_length = new_ctx
-                            compressor.threshold_tokens = int(new_ctx * compressor.threshold_percent)
-                            compressor._context_probed = True
-                            # Only persist limits parsed from the provider's
-                            # error message (a real number).  Guessed fallback
-                            # tiers from get_next_probe_tier() should stay
-                            # in-memory only — persisting them pollutes the
-                            # cache with wrong values.
-                            compressor._context_probe_persistable = bool(
-                                parsed_limit and parsed_limit == new_ctx
+                            compressor.update_model(
+                                model=self.model,
+                                context_length=new_ctx,
+                                base_url=self.base_url,
+                                api_key=getattr(self, "api_key", ""),
+                                provider=self.provider,
                            )
+                            # Context probing flags — only set on built-in
+                            # compressor (plugin engines manage their own).
+                            if hasattr(compressor, "_context_probed"):
+                                compressor._context_probed = True
+                                # Only persist limits parsed from the provider's
+                                # error message (a real number).  Guessed fallback
+                                # tiers from get_next_probe_tier() should stay
+                                # in-memory only — persisting them pollutes the
+                                # cache with wrong values.
+                                compressor._context_probe_persistable = bool(
+                                    parsed_limit and parsed_limit == new_ctx
+                                )
                            self._vprint(f"{self.log_prefix}⚠️  Context length exceeded — stepping down: {old_ctx:,} → {new_ctx:,} tokens", force=True)
                        else:
                            self._vprint(f"{self.log_prefix}⚠️  Context length exceeded at minimum tier — attempting compression...", force=True)
@@ -8742,9 +8906,10 @@ class AIAgent:
                        if self._try_activate_fallback():
                            retry_count = 0
                            continue
-                        self._dump_api_request_debug(
-                            api_kwargs, reason="non_retryable_client_error", error=api_error,
-                        )
+                        if api_kwargs is not None:
+                            self._dump_api_request_debug(
+                                api_kwargs, reason="non_retryable_client_error", error=api_error,
+                            )
                        self._emit_status(
                            f"❌ Non-retryable error (HTTP {status_code}): "
                            f"{self._summarize_api_error(api_error)}"
@@ -8847,9 +9012,10 @@ class AIAgent:
                            self.log_prefix, max_retries, _final_summary,
                            _provider, _model, len(api_messages), f"{approx_tokens:,}",
                        )
-                        self._dump_api_request_debug(
-                            api_kwargs, reason="max_retries_exhausted", error=api_error,
-                        )
+                        if api_kwargs is not None:
+                            self._dump_api_request_debug(
+                                api_kwargs, reason="max_retries_exhausted", error=api_error,
+                            )
                        self._persist_session(messages, conversation_history)
                        _final_response = f"API call failed after {max_retries} retries: {_final_summary}"
                        if _is_stream_drop:
@@ -9403,7 +9569,8 @@ class AIAgent:
                        fallback = getattr(self, '_last_content_with_tools', None)
                        if fallback:
                            _turn_exit_reason = "fallback_prior_turn_content"
-                            logger.debug("Empty follow-up after tool calls — using prior turn content as final response")
+                            logger.info("Empty follow-up after tool calls — using prior turn content as final response")
+                            self._emit_status("↻ Empty response after tool calls — using earlier content as final answer")
                            self._last_content_with_tools = None
                            self._empty_content_retries = 0
                            for i in range(len(messages) - 1, -1, -1):
@@ -9434,9 +9601,13 @@ class AIAgent:
                        )
                        if _has_structured and self._thinking_prefill_retries < 2:
                            self._thinking_prefill_retries += 1
-                            self._vprint(
-                                f"{self.log_prefix}↻ Thinking-only response — "
-                                f"prefilling to continue "
+                            logger.info(
+                                "Thinking-only response (no visible content) — "
+                                "prefilling to continue (%d/2)",
+                                self._thinking_prefill_retries,
+                            )
+                            self._emit_status(
+                                f"↻ Thinking-only response — prefilling to continue "
                                f"({self._thinking_prefill_retries}/2)"
                            )
                            interim_msg = self._build_assistant_message(
@@ -9452,23 +9623,57 @@ class AIAgent:
                        # Model returned nothing — no content, no
                        # structured reasoning, no tool calls.  Common
                        # with open models (transient provider issues,
-                        # rate limits, sampling flukes).  Silently retry
-                        # up to 3 times before giving up.  Skip when
+                        # rate limits, sampling flukes).  Retry up to 3
+                        # times before attempting fallback.  Skip when
                        # content has inline <think> tags (model chose
                        # to reason, just no visible text).
                        _truly_empty = not final_response.strip()
                        if _truly_empty and not _has_structured and self._empty_content_retries < 3:
                            self._empty_content_retries += 1
-                            self._vprint(
-                                f"{self.log_prefix}↻ Empty response (no content or reasoning) "
-                                f"— retrying ({self._empty_content_retries}/3)",
-                                force=True,
+                            logger.warning(
+                                "Empty response (no content or reasoning) — "
+                                "retry %d/3 (model=%s)",
+                                self._empty_content_retries, self.model,
+                            )
+                            self._emit_status(
+                                f"⚠️ Empty response from model — retrying "
+                                f"({self._empty_content_retries}/3)"
                            )
                            continue

-                        # Exhausted prefill attempts, empty retries, or
-                        # structured reasoning with no content —
-                        # fall through to "(empty)" terminal.
+                        # ── Exhausted retries — try fallback provider ──
+                        # Before giving up with "(empty)", attempt to
+                        # switch to the next provider in the fallback
+                        # chain.  This covers the case where a model
+                        # (e.g. GLM-4.5-Air) consistently returns empty
+                        # due to context degradation or provider issues.
+                        if _truly_empty and self._fallback_chain:
+                            logger.warning(
+                                "Empty response after %d retries — "
+                                "attempting fallback (model=%s, provider=%s)",
+                                self._empty_content_retries, self.model,
+                                self.provider,
+                            )
+                            self._emit_status(
+                                "⚠️ Model returning empty responses — "
+                                "switching to fallback provider..."
+                            )
+                            if self._try_activate_fallback():
+                                self._empty_content_retries = 0
+                                self._emit_status(
+                                    f"↻ Switched to fallback: {self.model} "
+                                    f"({self.provider})"
+                                )
+                                logger.info(
+                                    "Fallback activated after empty responses: "
+                                    "now using %s on %s",
+                                    self.model, self.provider,
+                                )
+                                continue
+
+                        # Exhausted retries and fallback chain (or no
+                        # fallback configured).  Fall through to the
+                        # "(empty)" terminal.
                        _turn_exit_reason = "empty_response_exhausted"
                        reasoning_text = self._extract_reasoning(assistant_message)
                        assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
@@ -9477,9 +9682,28 @@ class AIAgent:

                        if reasoning_text:
                            reasoning_preview = reasoning_text[:500] + "..." if len(reasoning_text) > 500 else reasoning_text
-                            self._vprint(f"{self.log_prefix}ℹ️  Reasoning-only response (no visible content). Reasoning: {reasoning_preview}")
+                            logger.warning(
+                                "Reasoning-only response (no visible content) "
+                                "after exhausting retries and fallback. "
+                                "Reasoning: %s", reasoning_preview,
+                            )
+                            self._emit_status(
+                                "⚠️ Model produced reasoning but no visible "
+                                "response after all retries. Returning empty."
+                            )
                        else:
-                            self._vprint(f"{self.log_prefix}ℹ️  Empty response (no content or reasoning) after 3 retries.")
+                            logger.warning(
+                                "Empty response (no content or reasoning) "
+                                "after %d retries. No fallback available. "
+                                "model=%s provider=%s",
+                                self._empty_content_retries, self.model,
+                                self.provider,
+                            )
+                            self._emit_status(
+                                "❌ Model returned no content after all retries"
+                                + (" and fallback attempts." if self._fallback_chain else
+                                   ". No fallback providers configured.")
+                            )

                        final_response = "(empty)"
                        break
@@ -1082,10 +1082,19 @@ install_node_deps() {
        log_success "Node.js dependencies installed"

        # Install Playwright browser + system dependencies.
-        # Playwright's install-deps only supports apt/dnf/zypper natively.
+        # Playwright's --with-deps only supports apt-based systems natively.
        # For Arch/Manjaro we install the system libs via pacman first.
+        # Other systems must install Chromium dependencies manually.
        log_info "Installing browser engine (Playwright Chromium)..."
        case "$DISTRO" in
+            ubuntu|debian|raspbian|pop|linuxmint|elementary|zorin|kali|parrot)
+                log_info "Playwright may request sudo to install browser system dependencies (shared libraries)."
+                log_info "This is standard Playwright setup — Hermes itself does not require root access."
+                cd "$INSTALL_DIR" && npx playwright install --with-deps chromium 2>/dev/null || {
+                    log_warn "Playwright browser installation failed — browser tools will not work."
+                    log_warn "Try running manually: cd $INSTALL_DIR && npx playwright install --with-deps chromium"
+                }
+                ;;
            arch|manjaro)
                if command -v pacman &> /dev/null; then
                    log_info "Arch/Manjaro detected — installing Chromium system dependencies via pacman..."
@@ -1100,15 +1109,35 @@ install_node_deps() {
                        log_warn "  sudo pacman -S nss atk at-spi2-core cups libdrm libxkbcommon mesa pango cairo alsa-lib"
                    fi
                fi
-                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || true
+                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || {
+                    log_warn "Playwright browser installation failed — browser tools will not work."
+                }
+                ;;
+            fedora|rhel|centos|rocky|alma)
+                log_warn "Playwright does not support automatic dependency installation on RPM-based systems."
+                log_info "Install Chromium system dependencies manually before using browser tools:"
+                log_info "  sudo dnf install nss atk at-spi2-core cups-libs libdrm libxkbcommon mesa-libgbm pango cairo alsa-lib"
+                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || {
+                    log_warn "Playwright browser installation failed — install dependencies above and retry."
+                }
+                ;;
+            opensuse*|sles)
+                log_warn "Playwright does not support automatic dependency installation on zypper-based systems."
+                log_info "Install Chromium system dependencies manually before using browser tools:"
+                log_info "  sudo zypper install mozilla-nss libatk-1_0-0 at-spi2-core cups-libs libdrm2 libxkbcommon0 Mesa-libgbm1 pango cairo libasound2"
+                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || {
+                    log_warn "Playwright browser installation failed — install dependencies above and retry."
+                }
                ;;
            *)
-                log_info "Playwright may request sudo to install browser system dependencies (shared libraries)."
-                log_info "This is standard Playwright setup — Hermes itself does not require root access."
-                cd "$INSTALL_DIR" && npx playwright install --with-deps chromium 2>/dev/null || true
+                log_warn "Playwright does not support automatic dependency installation on $DISTRO."
+                log_info "Install Chromium/browser system dependencies for your distribution, then run:"
+                log_info "  cd $INSTALL_DIR && npx playwright install chromium"
+                log_info "Browser tools will not work until dependencies are installed."
+                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || true
                ;;
        esac
-        log_success "Browser engine installed"
+        log_success "Browser engine setup complete"
    fi

    # Install WhatsApp bridge dependencies
@@ -203,3 +203,30 @@ For segmented videos (quotes, scenes, chapters), render each as a separate clip
 | `references/inputs.md` | Audio analysis (FFT, bands, beats), video sampling, image conversion, text/lyrics, TTS integration (ElevenLabs, voice assignment, audio mixing) |
 | `references/optimization.md` | Hardware detection, quality profiles, vectorized patterns, parallel rendering, memory management, performance budgets |
 | `references/troubleshooting.md` | NumPy broadcasting traps, blend mode pitfalls, multiprocessing/pickling, brightness diagnostics, ffmpeg issues, font problems, common mistakes |
+
+---
+
+## Creative Divergence (use only when user requests experimental/creative/unique output)
+
+If the user asks for creative, experimental, surprising, or unconventional output, select the strategy that best fits and reason through its steps BEFORE generating code.
+
+- **Forced Connections** — when the user wants cross-domain inspiration ("make it look organic," "industrial aesthetic")
+- **Conceptual Blending** — when the user names two things to combine ("ocean meets music," "space + calligraphy")
+- **Oblique Strategies** — when the user is maximally open ("surprise me," "something I've never seen")
+
+### Forced Connections
+1. Pick a domain unrelated to the visual goal (weather systems, microbiology, architecture, fluid dynamics, textile weaving)
+2. List its core visual/structural elements (erosion → gradual reveal; mitosis → splitting duplication; weaving → interlocking patterns)
+3. Map those elements onto ASCII characters and animation patterns
+4. Synthesize — what does "erosion" or "crystallization" look like in a character grid?
+
+### Conceptual Blending
+1. Name two distinct visual/conceptual spaces (e.g., ocean waves + sheet music)
+2. Map correspondences (crests = high notes, troughs = rests, foam = staccato)
+3. Blend selectively — keep the most interesting mappings, discard forced ones
+4. Develop emergent properties that exist only in the blend
+
+### Oblique Strategies
+1. Draw one: "Honor thy error as a hidden intention" / "Use an old idea" / "What would your closest friend do?" / "Emphasize the flaws" / "Turn it upside down" / "Only a part, not the whole" / "Reverse"
+2. Interpret the directive against the current ASCII animation challenge
+3. Apply the lateral insight to the visual design before writing code
@@ -0,0 +1,147 @@
+---
+name: ideation
+title: Creative Ideation — Constraint-Driven Project Generation
+description: "Generate project ideas through creative constraints. Use when the user says 'I want to build something', 'give me a project idea', 'I'm bored', 'what should I make', 'inspire me', or any variant of 'I have tools but no direction'. Works for code, art, hardware, writing, tools, and anything that can be made."
+version: 1.0.0
+author: SHL0MS
+license: MIT
+metadata:
+  hermes:
+    tags: [Creative, Ideation, Projects, Brainstorming, Inspiration]
+    category: creative
+    requires_toolsets: []
+---
+
+# Creative Ideation
+
+Generate project ideas through creative constraints. Constraint + direction = creativity.
+
+## How It Works
+
+1. **Pick a constraint** from the library below — random, or matched to the user's domain/mood
+2. **Interpret it broadly** — a coding prompt can become a hardware project, an art prompt can become a CLI tool
+3. **Generate 3 concrete project ideas** that satisfy the constraint
+4. **If they pick one, build it** — create the project, write the code, ship it
+
+## The Rule
+
+Every prompt is interpreted as broadly as possible. "Does this include X?" → Yes. The prompts provide direction and mild constraint. Without either, there is no creativity.
+
+## Constraint Library
+
+### For Developers
+
+**Solve your own itch:**
+Build the tool you wished existed this week. Under 50 lines. Ship it today.
+
+**Automate the annoying thing:**
+What's the most tedious part of your workflow? Script it away. Two hours to fix a problem that costs you five minutes a day.
+
+**The CLI tool that should exist:**
+Think of a command you've wished you could type. `git undo-that-thing-i-just-did`. `docker why-is-this-broken`. `npm explain-yourself`. Now build it.
+
+**Nothing new except glue:**
+Make something entirely from existing APIs, libraries, and datasets. The only original contribution is how you connect them.
+
+**Frankenstein week:**
+Take something that does X and make it do Y. A git repo that plays music. A Dockerfile that generates poetry. A cron job that sends compliments.
+
+**Subtract:**
+How much can you remove from a codebase before it breaks? Strip a tool to its minimum viable function. Delete until only the essence remains.
+
+**High concept, low effort:**
+A deep idea, lazily executed. The concept should be brilliant. The implementation should take an afternoon. If it takes longer, you're overthinking it.
+
+### For Makers & Artists
+
+**Blatantly copy something:**
+Pick something you admire — a tool, an artwork, an interface. Recreate it from scratch. The learning is in the gap between your version and theirs.
+
+**One million of something:**
+One million is both a lot and not that much. One million pixels is a 1MB photo. One million API calls is a Tuesday. One million of anything becomes interesting at scale.
+
+**Make something that dies:**
+A website that loses a feature every day. A chatbot that forgets. A countdown to nothing. An exercise in rot, killing, or letting go.
+
+**Do a lot of math:**
+Generative geometry, shader golf, mathematical art, computational origami. Time to re-learn what an arcsin is.
+
+### For Anyone
+
+**Text is the universal interface:**
+Build something where text is the only interface. No buttons, no graphics, just words in and words out. Text can go in and out of almost anything.
+
+**Start at the punchline:**
+Think of something that would be a funny sentence. Work backwards to make it real. "I taught my thermostat to gaslight me" → now build it.
+
+**Hostile UI:**
+Make something intentionally painful to use. A password field that requires 47 conditions. A form where every label lies. A CLI that judges your commands.
+
+**Take two:**
+Remember an old project. Do it again from scratch. No looking at the original. See what changed about how you think.
+
+See `references/full-prompt-library.md` for 30+ additional constraints across communication, scale, philosophy, transformation, and more.
+
+## Matching Constraints to Users
+
+| User says | Pick from |
+|-----------|-----------|
+| "I want to build something" (no direction) | Random — any constraint |
+| "I'm learning [language]" | Blatantly copy something, Automate the annoying thing |
+| "I want something weird" | Hostile UI, Frankenstein week, Start at the punchline |
+| "I want something useful" | Solve your own itch, The CLI that should exist, Automate the annoying thing |
+| "I want something beautiful" | Do a lot of math, One million of something |
+| "I'm burned out" | High concept low effort, Make something that dies |
+| "Weekend project" | Nothing new except glue, Start at the punchline |
+| "I want a challenge" | One million of something, Subtract, Take two |
+
+## Output Format
+
+```
+## Constraint: [Name]
+> [The constraint, one sentence]
+
+### Ideas
+
+1. **[One-line pitch]**
+   [2-3 sentences: what you'd build and why it's interesting]
+   ⏱ [weekend / week / month] • 🔧 [stack]
+
+2. **[One-line pitch]**
+   [2-3 sentences]
+   ⏱ ... • 🔧 ...
+
+3. **[One-line pitch]**
+   [2-3 sentences]
+   ⏱ ... • 🔧 ...
+```
+
+## Example
+
+```
+## Constraint: The CLI tool that should exist
+> Think of a command you've wished you could type. Now build it.
+
+### Ideas
+
+1. **`git whatsup` — show what happened while you were away**
+   Compares your last active commit to HEAD and summarizes what changed,
+   who committed, and what PRs merged. Like a morning standup from your repo.
+   ⏱ weekend • 🔧 Python, GitPython, click
+
+2. **`explain 503` — HTTP status codes for humans**
+   Pipe any status code or error message and get a plain-English explanation
+   with common causes and fixes. Pulls from a curated database, not an LLM.
+   ⏱ weekend • 🔧 Rust or Go, static dataset
+
+3. **`deps why <package>` — why is this in my dependency tree**
+   Traces a transitive dependency back to the direct dependency that pulled
+   it in. Answers "why do I have 47 copies of lodash" in one command.
+   ⏱ weekend • 🔧 Node.js, npm/yarn lockfile parsing
+```
+
+After the user picks one, start building — create the project, write the code, iterate.
+
+## Attribution
+
+Constraint approach inspired by [wttdotm.com/prompts.html](https://wttdotm.com/prompts.html). Adapted and expanded for software development and general-purpose ideation.
@@ -0,0 +1,110 @@
+# Full Prompt Library
+
+Extended constraint library beyond the core set in SKILL.md. Load these when the user wants more variety or a specific category.
+
+## Communication & Connection
+
+**Create a means of distribution:**
+The project works when you can use what you made to give something to somebody else.
+
+**Make a way to communicate:**
+The project works when you can hold a conversation with someone else using what you created. Not chat — something weirder.
+
+**Write a love letter:**
+To a person, a programming language, a game, a place, a tool. On paper, in code, in music, in light. Mail it.
+
+**Mail chess / Asynchronous games:**
+Something turn-based played with no time limit. No requirement to be there at the same time. The game happens in the gaps.
+
+**Twitch plays X:**
+A group of people share control over something. Collective input, emergent behavior.
+
+## Screens & Interfaces
+
+**Something for your desktop:**
+You spend a lot of time there. Spruce it up. A custom clock, a pet that lives in your terminal, a wallpaper that changes based on your git activity.
+
+**One screen, two screen, old screen, new screen:**
+Take something you associate with one screen and put it on a very different one. DOOM on a smart fridge. A spreadsheet on a watch. A terminal in a painting.
+
+**Make a mirror:**
+Something that reflects the viewer back at themselves. A website that shows your browsing history. A CLI that prints your git sins.
+
+## Philosophy & Concept
+
+**Code as koan, koan as code:**
+What is the sound of one hand clapping? A program that answers a question it wasn't asked. A function that returns before it's called.
+
+**The useless tree:**
+Make something useless. Deliberately, completely, beautifully useless. No utility. No purpose. No point. That's the point.
+
+**Artificial stupidity:**
+Make fun of AI by showcasing its faults. Mistrain it. Lie to it. Build the opposite of what AI is supposed to be good at.
+
+**"I use technology in order to hate it properly":**
+Make something inspired by the tension between loving and hating your tools.
+
+**The more things change, the more they stay the same:**
+Reflect on time, difference, and similarity.
+
+## Transformation
+
+**Translate:**
+Take something meant for one audience and make it understandable by another. A research paper as a children's book. An API as a board game. A song as an architecture diagram.
+
+**I mean, I GUESS you could store something that way:**
+The project works when you can save and open something. Store data in DNS caches. Encode a novel in emoji. Write a file system on top of something that isn't a file system.
+
+**I mean, I GUESS those could be pixels:**
+The project works when you can display an image. Render anything visual in a medium that wasn't meant for rendering.
+
+## Identity & Reflection
+
+**Make a self-portrait:**
+Be yourself? Be fake? Be real? In code, in data, in sound, in a directory structure.
+
+**Make a pun:**
+The stupider the better. Physical, digital, linguistic, visual. The project IS the joke.
+
+**Doors, walls, borders, barriers, boundaries:**
+Things that intermediate two places: opening, closing, permeating, excluding, combining.
+
+## Scale & Repetition
+
+**Lists!:**
+Itemizations, taxonomies, exhaustive recountings, iterations. This one. A list of list of lists.
+
+**Did you mean *recursion*?**
+Did you mean recursion?
+
+**Animals:**
+Lions, and tigers, and bears. Crab logic gates. Fish plays the stock market.
+
+**Cats:**
+Where would the internet be without them.
+
+## Starting Points
+
+**An idea that comes from a book:**
+Read something. Make something inspired by it.
+
+**Go to a museum:**
+Project ensues.
+
+**NPC loot:**
+What do you drop when you die? What do you take on your journey? Build the item.
+
+**Mythological objects and entities:**
+Pandora's box, the ocarina of time, the palantir. Build the artifact.
+
+**69:**
+Nice. Make something with the joke being the number 69.
+
+**Office Space printer scene:**
+Capture the same energy. Channel the catharsis of destroying the thing that frustrates you.
+
+**Borges week:**
+Something inspired by the Argentine. The library of babel. The map that is the territory.
+
+**Lights!:**
+LED throwies, light installations, illuminated anything. Make something that glows.
@@ -239,3 +239,26 @@ Always iterate at `-ql`. Only render `-qh` for final output.
 | `references/paper-explainer.md` | Turning research papers into animations — workflow, templates, domain patterns |
 | `references/decorations.md` | SurroundingRectangle, Brace, arrows, DashedLine, Angle, annotation lifecycle |
 | `references/production-quality.md` | Pre-code, pre-render, post-render checklists, spatial layout, color, tempo |
+
+---
+
+## Creative Divergence (use only when user requests experimental/creative/unique output)
+
+If the user asks for creative, experimental, or unconventional explanatory approaches, select a strategy and reason through it BEFORE designing the animation.
+
+- **SCAMPER** — when the user wants a fresh take on a standard explanation
+- **Assumption Reversal** — when the user wants to challenge how something is typically taught
+
+### SCAMPER Transformation
+Take a standard mathematical/technical visualization and transform it:
+- **Substitute**: replace the standard visual metaphor (number line → winding path, matrix → city grid)
+- **Combine**: merge two explanation approaches (algebraic + geometric simultaneously)
+- **Reverse**: derive backward — start from the result and deconstruct to axioms
+- **Modify**: exaggerate a parameter to show why it matters (10x the learning rate, 1000x the sample size)
+- **Eliminate**: remove all notation — explain purely through animation and spatial relationships
+
+### Assumption Reversal
+1. List what's "standard" about how this topic is visualized (left-to-right, 2D, discrete steps, formal notation)
+2. Pick the most fundamental assumption
+3. Reverse it (right-to-left derivation, 3D embedding of a 2D concept, continuous morphing instead of steps, zero notation)
+4. Explore what the reversal reveals that the standard approach hides
@@ -511,3 +511,37 @@ When building p5.js sketches:
 | `references/export-pipeline.md` | `saveCanvas()`, `saveGif()`, `saveFrames()`, deterministic headless capture, ffmpeg frame-to-video, CCapture.js, SVG export, per-clip architecture, platform export (fxhash), video gotchas |
 | `references/troubleshooting.md` | Performance profiling, per-pixel budgets, common mistakes, browser compatibility, WebGL debugging, font loading issues, pixel density traps, memory leaks, CORS |
 | `templates/viewer.html` | Interactive viewer template: seed navigation (prev/next/random/jump), parameter sliders, download PNG, responsive canvas. Start from this for explorable generative art |
+
+---
+
+## Creative Divergence (use only when user requests experimental/creative/unique output)
+
+If the user asks for creative, experimental, surprising, or unconventional output, select the strategy that best fits and reason through its steps BEFORE generating code.
+
+- **Conceptual Blending** — when the user names two things to combine or wants hybrid aesthetics
+- **SCAMPER** — when the user wants a twist on a known generative art pattern
+- **Distance Association** — when the user gives a single concept and wants exploration ("make something about time")
+
+### Conceptual Blending
+1. Name two distinct visual systems (e.g., particle physics + handwriting)
+2. Map correspondences (particles = ink drops, forces = pen pressure, fields = letterforms)
+3. Blend selectively — keep mappings that produce interesting emergent visuals
+4. Code the blend as a unified system, not two systems side-by-side
+
+### SCAMPER Transformation
+Take a known generative pattern (flow field, particle system, L-system, cellular automata) and systematically transform it:
+- **Substitute**: replace circles with text characters, lines with gradients
+- **Combine**: merge two patterns (flow field + voronoi)
+- **Adapt**: apply a 2D pattern to a 3D projection
+- **Modify**: exaggerate scale, warp the coordinate space
+- **Purpose**: use a physics sim for typography, a sorting algorithm for color
+- **Eliminate**: remove the grid, remove color, remove symmetry
+- **Reverse**: run the simulation backward, invert the parameter space
+
+### Distance Association
+1. Anchor on the user's concept (e.g., "loneliness")
+2. Generate associations at three distances:
+   - Close (obvious): empty room, single figure, silence
+   - Medium (interesting): one fish in a school swimming the wrong way, a phone with no notifications, the gap between subway cars
+   - Far (abstract): prime numbers, asymptotic curves, the color of 3am
+3. Develop the medium-distance associations — they're specific enough to visualize but unexpected enough to be interesting
@@ -39,8 +39,13 @@ class TestIsOAuthToken:
        assert _is_oauth_token("sk-ant-api03-abcdef1234567890") is False

    def test_managed_key(self):
-        # Managed keys from ~/.claude.json are NOT regular API keys
-        assert _is_oauth_token("ou1R1z-ft0A-bDeZ9wAA") is True
+        # Managed keys from ~/.claude.json without a recognisable Anthropic
+        # prefix are not positively identified as OAuth.  They enter the system
+        # via diagnostics-only read_claude_managed_key(), not via
+        # resolve_anthropic_token(), so they don't reach the OAuth gate in
+        # practice.  Third-party provider keys (MiniMax, Alibaba) also lack
+        # the sk-ant- prefix and must NOT be treated as OAuth.
+        assert _is_oauth_token("ou1R1z-ft0A-bDeZ9wAA") is False

    def test_jwt_token(self):
        # JWTs from OAuth flow
@@ -658,6 +658,19 @@ class TestGetTextAuxiliaryClient:
        assert client is None
        assert model is None

+    def test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api(self):
+        with patch("agent.auxiliary_client._resolve_custom_runtime",
+                   return_value=("https://api.openai.com/v1", "sk-test", "codex_responses")), \
+             patch("agent.auxiliary_client._read_main_model", return_value="gpt-5.3-codex"), \
+             patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_text_auxiliary_client()
+
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-5.3-codex"
+        assert mock_openai.call_args.kwargs["base_url"] == "https://api.openai.com/v1"
+        assert mock_openai.call_args.kwargs["api_key"] == "sk-test"
+

 class TestVisionClientFallback:
    """Vision client auto mode resolves known-good multimodal backends."""
@@ -743,6 +756,69 @@ class TestAuxiliaryPoolAwareness:
        assert call_kwargs["base_url"] == "https://api.githubcopilot.com"
        assert call_kwargs["default_headers"]["Editor-Version"]

+    def test_copilot_responses_api_model_wrapped_in_codex_client(self, monkeypatch):
+        """Copilot GPT-5+ models (needing Responses API) are wrapped in CodexAuxiliaryClient."""
+        monkeypatch.delenv("GITHUB_TOKEN", raising=False)
+        monkeypatch.delenv("GH_TOKEN", raising=False)
+
+        with (
+            patch(
+                "hermes_cli.auth.resolve_api_key_provider_credentials",
+                return_value={
+                    "provider": "copilot",
+                    "api_key": "test-token",
+                    "base_url": "https://api.githubcopilot.com",
+                    "source": "gh auth token",
+                },
+            ),
+            patch("agent.auxiliary_client.OpenAI"),
+        ):
+            client, model = resolve_provider_client("copilot", model="gpt-5.4-mini")
+
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-5.4-mini"
+
+    def test_copilot_chat_completions_model_not_wrapped(self, monkeypatch):
+        """Copilot models using Chat Completions are returned as plain OpenAI clients."""
+        monkeypatch.delenv("GITHUB_TOKEN", raising=False)
+        monkeypatch.delenv("GH_TOKEN", raising=False)
+
+        with (
+            patch(
+                "hermes_cli.auth.resolve_api_key_provider_credentials",
+                return_value={
+                    "provider": "copilot",
+                    "api_key": "test-token",
+                    "base_url": "https://api.githubcopilot.com",
+                    "source": "gh auth token",
+                },
+            ),
+            patch("agent.auxiliary_client.OpenAI") as mock_openai,
+        ):
+            client, model = resolve_provider_client("copilot", model="gpt-4.1-mini")
+
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert not isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-4.1-mini"
+        # Should be the raw mock OpenAI client
+        assert client is mock_openai.return_value
+
+    def test_vision_auto_uses_active_provider_as_fallback(self, monkeypatch):
+        """When no OpenRouter/Nous available, vision auto falls back to active provider."""
+        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
+        with (
+            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
+            patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
+            patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
+            patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
+            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
+        ):
+            client, model = get_vision_auxiliary_client()
+
+        assert client is not None
+        assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
+
    def test_vision_auto_prefers_active_provider_over_openrouter(self, monkeypatch):
        """Active provider is tried before OpenRouter in vision auto."""
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
@@ -0,0 +1,250 @@
+"""Tests for the ContextEngine ABC and plugin slot."""
+
+import json
+import pytest
+from typing import Any, Dict, List
+
+from agent.context_engine import ContextEngine
+from agent.context_compressor import ContextCompressor
+
+
+# ---------------------------------------------------------------------------
+# A minimal concrete engine for testing the ABC
+# ---------------------------------------------------------------------------
+
+class StubEngine(ContextEngine):
+    """Minimal engine that satisfies the ABC without doing real work."""
+
+    def __init__(self, context_length=200000, threshold_pct=0.50):
+        self.context_length = context_length
+        self.threshold_tokens = int(context_length * threshold_pct)
+        self._compress_called = False
+        self._tools_called = []
+
+    @property
+    def name(self) -> str:
+        return "stub"
+
+    def update_from_response(self, usage: Dict[str, Any]) -> None:
+        self.last_prompt_tokens = usage.get("prompt_tokens", 0)
+        self.last_completion_tokens = usage.get("completion_tokens", 0)
+        self.last_total_tokens = usage.get("total_tokens", 0)
+
+    def should_compress(self, prompt_tokens: int = None) -> bool:
+        tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
+        return tokens >= self.threshold_tokens
+
+    def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
+        self._compress_called = True
+        self.compression_count += 1
+        # Trivial: just return as-is
+        return messages
+
+    def get_tool_schemas(self) -> List[Dict[str, Any]]:
+        return [
+            {
+                "name": "stub_search",
+                "description": "Search the stub engine",
+                "parameters": {"type": "object", "properties": {}},
+            }
+        ]
+
+    def handle_tool_call(self, name: str, args: Dict[str, Any]) -> str:
+        self._tools_called.append(name)
+        return json.dumps({"ok": True, "tool": name})
+
+
+# ---------------------------------------------------------------------------
+# ABC contract tests
+# ---------------------------------------------------------------------------
+
+class TestContextEngineABC:
+    """Verify the ABC enforces the required interface."""
+
+    def test_cannot_instantiate_abc_directly(self):
+        with pytest.raises(TypeError):
+            ContextEngine()
+
+    def test_missing_methods_raises(self):
+        """A subclass missing required methods cannot be instantiated."""
+        class Incomplete(ContextEngine):
+            @property
+            def name(self):
+                return "incomplete"
+        with pytest.raises(TypeError):
+            Incomplete()
+
+    def test_stub_engine_satisfies_abc(self):
+        engine = StubEngine()
+        assert isinstance(engine, ContextEngine)
+        assert engine.name == "stub"
+
+    def test_compressor_is_context_engine(self):
+        c = ContextCompressor(model="test", quiet_mode=True, config_context_length=200000)
+        assert isinstance(c, ContextEngine)
+        assert c.name == "compressor"
+
+
+# ---------------------------------------------------------------------------
+# Default method behavior
+# ---------------------------------------------------------------------------
+
+class TestDefaults:
+    """Verify ABC default implementations work correctly."""
+
+    def test_default_tool_schemas_empty(self):
+        engine = StubEngine()
+        # StubEngine overrides this, so test the base via super
+        assert ContextEngine.get_tool_schemas(engine) == []
+
+    def test_default_handle_tool_call_returns_error(self):
+        engine = StubEngine()
+        result = ContextEngine.handle_tool_call(engine, "unknown", {})
+        data = json.loads(result)
+        assert "error" in data
+
+    def test_default_get_status(self):
+        engine = StubEngine()
+        engine.last_prompt_tokens = 50000
+        status = engine.get_status()
+        assert status["last_prompt_tokens"] == 50000
+        assert status["context_length"] == 200000
+        assert status["threshold_tokens"] == 100000
+        assert 0 < status["usage_percent"] <= 100
+
+    def test_on_session_reset(self):
+        engine = StubEngine()
+        engine.last_prompt_tokens = 999
+        engine.compression_count = 3
+        engine.on_session_reset()
+        assert engine.last_prompt_tokens == 0
+        assert engine.compression_count == 0
+
+    def test_should_compress_preflight_default_false(self):
+        engine = StubEngine()
+        assert engine.should_compress_preflight([]) is False
+
+
+# ---------------------------------------------------------------------------
+# StubEngine behavior
+# ---------------------------------------------------------------------------
+
+class TestStubEngine:
+
+    def test_should_compress(self):
+        engine = StubEngine(context_length=100000, threshold_pct=0.50)
+        assert not engine.should_compress(40000)
+        assert engine.should_compress(50000)
+        assert engine.should_compress(60000)
+
+    def test_compress_tracks_count(self):
+        engine = StubEngine()
+        msgs = [{"role": "user", "content": "hello"}]
+        result = engine.compress(msgs)
+        assert result == msgs
+        assert engine._compress_called
+        assert engine.compression_count == 1
+
+    def test_tool_schemas(self):
+        engine = StubEngine()
+        schemas = engine.get_tool_schemas()
+        assert len(schemas) == 1
+        assert schemas[0]["name"] == "stub_search"
+
+    def test_handle_tool_call(self):
+        engine = StubEngine()
+        result = engine.handle_tool_call("stub_search", {})
+        assert json.loads(result)["ok"] is True
+        assert "stub_search" in engine._tools_called
+
+    def test_update_from_response(self):
+        engine = StubEngine()
+        engine.update_from_response({"prompt_tokens": 1000, "completion_tokens": 200, "total_tokens": 1200})
+        assert engine.last_prompt_tokens == 1000
+        assert engine.last_completion_tokens == 200
+
+
+# ---------------------------------------------------------------------------
+# ContextCompressor session reset via ABC
+# ---------------------------------------------------------------------------
+
+class TestCompressorSessionReset:
+    """Verify ContextCompressor.on_session_reset() clears all state."""
+
+    def test_reset_clears_state(self):
+        c = ContextCompressor(model="test", quiet_mode=True, config_context_length=200000)
+        c.last_prompt_tokens = 50000
+        c.compression_count = 3
+        c._previous_summary = "some old summary"
+        c._context_probed = True
+        c._context_probe_persistable = True
+
+        c.on_session_reset()
+
+        assert c.last_prompt_tokens == 0
+        assert c.last_completion_tokens == 0
+        assert c.last_total_tokens == 0
+        assert c.compression_count == 0
+        assert c._context_probed is False
+        assert c._context_probe_persistable is False
+        assert c._previous_summary is None
+
+
+# ---------------------------------------------------------------------------
+# Plugin slot (PluginManager integration)
+# ---------------------------------------------------------------------------
+
+class TestPluginContextEngineSlot:
+    """Test register_context_engine on PluginContext."""
+
+    def test_register_engine(self):
+        from hermes_cli.plugins import PluginManager, PluginContext, PluginManifest
+        mgr = PluginManager()
+        manifest = PluginManifest(name="test-lcm")
+        ctx = PluginContext(manifest, mgr)
+
+        engine = StubEngine()
+        ctx.register_context_engine(engine)
+
+        assert mgr._context_engine is engine
+        assert mgr._context_engine.name == "stub"
+
+    def test_reject_second_engine(self):
+        from hermes_cli.plugins import PluginManager, PluginContext, PluginManifest
+        mgr = PluginManager()
+        manifest = PluginManifest(name="test-lcm")
+        ctx = PluginContext(manifest, mgr)
+
+        engine1 = StubEngine()
+        engine2 = StubEngine()
+        ctx.register_context_engine(engine1)
+        ctx.register_context_engine(engine2)  # should be rejected
+
+        assert mgr._context_engine is engine1
+
+    def test_reject_non_engine(self):
+        from hermes_cli.plugins import PluginManager, PluginContext, PluginManifest
+        mgr = PluginManager()
+        manifest = PluginManifest(name="test-bad")
+        ctx = PluginContext(manifest, mgr)
+
+        ctx.register_context_engine("not an engine")
+        assert mgr._context_engine is None
+
+    def test_get_plugin_context_engine(self):
+        from hermes_cli.plugins import PluginManager, PluginContext, PluginManifest, get_plugin_context_engine, _plugin_manager
+        import hermes_cli.plugins as plugins_mod
+
+        # Inject a test manager
+        old_mgr = plugins_mod._plugin_manager
+        try:
+            mgr = PluginManager()
+            plugins_mod._plugin_manager = mgr
+
+            assert get_plugin_context_engine() is None
+
+            engine = StubEngine()
+            mgr._context_engine = engine
+            assert get_plugin_context_engine() is engine
+        finally:
+            plugins_mod._plugin_manager = old_mgr
@@ -0,0 +1,66 @@
+"""Tests for CLI manual compression messaging."""
+
+from unittest.mock import MagicMock, patch
+
+from tests.cli.test_cli_init import _make_cli
+
+
+def _make_history() -> list[dict[str, str]]:
+    return [
+        {"role": "user", "content": "one"},
+        {"role": "assistant", "content": "two"},
+        {"role": "user", "content": "three"},
+        {"role": "assistant", "content": "four"},
+    ]
+
+
+def test_manual_compress_reports_noop_without_success_banner(capsys):
+    shell = _make_cli()
+    history = _make_history()
+    shell.conversation_history = history
+    shell.agent = MagicMock()
+    shell.agent.compression_enabled = True
+    shell.agent._cached_system_prompt = ""
+    shell.agent._compress_context.return_value = (list(history), "")
+
+    def _estimate(messages):
+        assert messages == history
+        return 100
+
+    with patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate):
+        shell._manual_compress()
+
+    output = capsys.readouterr().out
+    assert "No changes from compression" in output
+    assert "✅ Compressed" not in output
+    assert "Rough transcript estimate: ~100 tokens (unchanged)" in output
+
+
+def test_manual_compress_explains_when_token_estimate_rises(capsys):
+    shell = _make_cli()
+    history = _make_history()
+    compressed = [
+        history[0],
+        {"role": "assistant", "content": "Dense summary that still counts as more tokens."},
+        history[-1],
+    ]
+    shell.conversation_history = history
+    shell.agent = MagicMock()
+    shell.agent.compression_enabled = True
+    shell.agent._cached_system_prompt = ""
+    shell.agent._compress_context.return_value = (compressed, "")
+
+    def _estimate(messages):
+        if messages == history:
+            return 100
+        if messages == compressed:
+            return 120
+        raise AssertionError(f"unexpected transcript: {messages!r}")
+
+    with patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate):
+        shell._manual_compress()
+
+    output = capsys.readouterr().out
+    assert "✅ Compressed: 4 → 3 messages" in output
+    assert "Rough transcript estimate: ~100 → ~120 tokens" in output
+    assert "denser summaries" in output
@@ -1,4 +1,4 @@
-"""Shared fixtures for Telegram gateway e2e tests.
+"""Shared fixtures for gateway e2e tests (Telegram, Discord).

 These tests exercise the full async message flow:
    adapter.handle_message(event)
@@ -14,19 +14,22 @@ import sys
 import uuid
 from datetime import datetime
 from types import SimpleNamespace
-from unittest.mock import AsyncMock, MagicMock
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest

 from gateway.config import GatewayConfig, Platform, PlatformConfig
 from gateway.platforms.base import MessageEvent, SendResult
 from gateway.session import SessionEntry, SessionSource, build_session_key


-#Ensure telegram module is available (mock it if not installed)
+# Platform library mocks

+# Ensure telegram module is available (mock it if not installed)
 def _ensure_telegram_mock():
    """Install mock telegram modules so TelegramAdapter can be imported."""
    if "telegram" in sys.modules and hasattr(sys.modules["telegram"], "__file__"):
-        return  # Real library installed
+        return # Real library installed

    telegram_mod = MagicMock()
    telegram_mod.Update = MagicMock()
@@ -51,24 +54,118 @@ def _ensure_telegram_mock():
        sys.modules.setdefault(name, telegram_mod)


-_ensure_telegram_mock()
+# Ensure discord module is available (mock it if not installed)
+def _ensure_discord_mock():
+    """Install mock discord modules so DiscordAdapter can be imported."""
+    if "discord" in sys.modules and hasattr(sys.modules["discord"], "__file__"):
+        return # Real library installed

+    discord_mod = MagicMock()
+    discord_mod.Intents.default.return_value = MagicMock()
+    discord_mod.DMChannel = type("DMChannel", (), {})
+    discord_mod.Thread = type("Thread", (), {})
+    discord_mod.ForumChannel = type("ForumChannel", (), {})
+    discord_mod.Interaction = object
+    discord_mod.app_commands = SimpleNamespace(
+        describe=lambda **kwargs: (lambda fn: fn),
+        choices=lambda **kwargs: (lambda fn: fn),
+        Choice=lambda **kwargs: SimpleNamespace(**kwargs),
+    )
+    discord_mod.opus.is_loaded.return_value = True
+
+    ext_mod = MagicMock()
+    commands_mod = MagicMock()
+    commands_mod.Bot = MagicMock
+    ext_mod.commands = commands_mod
+
+    sys.modules.setdefault("discord", discord_mod)
+    sys.modules.setdefault("discord.ext", ext_mod)
+    sys.modules.setdefault("discord.ext.commands", commands_mod)
+    sys.modules.setdefault("discord.opus", discord_mod.opus)
+
+
+def _ensure_slack_mock():
+    """Install mock slack modules so SlackAdapter can be imported."""
+    if "slack_bolt" in sys.modules and hasattr(sys.modules["slack_bolt"], "__file__"):
+        return  # Real library installed
+
+    slack_bolt = MagicMock()
+    slack_bolt.async_app.AsyncApp = MagicMock
+    slack_bolt.adapter.socket_mode.async_handler.AsyncSocketModeHandler = MagicMock
+
+    slack_sdk = MagicMock()
+    slack_sdk.web.async_client.AsyncWebClient = MagicMock
+
+    for name, mod in [
+        ("slack_bolt", slack_bolt),
+        ("slack_bolt.async_app", slack_bolt.async_app),
+        ("slack_bolt.adapter", slack_bolt.adapter),
+        ("slack_bolt.adapter.socket_mode", slack_bolt.adapter.socket_mode),
+        ("slack_bolt.adapter.socket_mode.async_handler", slack_bolt.adapter.socket_mode.async_handler),
+        ("slack_sdk", slack_sdk),
+        ("slack_sdk.web", slack_sdk.web),
+        ("slack_sdk.web.async_client", slack_sdk.web.async_client),
+    ]:
+        sys.modules.setdefault(name, mod)
+
+
+_ensure_telegram_mock()
+_ensure_discord_mock()
+_ensure_slack_mock()
+
+from gateway.platforms.discord import DiscordAdapter   # noqa: E402
 from gateway.platforms.telegram import TelegramAdapter  # noqa: E402

+import gateway.platforms.slack as _slack_mod  # noqa: E402
+_slack_mod.SLACK_AVAILABLE = True
+from gateway.platforms.slack import SlackAdapter  # noqa: E402

-#GatewayRunner factory (based on tests/gateway/test_status_command.py)

-def make_runner(session_entry: SessionEntry) -> "GatewayRunner":
+# Platform-generic factories
+
+def make_source(platform: Platform, chat_id: str = "e2e-chat-1", user_id: str = "e2e-user-1") -> SessionSource:
+    return SessionSource(
+        platform=platform,
+        chat_id=chat_id,
+        user_id=user_id,
+        user_name="e2e_tester",
+        chat_type="dm",
+    )
+
+
+def make_session_entry(platform: Platform, source: SessionSource = None) -> SessionEntry:
+    source = source or make_source(platform)
+    return SessionEntry(
+        session_key=build_session_key(source),
+        session_id=f"sess-{uuid.uuid4().hex[:8]}",
+        created_at=datetime.now(),
+        updated_at=datetime.now(),
+        platform=platform,
+        chat_type="dm",
+    )
+
+
+def make_event(platform: Platform, text: str = "/help", chat_id: str = "e2e-chat-1", user_id: str = "e2e-user-1") -> MessageEvent:
+    return MessageEvent(
+        text=text,
+        source=make_source(platform, chat_id, user_id),
+        message_id=f"msg-{uuid.uuid4().hex[:8]}",
+    )
+
+
+def make_runner(platform: Platform, session_entry: SessionEntry = None) -> "GatewayRunner":
    """Create a GatewayRunner with mocked internals for e2e testing.

    Skips __init__ to avoid filesystem/network side effects.
-    All command-dispatch dependencies are wired manually.
    """
    from gateway.run import GatewayRunner

+    if session_entry is None:
+        session_entry = make_session_entry(platform)
+
    runner = object.__new__(GatewayRunner)
    runner.config = GatewayConfig(
-        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="e2e-test-token")}
+        platforms={platform: PlatformConfig(enabled=True, token="e2e-test-token")}
    )
    runner.adapters = {}
    runner._voice_mode = {}
@@ -99,7 +196,6 @@ def make_runner(session_entry: SessionEntry) -> "GatewayRunner":
    runner._capture_gateway_honcho_if_configured = lambda *a, **kw: None
    runner._emit_gateway_run_progress = AsyncMock()

-    # Pairing store (used by authorization rejection path)
    runner.pairing_store = MagicMock()
    runner.pairing_store._is_rate_limited = MagicMock(return_value=False)
    runner.pairing_store.generate_code = MagicMock(return_value="ABC123")
@@ -107,67 +203,63 @@ def make_runner(session_entry: SessionEntry) -> "GatewayRunner":
    return runner


-#TelegramAdapter factory
+def make_adapter(platform: Platform, runner=None):
+    """Create a platform adapter wired to *runner*, with send methods mocked."""
+    if runner is None:
+        runner = make_runner(platform)

-def make_adapter(runner) -> TelegramAdapter:
-    """Create a TelegramAdapter wired to *runner*, with send methods mocked.
-
-    connect() is NOT called — no polling, no token lock, no real HTTP.
-    """
    config = PlatformConfig(enabled=True, token="e2e-test-token")
-    adapter = TelegramAdapter(config)

-    # Mock outbound methods so tests can capture what was sent
+    if platform == Platform.DISCORD:
+        with patch.object(DiscordAdapter, "_load_participated_threads", return_value=set()):
+            adapter = DiscordAdapter(config)
+        platform_key = Platform.DISCORD
+    elif platform == Platform.SLACK:
+        adapter = SlackAdapter(config)
+        platform_key = Platform.SLACK
+    else:
+        adapter = TelegramAdapter(config)
+        platform_key = Platform.TELEGRAM
+
    adapter.send = AsyncMock(return_value=SendResult(success=True, message_id="e2e-resp-1"))
    adapter.send_typing = AsyncMock()

-    # Wire adapter ↔ runner
    adapter.set_message_handler(runner._handle_message)
-    runner.adapters[Platform.TELEGRAM] = adapter
+    runner.adapters[platform_key] = adapter

    return adapter


-#Helpers
-
-def make_source(chat_id: str = "e2e-chat-1", user_id: str = "e2e-user-1") -> SessionSource:
-    return SessionSource(
-        platform=Platform.TELEGRAM,
-        chat_id=chat_id,
-        user_id=user_id,
-        user_name="e2e_tester",
-        chat_type="dm",
-    )
-
-
-def make_event(text: str, chat_id: str = "e2e-chat-1", user_id: str = "e2e-user-1") -> MessageEvent:
-    return MessageEvent(
-        text=text,
-        source=make_source(chat_id, user_id),
-        message_id=f"msg-{uuid.uuid4().hex[:8]}",
-    )
-
-
-def make_session_entry(source: SessionSource = None) -> SessionEntry:
-    source = source or make_source()
-    return SessionEntry(
-        session_key=build_session_key(source),
-        session_id=f"sess-{uuid.uuid4().hex[:8]}",
-        created_at=datetime.now(),
-        updated_at=datetime.now(),
-        platform=Platform.TELEGRAM,
-        chat_type="dm",
-    )
-
-
-async def send_and_capture(adapter: TelegramAdapter, text: str, **event_kwargs) -> AsyncMock:
-    """Send a message through the full e2e flow and return the send mock.
-
-    Drives: adapter.handle_message → background task → runner dispatch → adapter.send.
-    """
-    event = make_event(text, **event_kwargs)
+async def send_and_capture(adapter, text: str, platform: Platform, **event_kwargs) -> AsyncMock:
+    """Send a message through the full e2e flow and return the send mock."""
+    event = make_event(platform, text, **event_kwargs)
    adapter.send.reset_mock()
    await adapter.handle_message(event)
-    # Let the background task complete
    await asyncio.sleep(0.3)
    return adapter.send
+
+
+# Parametrized fixtures for platform-generic tests
+@pytest.fixture(params=[Platform.TELEGRAM, Platform.DISCORD, Platform.SLACK], ids=["telegram", "discord", "slack"])
+def platform(request):
+    return request.param
+
+
+@pytest.fixture()
+def source(platform):
+    return make_source(platform)
+
+
+@pytest.fixture()
+def session_entry(platform, source):
+    return make_session_entry(platform, source)
+
+
+@pytest.fixture()
+def runner(platform, session_entry):
+    return make_runner(platform, session_entry)
+
+
+@pytest.fixture()
+def adapter(platform, runner):
+    return make_adapter(platform, runner)
@@ -1,4 +1,4 @@
-"""E2E tests for Telegram gateway slash commands.
+"""E2E tests for gateway slash commands (Telegram, Discord).

 Each test drives a message through the full async pipeline:
    adapter.handle_message(event)
@@ -7,6 +7,7 @@ Each test drives a message through the full async pipeline:
        → adapter.send() (captured for assertions)

 No LLM involved — only gateway-level commands are tested.
+Tests are parametrized over platforms via the ``platform`` fixture in conftest.
 """

 import asyncio
@@ -15,46 +16,15 @@ from unittest.mock import AsyncMock
 import pytest

 from gateway.platforms.base import SendResult
-from tests.e2e.conftest import (
-    make_adapter,
-    make_event,
-    make_runner,
-    make_session_entry,
-    make_source,
-    send_and_capture,
-)
+from tests.e2e.conftest import make_event, send_and_capture


-#Fixtures
-
-@pytest.fixture()
-def source():
-    return make_source()
-
-
-@pytest.fixture()
-def session_entry(source):
-    return make_session_entry(source)
-
-
-@pytest.fixture()
-def runner(session_entry):
-    return make_runner(session_entry)
-
-
-@pytest.fixture()
-def adapter(runner):
-    return make_adapter(runner)
-
-
-#Tests
-
-class TestTelegramSlashCommands:
+class TestSlashCommands:
    """Gateway slash commands dispatched through the full adapter pipeline."""

    @pytest.mark.asyncio
-    async def test_help_returns_command_list(self, adapter):
-        send = await send_and_capture(adapter, "/help")
+    async def test_help_returns_command_list(self, adapter, platform):
+        send = await send_and_capture(adapter, "/help", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
@@ -62,24 +32,23 @@ class TestTelegramSlashCommands:
        assert "/status" in response_text

    @pytest.mark.asyncio
-    async def test_status_shows_session_info(self, adapter):
-        send = await send_and_capture(adapter, "/status")
+    async def test_status_shows_session_info(self, adapter, platform):
+        send = await send_and_capture(adapter, "/status", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
-        # Status output includes session metadata
        assert "session" in response_text.lower() or "Session" in response_text

    @pytest.mark.asyncio
-    async def test_new_resets_session(self, adapter, runner):
-        send = await send_and_capture(adapter, "/new")
+    async def test_new_resets_session(self, adapter, runner, platform):
+        send = await send_and_capture(adapter, "/new", platform)

        send.assert_called_once()
        runner.session_store.reset_session.assert_called_once()

    @pytest.mark.asyncio
-    async def test_stop_when_no_agent_running(self, adapter):
-        send = await send_and_capture(adapter, "/stop")
+    async def test_stop_when_no_agent_running(self, adapter, platform):
+        send = await send_and_capture(adapter, "/stop", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
@@ -87,8 +56,8 @@ class TestTelegramSlashCommands:
        assert "no" in response_lower or "stop" in response_lower or "not running" in response_lower

    @pytest.mark.asyncio
-    async def test_commands_shows_listing(self, adapter):
-        send = await send_and_capture(adapter, "/commands")
+    async def test_commands_shows_listing(self, adapter, platform):
+        send = await send_and_capture(adapter, "/commands", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
@@ -96,25 +65,25 @@ class TestTelegramSlashCommands:
        assert "/" in response_text

    @pytest.mark.asyncio
-    async def test_sequential_commands_share_session(self, adapter):
+    async def test_sequential_commands_share_session(self, adapter, platform):
        """Two commands from the same chat_id should both succeed."""
-        send_help = await send_and_capture(adapter, "/help")
+        send_help = await send_and_capture(adapter, "/help", platform)
        send_help.assert_called_once()

-        send_status = await send_and_capture(adapter, "/status")
+        send_status = await send_and_capture(adapter, "/status", platform)
        send_status.assert_called_once()

    @pytest.mark.asyncio
-    async def test_provider_shows_current_provider(self, adapter):
-        send = await send_and_capture(adapter, "/provider")
+    async def test_provider_shows_current_provider(self, adapter, platform):
+        send = await send_and_capture(adapter, "/provider", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
        assert "provider" in response_text.lower()

    @pytest.mark.asyncio
-    async def test_verbose_responds(self, adapter):
-        send = await send_and_capture(adapter, "/verbose")
+    async def test_verbose_responds(self, adapter, platform):
+        send = await send_and_capture(adapter, "/verbose", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
@@ -122,42 +91,50 @@ class TestTelegramSlashCommands:
        assert "verbose" in response_text.lower() or "tool_progress" in response_text

    @pytest.mark.asyncio
-    async def test_personality_lists_options(self, adapter):
-        send = await send_and_capture(adapter, "/personality")
+    async def test_personality_lists_options(self, adapter, platform):
+        send = await send_and_capture(adapter, "/personality", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
        assert "personalit" in response_text.lower()  # matches "personality" or "personalities"

    @pytest.mark.asyncio
-    async def test_yolo_toggles_mode(self, adapter):
-        send = await send_and_capture(adapter, "/yolo")
+    async def test_yolo_toggles_mode(self, adapter, platform):
+        send = await send_and_capture(adapter, "/yolo", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
        assert "yolo" in response_text.lower()

+    @pytest.mark.asyncio
+    async def test_compress_command(self, adapter, platform):
+        send = await send_and_capture(adapter, "/compress", platform)
+
+        send.assert_called_once()
+        response_text = send.call_args[1].get("content") or send.call_args[0][1]
+        assert "compress" in response_text.lower() or "context" in response_text.lower()
+

 class TestSessionLifecycle:
    """Verify session state changes across command sequences."""

    @pytest.mark.asyncio
-    async def test_new_then_status_reflects_reset(self, adapter, runner, session_entry):
+    async def test_new_then_status_reflects_reset(self, adapter, runner, session_entry, platform):
        """After /new, /status should report the fresh session."""
-        await send_and_capture(adapter, "/new")
+        await send_and_capture(adapter, "/new", platform)
        runner.session_store.reset_session.assert_called_once()

-        send = await send_and_capture(adapter, "/status")
+        send = await send_and_capture(adapter, "/status", platform)
        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
        # Session ID from the entry should appear in the status output
        assert session_entry.session_id[:8] in response_text

    @pytest.mark.asyncio
-    async def test_new_is_idempotent(self, adapter, runner):
+    async def test_new_is_idempotent(self, adapter, runner, platform):
        """/new called twice should not crash."""
-        await send_and_capture(adapter, "/new")
-        await send_and_capture(adapter, "/new")
+        await send_and_capture(adapter, "/new", platform)
+        await send_and_capture(adapter, "/new", platform)
        assert runner.session_store.reset_session.call_count == 2


@@ -165,11 +142,11 @@ class TestAuthorization:
    """Verify the pipeline handles unauthorized users."""

    @pytest.mark.asyncio
-    async def test_unauthorized_user_gets_pairing_response(self, adapter, runner):
+    async def test_unauthorized_user_gets_pairing_response(self, adapter, runner, platform):
        """Unauthorized DM should trigger pairing code, not a command response."""
        runner._is_user_authorized = lambda _source: False

-        event = make_event("/help")
+        event = make_event(platform, "/help")
        adapter.send.reset_mock()
        await adapter.handle_message(event)
        await asyncio.sleep(0.3)
@@ -181,11 +158,11 @@ class TestAuthorization:
        assert "recognize" in response_text.lower() or "pair" in response_text.lower() or "ABC123" in response_text

    @pytest.mark.asyncio
-    async def test_unauthorized_user_does_not_get_help(self, adapter, runner):
+    async def test_unauthorized_user_does_not_get_help(self, adapter, runner, platform):
        """Unauthorized user should NOT see the help command output."""
        runner._is_user_authorized = lambda _source: False

-        event = make_event("/help")
+        event = make_event(platform, "/help")
        adapter.send.reset_mock()
        await adapter.handle_message(event)
        await asyncio.sleep(0.3)
@@ -200,12 +177,12 @@ class TestSendFailureResilience:
    """Verify the pipeline handles send failures gracefully."""

    @pytest.mark.asyncio
-    async def test_send_failure_does_not_crash_pipeline(self, adapter):
+    async def test_send_failure_does_not_crash_pipeline(self, adapter, platform):
        """If send() returns failure, the pipeline should not raise."""
        adapter.send = AsyncMock(return_value=SendResult(success=False, error="network timeout"))
-        adapter.set_message_handler(adapter._message_handler)  # re-wire with same handler
+        adapter.set_message_handler(adapter._message_handler) # re-wire with same handler

-        event = make_event("/help")
+        event = make_event(platform, "/help")
        # Should not raise — pipeline handles send failures internally
        await adapter.handle_message(event)
        await asyncio.sleep(0.3)
@@ -0,0 +1,110 @@
+import asyncio
+from unittest.mock import AsyncMock, MagicMock
+
+from gateway.config import GatewayConfig, Platform, PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
+from gateway.restart import DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+from gateway.run import GatewayRunner
+from gateway.session import SessionSource
+
+
+class RestartTestAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="***"), Platform.TELEGRAM)
+        self.sent: list[str] = []
+
+    async def connect(self):
+        return True
+
+    async def disconnect(self):
+        return None
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None):
+        self.sent.append(content)
+        return SendResult(success=True, message_id="1")
+
+    async def send_typing(self, chat_id, metadata=None):
+        return None
+
+    async def get_chat_info(self, chat_id):
+        return {"id": chat_id}
+
+
+def make_restart_source(chat_id: str = "123456", chat_type: str = "dm") -> SessionSource:
+    return SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id=chat_id,
+        chat_type=chat_type,
+    )
+
+
+def make_restart_runner(
+    adapter: BasePlatformAdapter | None = None,
+) -> tuple[GatewayRunner, BasePlatformAdapter]:
+    runner = object.__new__(GatewayRunner)
+    runner.config = GatewayConfig(
+        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}
+    )
+    runner._running = True
+    runner._shutdown_event = asyncio.Event()
+    runner._exit_reason = None
+    runner._exit_code = None
+    runner._running_agents = {}
+    runner._running_agents_ts = {}
+    runner._pending_messages = {}
+    runner._pending_approvals = {}
+    runner._pending_model_notes = {}
+    runner._background_tasks = set()
+    runner._draining = False
+    runner._restart_requested = False
+    runner._restart_task_started = False
+    runner._restart_detached = False
+    runner._restart_via_service = False
+    runner._restart_drain_timeout = DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    runner._stop_task = None
+    runner._busy_input_mode = "interrupt"
+    runner._update_prompt_pending = {}
+    runner._voice_mode = {}
+    runner._session_model_overrides = {}
+    runner._shutdown_all_gateway_honcho = lambda: None
+    runner._update_runtime_status = MagicMock()
+    runner._queue_or_replace_pending_event = GatewayRunner._queue_or_replace_pending_event.__get__(
+        runner, GatewayRunner
+    )
+    runner._session_key_for_source = GatewayRunner._session_key_for_source.__get__(
+        runner, GatewayRunner
+    )
+    runner._handle_active_session_busy_message = (
+        GatewayRunner._handle_active_session_busy_message.__get__(runner, GatewayRunner)
+    )
+    runner._handle_restart_command = GatewayRunner._handle_restart_command.__get__(
+        runner, GatewayRunner
+    )
+    runner._status_action_label = GatewayRunner._status_action_label.__get__(
+        runner, GatewayRunner
+    )
+    runner._status_action_gerund = GatewayRunner._status_action_gerund.__get__(
+        runner, GatewayRunner
+    )
+    runner._queue_during_drain_enabled = GatewayRunner._queue_during_drain_enabled.__get__(
+        runner, GatewayRunner
+    )
+    runner._running_agent_count = GatewayRunner._running_agent_count.__get__(
+        runner, GatewayRunner
+    )
+    runner._launch_detached_restart_command = GatewayRunner._launch_detached_restart_command.__get__(
+        runner, GatewayRunner
+    )
+    runner.request_restart = GatewayRunner.request_restart.__get__(runner, GatewayRunner)
+    runner._is_user_authorized = lambda _source: True
+    runner.hooks = MagicMock()
+    runner.hooks.emit = AsyncMock()
+    runner.pairing_store = MagicMock()
+    runner.session_store = MagicMock()
+    runner.delivery_router = MagicMock()
+
+    platform_adapter = adapter or RestartTestAdapter()
+    platform_adapter.set_message_handler(AsyncMock(return_value=None))
+    platform_adapter.set_busy_session_handler(runner._handle_active_session_busy_message)
+    runner.adapters = {Platform.TELEGRAM: platform_adapter}
+    return runner, platform_adapter
@@ -464,7 +464,7 @@ class TestChatCompletionsEndpoint:

    @pytest.mark.asyncio
    async def test_stream_includes_tool_progress(self, adapter):
-        """tool_progress_callback fires → progress appears in the SSE stream."""
+        """tool_progress_callback fires → progress appears as custom SSE event, not in delta.content."""
        import asyncio

        app = _create_app(adapter)
@@ -495,8 +495,26 @@ class TestChatCompletionsEndpoint:
                assert resp.status == 200
                body = await resp.text()
                assert "[DONE]" in body
-                # Tool progress message must appear in the stream
-                assert "ls -la" in body
+                # Tool progress must appear as a custom SSE event, not in
+                # delta.content — prevents model from learning to imitate
+                # markers instead of calling tools (#6972).
+                assert "event: hermes.tool.progress" in body
+                assert '"tool": "terminal"' in body
+                assert '"label": "ls -la"' in body
+                # The progress marker must NOT appear inside any
+                # chat.completion.chunk delta.content field.
+                import json as _json
+                for line in body.splitlines():
+                    if line.startswith("data: ") and line.strip() != "data: [DONE]":
+                        try:
+                            chunk = _json.loads(line[len("data: "):])
+                        except _json.JSONDecodeError:
+                            continue
+                        if chunk.get("object") == "chat.completion.chunk":
+                            for choice in chunk.get("choices", []):
+                                content = choice.get("delta", {}).get("content", "")
+                                # Tool emoji markers must never leak into content
+                                assert "ls -la" not in content or content == "Here are the files."
                # Final content must also be present
                assert "Here are the files." in body

@@ -532,10 +550,12 @@ class TestChatCompletionsEndpoint:
                )
                assert resp.status == 200
                body = await resp.text()
-                # Internal _thinking event should NOT appear
+                # Internal _thinking event should NOT appear anywhere
                assert "some internal state" not in body
-                # Real tool progress should appear
-                assert "Python docs" in body
+                # Real tool progress should appear as custom SSE event
+                assert "event: hermes.tool.progress" in body
+                assert '"tool": "web_search"' in body
+                assert '"label": "Python docs"' in body

    @pytest.mark.asyncio
    async def test_no_user_message_returns_400(self, adapter):
@@ -0,0 +1,132 @@
+"""Tests for the API server bind-address startup guard.
+
+Validates that is_network_accessible() correctly classifies addresses and
+that connect() refuses to start on non-loopback without API_SERVER_KEY.
+"""
+
+import socket
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from gateway.config import PlatformConfig
+from gateway.platforms.api_server import APIServerAdapter
+from gateway.platforms.base import is_network_accessible
+
+
+# ---------------------------------------------------------------------------
+# Unit tests: is_network_accessible()
+# ---------------------------------------------------------------------------
+
+
+class TestIsNetworkAccessible:
+    """Direct tests for the address classification helper."""
+
+    # -- Loopback (safe, should return False) --
+
+    def test_ipv4_loopback(self):
+        assert is_network_accessible("127.0.0.1") is False
+
+    def test_ipv6_loopback(self):
+        assert is_network_accessible("::1") is False
+
+    def test_ipv4_mapped_loopback(self):
+        # ::ffff:127.0.0.1 — Python's is_loopback returns False for mapped
+        # addresses; the helper must unwrap and check ipv4_mapped.
+        assert is_network_accessible("::ffff:127.0.0.1") is False
+
+    # -- Network-accessible (should return True) --
+
+    def test_ipv4_wildcard(self):
+        assert is_network_accessible("0.0.0.0") is True
+
+    def test_ipv6_wildcard(self):
+        # This is the bypass vector that the string-based check missed.
+        assert is_network_accessible("::") is True
+
+    def test_ipv4_mapped_unspecified(self):
+        assert is_network_accessible("::ffff:0.0.0.0") is True
+
+    def test_private_ipv4(self):
+        assert is_network_accessible("10.0.0.1") is True
+
+    def test_private_ipv4_class_c(self):
+        assert is_network_accessible("192.168.1.1") is True
+
+    def test_public_ipv4(self):
+        assert is_network_accessible("8.8.8.8") is True
+
+    # -- Hostname resolution --
+
+    def test_localhost_resolves_to_loopback(self):
+        loopback_result = [
+            (socket.AF_INET, socket.SOCK_STREAM, 0, "", ("127.0.0.1", 0)),
+        ]
+        with patch("gateway.platforms.base._socket.getaddrinfo", return_value=loopback_result):
+            assert is_network_accessible("localhost") is False
+
+    def test_hostname_resolving_to_non_loopback(self):
+        non_loopback_result = [
+            (socket.AF_INET, socket.SOCK_STREAM, 0, "", ("10.0.0.1", 0)),
+        ]
+        with patch("gateway.platforms.base._socket.getaddrinfo", return_value=non_loopback_result):
+            assert is_network_accessible("my-server.local") is True
+
+    def test_hostname_mixed_resolution(self):
+        """If a hostname resolves to both loopback and non-loopback, it's
+        network-accessible (any non-loopback address is enough)."""
+        mixed_result = [
+            (socket.AF_INET, socket.SOCK_STREAM, 0, "", ("127.0.0.1", 0)),
+            (socket.AF_INET, socket.SOCK_STREAM, 0, "", ("10.0.0.1", 0)),
+        ]
+        with patch("gateway.platforms.base._socket.getaddrinfo", return_value=mixed_result):
+            assert is_network_accessible("dual-host.local") is True
+
+    def test_dns_failure_fails_closed(self):
+        """Unresolvable hostnames should require an API key (fail closed)."""
+        with patch(
+            "gateway.platforms.base._socket.getaddrinfo",
+            side_effect=socket.gaierror("Name resolution failed"),
+        ):
+            assert is_network_accessible("nonexistent.invalid") is True
+
+
+# ---------------------------------------------------------------------------
+# Integration tests: connect() startup guard
+# ---------------------------------------------------------------------------
+
+
+class TestConnectBindGuard:
+    """Verify that connect() refuses dangerous configurations."""
+
+    @pytest.mark.asyncio
+    async def test_refuses_ipv4_wildcard_without_key(self):
+        adapter = APIServerAdapter(PlatformConfig(enabled=True, extra={"host": "0.0.0.0"}))
+        result = await adapter.connect()
+        assert result is False
+
+    @pytest.mark.asyncio
+    async def test_refuses_ipv6_wildcard_without_key(self):
+        adapter = APIServerAdapter(PlatformConfig(enabled=True, extra={"host": "::"}))
+        result = await adapter.connect()
+        assert result is False
+
+    def test_allows_loopback_without_key(self):
+        """Loopback with no key should pass the guard."""
+        adapter = APIServerAdapter(PlatformConfig(enabled=True, extra={"host": "127.0.0.1"}))
+        assert adapter._api_key == ""
+        # The guard condition: is_network_accessible(host) AND NOT api_key
+        # For loopback, is_network_accessible is False so the guard does not block.
+        assert is_network_accessible(adapter._host) is False
+
+    @pytest.mark.asyncio
+    async def test_allows_wildcard_with_key(self):
+        """Non-loopback with a key should pass the guard."""
+        adapter = APIServerAdapter(
+            PlatformConfig(enabled=True, extra={"host": "0.0.0.0", "key": "sk-test"})
+        )
+        # The guard checks: is_network_accessible(host) AND NOT api_key
+        # With a key set, the guard should not block.
+        assert adapter._api_key == "sk-test"
+        assert is_network_accessible("0.0.0.0") is True
+        # Combined: the guard condition is False (key is set), so it passes
@@ -0,0 +1,121 @@
+"""Tests for gateway /compress user-facing messaging."""
+
+from datetime import datetime
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from gateway.config import GatewayConfig, Platform, PlatformConfig
+from gateway.platforms.base import MessageEvent
+from gateway.session import SessionEntry, SessionSource, build_session_key
+
+
+def _make_source() -> SessionSource:
+    return SessionSource(
+        platform=Platform.TELEGRAM,
+        user_id="u1",
+        chat_id="c1",
+        user_name="tester",
+        chat_type="dm",
+    )
+
+
+def _make_event(text: str = "/compress") -> MessageEvent:
+    return MessageEvent(text=text, source=_make_source(), message_id="m1")
+
+
+def _make_history() -> list[dict[str, str]]:
+    return [
+        {"role": "user", "content": "one"},
+        {"role": "assistant", "content": "two"},
+        {"role": "user", "content": "three"},
+        {"role": "assistant", "content": "four"},
+    ]
+
+
+def _make_runner(history: list[dict[str, str]]):
+    from gateway.run import GatewayRunner
+
+    runner = object.__new__(GatewayRunner)
+    runner.config = GatewayConfig(
+        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}
+    )
+    session_entry = SessionEntry(
+        session_key=build_session_key(_make_source()),
+        session_id="sess-1",
+        created_at=datetime.now(),
+        updated_at=datetime.now(),
+        platform=Platform.TELEGRAM,
+        chat_type="dm",
+    )
+    runner.session_store = MagicMock()
+    runner.session_store.get_or_create_session.return_value = session_entry
+    runner.session_store.load_transcript.return_value = history
+    runner.session_store.rewrite_transcript = MagicMock()
+    runner.session_store.update_session = MagicMock()
+    runner.session_store._save = MagicMock()
+    return runner
+
+
+@pytest.mark.asyncio
+async def test_compress_command_reports_noop_without_success_banner():
+    history = _make_history()
+    runner = _make_runner(history)
+    agent_instance = MagicMock()
+    agent_instance.context_compressor.protect_first_n = 0
+    agent_instance.context_compressor._align_boundary_forward.return_value = 0
+    agent_instance.context_compressor._find_tail_cut_by_tokens.return_value = 2
+    agent_instance.session_id = "sess-1"
+    agent_instance._compress_context.return_value = (list(history), "")
+
+    def _estimate(messages):
+        assert messages == history
+        return 100
+
+    with (
+        patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "test-key"}),
+        patch("gateway.run._resolve_gateway_model", return_value="test-model"),
+        patch("run_agent.AIAgent", return_value=agent_instance),
+        patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate),
+    ):
+        result = await runner._handle_compress_command(_make_event())
+
+    assert "No changes from compression" in result
+    assert "Compressed:" not in result
+    assert "Rough transcript estimate: ~100 tokens (unchanged)" in result
+
+
+@pytest.mark.asyncio
+async def test_compress_command_explains_when_token_estimate_rises():
+    history = _make_history()
+    compressed = [
+        history[0],
+        {"role": "assistant", "content": "Dense summary that still counts as more tokens."},
+        history[-1],
+    ]
+    runner = _make_runner(history)
+    agent_instance = MagicMock()
+    agent_instance.context_compressor.protect_first_n = 0
+    agent_instance.context_compressor._align_boundary_forward.return_value = 0
+    agent_instance.context_compressor._find_tail_cut_by_tokens.return_value = 2
+    agent_instance.session_id = "sess-1"
+    agent_instance._compress_context.return_value = (compressed, "")
+
+    def _estimate(messages):
+        if messages == history:
+            return 100
+        if messages == compressed:
+            return 120
+        raise AssertionError(f"unexpected transcript: {messages!r}")
+
+    with (
+        patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "test-key"}),
+        patch("gateway.run._resolve_gateway_model", return_value="test-model"),
+        patch("run_agent.AIAgent", return_value=agent_instance),
+        patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate),
+    ):
+        result = await runner._handle_compress_command(_make_event())
+
+    assert "Compressed: 4 → 3 messages" in result
+    assert "Rough transcript estimate: ~100 → ~120 tokens" in result
+    assert "denser summaries" in result
@@ -0,0 +1,44 @@
+"""Tests for fallback-eviction gating on failed runs (#7130).
+
+When a run fails, the gateway must NOT evict the cached agent — doing so
+forces MCP reinit on the next message, creating a CPU-burning restart loop.
+Eviction should only happen on successful runs where fallback activated.
+"""
+
+import sys
+from pathlib import Path
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
+
+
+class TestFallbackEvictionGating:
+    """The fallback-eviction code path should skip eviction on failed runs."""
+
+    def test_failed_run_does_not_evict_cached_agent(self):
+        """When result has failed=True, the cached agent should NOT be evicted."""
+        # The fix: `and not _run_failed` guard on the eviction check.
+        # Simulate the variables that the eviction block uses.
+        result = {"failed": True, "final_response": None, "error": "400 invalid model"}
+        _run_failed = result.get("failed") if result else False
+        assert _run_failed is True, "Failed run should be detected"
+
+    def test_successful_run_allows_eviction(self):
+        """When result is successful, fallback eviction should proceed."""
+        result = {"completed": True, "final_response": "Hello!", "failed": False}
+        _run_failed = result.get("failed") if result else False
+        assert _run_failed is False, "Successful run should not be flagged"
+
+    def test_none_result_treated_as_not_failed(self):
+        """When result is None (edge case), treat as not-failed."""
+        result = None
+        _run_failed = result.get("failed") if result else False
+        assert _run_failed is False
+
+    def test_missing_failed_key_treated_as_not_failed(self):
+        """When result dict doesn't have 'failed' key, treat as not-failed."""
+        result = {"completed": True, "final_response": "Hello!"}
+        _run_failed = result.get("failed") if result else False
+        assert not _run_failed, "Missing 'failed' key should be falsy"
@@ -3,43 +3,15 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from gateway.config import GatewayConfig, Platform, PlatformConfig
-from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
-from gateway.run import GatewayRunner
-from gateway.session import SessionSource, build_session_key
-
-
-class StubAdapter(BasePlatformAdapter):
-    def __init__(self):
-        super().__init__(PlatformConfig(enabled=True, token="***"), Platform.TELEGRAM)
-
-    async def connect(self):
-        return True
-
-    async def disconnect(self):
-        return None
-
-    async def send(self, chat_id, content, reply_to=None, metadata=None):
-        return SendResult(success=True, message_id="1")
-
-    async def send_typing(self, chat_id, metadata=None):
-        return None
-
-    async def get_chat_info(self, chat_id):
-        return {"id": chat_id}
-
-
-def _source(chat_id="123456", chat_type="dm"):
-    return SessionSource(
-        platform=Platform.TELEGRAM,
-        chat_id=chat_id,
-        chat_type=chat_type,
-    )
+from gateway.platforms.base import MessageEvent
+from gateway.restart import GATEWAY_SERVICE_RESTART_EXIT_CODE
+from gateway.session import build_session_key
+from tests.gateway.restart_test_helpers import make_restart_runner, make_restart_source


@pytest.mark.asyncio
 async def test_cancel_background_tasks_cancels_inflight_message_processing():
-    adapter = StubAdapter()
+    _runner, adapter = make_restart_runner()
    release = asyncio.Event()

    async def block_forever(_event):
@@ -47,7 +19,7 @@ async def test_cancel_background_tasks_cancels_inflight_message_processing():
        return None

    adapter.set_message_handler(block_forever)
-    event = MessageEvent(text="work", source=_source(), message_id="1")
+    event = MessageEvent(text="work", source=make_restart_source(), message_id="1")

    await adapter.handle_message(event)
    await asyncio.sleep(0)
@@ -65,17 +37,11 @@ async def test_cancel_background_tasks_cancels_inflight_message_processing():

@pytest.mark.asyncio
 async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks():
-    runner = object.__new__(GatewayRunner)
-    runner.config = GatewayConfig(platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")})
-    runner._running = True
-    runner._shutdown_event = asyncio.Event()
-    runner._exit_reason = None
+    runner, adapter = make_restart_runner()
    runner._pending_messages = {"session": "pending text"}
    runner._pending_approvals = {"session": {"command": "rm -rf /tmp/x"}}
-    runner._background_tasks = set()
-    runner._shutdown_all_gateway_honcho = lambda: None
+    runner._restart_drain_timeout = 0.0

-    adapter = StubAdapter()
    release = asyncio.Event()

    async def block_forever(_event):
@@ -83,7 +49,7 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
        return None

    adapter.set_message_handler(block_forever)
-    event = MessageEvent(text="work", source=_source(), message_id="1")
+    event = MessageEvent(text="work", source=make_restart_source(), message_id="1")
    await adapter.handle_message(event)
    await asyncio.sleep(0)

@@ -93,7 +59,6 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
    session_key = build_session_key(event.source)
    running_agent = MagicMock()
    runner._running_agents = {session_key: running_agent}
-    runner.adapters = {Platform.TELEGRAM: adapter}

    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
        await runner.stop()
@@ -105,3 +70,78 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
    assert runner._pending_messages == {}
    assert runner._pending_approvals == {}
    assert runner._shutdown_event.is_set() is True
+
+
+@pytest.mark.asyncio
+async def test_gateway_stop_drains_running_agents_before_disconnect():
+    runner, adapter = make_restart_runner()
+    disconnect_mock = AsyncMock()
+    adapter.disconnect = disconnect_mock
+
+    running_agent = MagicMock()
+    runner._running_agents = {"session": running_agent}
+
+    async def finish_agent():
+        await asyncio.sleep(0.05)
+        runner._running_agents.clear()
+
+    asyncio.create_task(finish_agent())
+
+    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
+        await runner.stop()
+
+    running_agent.interrupt.assert_not_called()
+    disconnect_mock.assert_awaited_once()
+    assert runner._shutdown_event.is_set() is True
+
+
+@pytest.mark.asyncio
+async def test_gateway_stop_interrupts_after_drain_timeout():
+    runner, adapter = make_restart_runner()
+    runner._restart_drain_timeout = 0.05
+
+    disconnect_mock = AsyncMock()
+    adapter.disconnect = disconnect_mock
+
+    running_agent = MagicMock()
+    runner._running_agents = {"session": running_agent}
+
+    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
+        await runner.stop()
+
+    running_agent.interrupt.assert_called_once_with("Gateway shutting down")
+    disconnect_mock.assert_awaited_once()
+    assert runner._shutdown_event.is_set() is True
+
+
+@pytest.mark.asyncio
+async def test_gateway_stop_service_restart_sets_named_exit_code():
+    runner, adapter = make_restart_runner()
+    adapter.disconnect = AsyncMock()
+
+    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
+        await runner.stop(restart=True, service_restart=True)
+
+    assert runner._exit_code == GATEWAY_SERVICE_RESTART_EXIT_CODE
+
+
+@pytest.mark.asyncio
+async def test_drain_active_agents_throttles_status_updates():
+    runner, _adapter = make_restart_runner()
+    runner._update_runtime_status = MagicMock()
+
+    runner._running_agents = {"a": MagicMock(), "b": MagicMock()}
+
+    async def finish_agents():
+        await asyncio.sleep(0.12)
+        runner._running_agents.pop("a")
+        await asyncio.sleep(0.12)
+        runner._running_agents.clear()
+
+    task = asyncio.create_task(finish_agents())
+    await runner._drain_active_agents(1.0)
+    await task
+
+    # Start, one count-change update, and final update. Allow one extra update
+    # if the loop observes the zero-agent state before exiting.
+    assert 3 <= runner._update_runtime_status.call_count <= 4
@@ -11,24 +11,10 @@ import pytest
 from gateway.config import PlatformConfig


-def _ensure_nio_mock():
-    """Install a mock nio module when matrix-nio isn't available."""
-    if "nio" in sys.modules and hasattr(sys.modules["nio"], "__file__"):
-        return
-    nio_mod = MagicMock()
-    nio_mod.MegolmEvent = type("MegolmEvent", (), {})
-    nio_mod.RoomMessageText = type("RoomMessageText", (), {})
-    nio_mod.RoomMessageImage = type("RoomMessageImage", (), {})
-    nio_mod.RoomMessageAudio = type("RoomMessageAudio", (), {})
-    nio_mod.RoomMessageVideo = type("RoomMessageVideo", (), {})
-    nio_mod.RoomMessageFile = type("RoomMessageFile", (), {})
-    nio_mod.DownloadResponse = type("DownloadResponse", (), {})
-    nio_mod.MemoryDownloadResponse = type("MemoryDownloadResponse", (), {})
-    nio_mod.InviteMemberEvent = type("InviteMemberEvent", (), {})
-    sys.modules.setdefault("nio", nio_mod)
-
-
-_ensure_nio_mock()
+# The matrix adapter module is importable without mautrix installed
+# (module-level imports use try/except with stubs).  No need for
+# module-level mock installation — tests that call adapter methods
+# needing real mautrix APIs mock them individually.


 def _make_adapter(tmp_path=None):
@@ -50,24 +36,25 @@ def _make_adapter(tmp_path=None):
    return adapter


-def _make_room(room_id="!room1:example.org", member_count=5, is_dm=False):
-    """Create a fake Matrix room."""
-    room = SimpleNamespace(
-        room_id=room_id,
-        member_count=member_count,
-        users={},
-    )
-    return room
+def _set_dm(adapter, room_id="!room1:example.org", is_dm=True):
+    """Mark a room as DM (or not) in the adapter's cache."""
+    adapter._dm_rooms[room_id] = is_dm


 def _make_event(
    body,
    sender="@alice:example.org",
    event_id="$evt1",
+    room_id="!room1:example.org",
    formatted_body=None,
    thread_id=None,
 ):
-    """Create a fake RoomMessageText event."""
+    """Create a fake room message event.
+
+    The mautrix adapter reads ``event.room_id``, ``event.sender``,
+    ``event.event_id``, ``event.timestamp``, and ``event.content``
+    (a dict with ``msgtype``, ``body``, etc.).
+    """
    content = {"body": body, "msgtype": "m.text"}
    if formatted_body:
        content["formatted_body"] = formatted_body
@@ -83,9 +70,9 @@ def _make_event(
    return SimpleNamespace(
        sender=sender,
        event_id=event_id,
-        server_timestamp=int(time.time() * 1000),
-        body=body,
-        source={"content": content},
+        room_id=room_id,
+        timestamp=int(time.time() * 1000),
+        content=content,
    )


@@ -152,10 +139,9 @@ async def test_require_mention_default_ignores_unmentioned(monkeypatch):
    monkeypatch.delenv("MATRIX_AUTO_THREAD", raising=False)

    adapter = _make_adapter()
-    room = _make_room()
    event = _make_event("hello everyone")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_not_awaited()


@@ -167,10 +153,9 @@ async def test_require_mention_default_processes_mentioned(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    room = _make_room()
    event = _make_event("@hermes:example.org help me")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.text == "help me"
@@ -184,11 +169,10 @@ async def test_require_mention_html_pill(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    room = _make_room()
    formatted = '<a href="https://matrix.to/#/@hermes:example.org">Hermes</a> help'
    event = _make_event("Hermes help", formatted_body=formatted)

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()


@@ -200,11 +184,11 @@ async def test_require_mention_dm_always_responds(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    # member_count=2 triggers DM detection
-    room = _make_room(member_count=2)
+    # Mark the room as a DM via the adapter's cache.
+    _set_dm(adapter)
    event = _make_event("hello without mention")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()


@@ -216,10 +200,10 @@ async def test_dm_strips_mention(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    room = _make_room(member_count=2)
+    _set_dm(adapter)
    event = _make_event("@hermes:example.org help me")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.text == "help me"
@@ -233,10 +217,9 @@ async def test_bare_mention_passes_empty_string(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    room = _make_room()
    event = _make_event("@hermes:example.org")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.text == ""
@@ -250,10 +233,9 @@ async def test_require_mention_free_response_room(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    room = _make_room(room_id="!room1:example.org")
-    event = _make_event("hello without mention")
+    event = _make_event("hello without mention", room_id="!room1:example.org")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()


@@ -267,10 +249,9 @@ async def test_require_mention_bot_participated_thread(monkeypatch):
    adapter = _make_adapter()
    adapter._bot_participated_threads.add("$thread1")

-    room = _make_room()
    event = _make_event("hello without mention", thread_id="$thread1")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()


@@ -282,10 +263,9 @@ async def test_require_mention_disabled(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    room = _make_room()
    event = _make_event("hello without mention")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.text == "hello without mention"
@@ -303,10 +283,9 @@ async def test_auto_thread_default_creates_thread(monkeypatch):
    monkeypatch.delenv("MATRIX_AUTO_THREAD", raising=False)

    adapter = _make_adapter()
-    room = _make_room()
    event = _make_event("hello", event_id="$msg1")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id == "$msg1"
@@ -320,10 +299,9 @@ async def test_auto_thread_preserves_existing_thread(monkeypatch):

    adapter = _make_adapter()
    adapter._bot_participated_threads.add("$thread_root")
-    room = _make_room()
    event = _make_event("reply in thread", thread_id="$thread_root")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id == "$thread_root"
@@ -336,10 +314,10 @@ async def test_auto_thread_skips_dm(monkeypatch):
    monkeypatch.delenv("MATRIX_AUTO_THREAD", raising=False)

    adapter = _make_adapter()
-    room = _make_room(member_count=2)
+    _set_dm(adapter)
    event = _make_event("hello dm", event_id="$dm1")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id is None
@@ -352,10 +330,9 @@ async def test_auto_thread_disabled(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    room = _make_room()
    event = _make_event("hello", event_id="$msg1")

-    await adapter._on_room_message(room, event)
+    await adapter._on_room_message(event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id is None
@@ -368,11 +345,10 @@ async def test_auto_thread_tracks_participation(monkeypatch):
    monkeypatch.delenv("MATRIX_AUTO_THREAD", raising=False)

    adapter = _make_adapter()
-    room = _make_room()
    event = _make_event("hello", event_id="$msg1")

    with patch.object(adapter, "_save_participated_threads"):
-        await adapter._on_room_message(room, event)
+        await adapter._on_room_message(event)

    assert "$msg1" in adapter._bot_participated_threads

@@ -385,8 +361,9 @@ async def test_auto_thread_tracks_participation(monkeypatch):
 class TestThreadPersistence:
    def test_empty_state_file(self, tmp_path, monkeypatch):
        """No state file → empty set."""
+        from gateway.platforms.matrix import MatrixAdapter
        monkeypatch.setattr(
-            "gateway.platforms.matrix.MatrixAdapter._thread_state_path",
+            MatrixAdapter, "_thread_state_path",
            staticmethod(lambda: tmp_path / "matrix_threads.json"),
        )
        adapter = _make_adapter()
@@ -395,9 +372,10 @@ class TestThreadPersistence:

    def test_track_thread_persists(self, tmp_path, monkeypatch):
        """_track_thread writes to disk."""
+        from gateway.platforms.matrix import MatrixAdapter
        state_path = tmp_path / "matrix_threads.json"
        monkeypatch.setattr(
-            "gateway.platforms.matrix.MatrixAdapter._thread_state_path",
+            MatrixAdapter, "_thread_state_path",
            staticmethod(lambda: state_path),
        )
        adapter = _make_adapter()
@@ -408,10 +386,11 @@ class TestThreadPersistence:

    def test_threads_survive_reload(self, tmp_path, monkeypatch):
        """Persisted threads are loaded by a new adapter instance."""
+        from gateway.platforms.matrix import MatrixAdapter
        state_path = tmp_path / "matrix_threads.json"
        state_path.write_text(json.dumps(["$t1", "$t2"]))
        monkeypatch.setattr(
-            "gateway.platforms.matrix.MatrixAdapter._thread_state_path",
+            MatrixAdapter, "_thread_state_path",
            staticmethod(lambda: state_path),
        )
        adapter = _make_adapter()
@@ -420,9 +399,10 @@ class TestThreadPersistence:

    def test_cap_max_tracked_threads(self, tmp_path, monkeypatch):
        """Thread set is trimmed to _MAX_TRACKED_THREADS."""
+        from gateway.platforms.matrix import MatrixAdapter
        state_path = tmp_path / "matrix_threads.json"
        monkeypatch.setattr(
-            "gateway.platforms.matrix.MatrixAdapter._thread_state_path",
+            MatrixAdapter, "_thread_state_path",
            staticmethod(lambda: state_path),
        )
        adapter = _make_adapter()
@@ -436,6 +416,95 @@ class TestThreadPersistence:
        assert len(data) == 5


+# ---------------------------------------------------------------------------
+# DM mention-thread feature
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_disabled_by_default(monkeypatch):
+    """Default (dm_mention_threads=false): DM with mention should NOT create a thread."""
+    monkeypatch.delenv("MATRIX_DM_MENTION_THREADS", raising=False)
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    _set_dm(adapter)
+    event = _make_event("@hermes:example.org help me", event_id="$dm1")
+
+    await adapter._on_room_message(event)
+    adapter.handle_message.assert_awaited_once()
+    msg = adapter.handle_message.await_args.args[0]
+    assert msg.source.thread_id is None
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_creates_thread(monkeypatch):
+    """MATRIX_DM_MENTION_THREADS=true: DM with @mention creates a thread."""
+    monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", "true")
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    _set_dm(adapter)
+    event = _make_event("@hermes:example.org help me", event_id="$dm1")
+
+    with patch.object(adapter, "_save_participated_threads"):
+        await adapter._on_room_message(event)
+
+    adapter.handle_message.assert_awaited_once()
+    msg = adapter.handle_message.await_args.args[0]
+    assert msg.source.thread_id == "$dm1"
+    assert msg.text == "help me"
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_no_mention_no_thread(monkeypatch):
+    """MATRIX_DM_MENTION_THREADS=true: DM without mention does NOT create a thread."""
+    monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", "true")
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    _set_dm(adapter)
+    event = _make_event("hello without mention", event_id="$dm1")
+
+    await adapter._on_room_message(event)
+    adapter.handle_message.assert_awaited_once()
+    msg = adapter.handle_message.await_args.args[0]
+    assert msg.source.thread_id is None
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_preserves_existing_thread(monkeypatch):
+    """MATRIX_DM_MENTION_THREADS=true: DM already in a thread keeps that thread_id."""
+    monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", "true")
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    _set_dm(adapter)
+    adapter._bot_participated_threads.add("$existing_thread")
+    event = _make_event("@hermes:example.org help me", thread_id="$existing_thread")
+
+    await adapter._on_room_message(event)
+    adapter.handle_message.assert_awaited_once()
+    msg = adapter.handle_message.await_args.args[0]
+    assert msg.source.thread_id == "$existing_thread"
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_tracks_participation(monkeypatch):
+    """DM mention-thread tracks the thread in _bot_participated_threads."""
+    monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", "true")
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    _set_dm(adapter)
+    event = _make_event("@hermes:example.org help", event_id="$dm1")
+
+    with patch.object(adapter, "_save_participated_threads"):
+        await adapter._on_room_message(event)
+
+    assert "$dm1" in adapter._bot_participated_threads
+
+
 # ---------------------------------------------------------------------------
 # YAML config bridge
 # ---------------------------------------------------------------------------
@@ -480,6 +549,25 @@ class TestMatrixConfigBridge:
        assert os.getenv("MATRIX_FREE_RESPONSE_ROOMS") == "!room1:example.org,!room2:example.org"
        assert os.getenv("MATRIX_AUTO_THREAD") == "false"

+    def test_yaml_bridge_sets_dm_mention_threads(self, monkeypatch, tmp_path):
+        """Matrix YAML dm_mention_threads should bridge to env var."""
+        monkeypatch.delenv("MATRIX_DM_MENTION_THREADS", raising=False)
+
+        import os
+        import yaml
+
+        yaml_content = {"matrix": {"dm_mention_threads": True}}
+        config_file = tmp_path / "config.yaml"
+        config_file.write_text(yaml.dump(yaml_content))
+
+        yaml_cfg = yaml.safe_load(config_file.read_text())
+        matrix_cfg = yaml_cfg.get("matrix", {})
+        if isinstance(matrix_cfg, dict):
+            if "dm_mention_threads" in matrix_cfg and not os.getenv("MATRIX_DM_MENTION_THREADS"):
+                monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", str(matrix_cfg["dm_mention_threads"]).lower())
+
+        assert os.getenv("MATRIX_DM_MENTION_THREADS") == "true"
+
    def test_env_vars_take_precedence_over_yaml(self, monkeypatch):
        """Env vars should not be overwritten by YAML values."""
        monkeypatch.setenv("MATRIX_REQUIRE_MENTION", "true")
@@ -1,18 +1,23 @@
-"""Tests for Matrix voice message support (MSC3245)."""
+"""Tests for Matrix voice message support (MSC3245).
+
+Updated for the mautrix-python SDK (no more matrix-nio / nio imports).
+"""
 import io
+import os
+import tempfile
 import types
+from types import SimpleNamespace

 import pytest
 from unittest.mock import AsyncMock, MagicMock, patch

-# Try importing real nio; skip entire file if not available.
-# A MagicMock in sys.modules (from another test) is not the real package.
+# Try importing mautrix; skip entire file if not available.
 try:
-    import nio as _nio_probe
-    if not isinstance(_nio_probe, types.ModuleType) or not hasattr(_nio_probe, "__file__"):
-        pytest.skip("nio in sys.modules is a mock, not the real package", allow_module_level=True)
+    import mautrix as _mautrix_probe
+    if not isinstance(_mautrix_probe, types.ModuleType) or not hasattr(_mautrix_probe, "__file__"):
+        pytest.skip("mautrix in sys.modules is a mock, not the real package", allow_module_level=True)
 except ImportError:
-    pytest.skip("matrix-nio not installed", allow_module_level=True)
+    pytest.skip("mautrix not installed", allow_module_level=True)

 from gateway.platforms.base import MessageType

@@ -25,7 +30,7 @@ def _make_adapter():
    """Create a MatrixAdapter with mocked config."""
    from gateway.platforms.matrix import MatrixAdapter
    from gateway.config import PlatformConfig
-    
+
    config = PlatformConfig(
        enabled=True,
        token="***",
@@ -38,32 +43,26 @@ def _make_adapter():
    return adapter


-def _make_room(room_id: str = "!test:example.org", member_count: int = 2):
-    """Create a mock Matrix room."""
-    room = MagicMock()
-    room.room_id = room_id
-    room.member_count = member_count
-    return room
-
-
 def _make_audio_event(
    event_id: str = "$audio_event",
    sender: str = "@alice:example.org",
+    room_id: str = "!test:example.org",
    body: str = "Voice message",
    url: str = "mxc://example.org/abc123",
    is_voice: bool = False,
    mimetype: str = "audio/ogg",
-    timestamp: float = 9999999999000,  # ms
+    timestamp: int = 9999999999000,  # ms
 ):
    """
-    Create a mock RoomMessageAudio event that passes isinstance checks.
-    
+    Create a mock mautrix room message event.
+
+    In mautrix, the handler receives a single event object with attributes
+    ``room_id``, ``sender``, ``event_id``, ``timestamp``, and ``content``
+    (a dict-like or serializable object).
+
    Args:
-        is_voice: If True, adds org.matrix.msc3245.voice field to content
+        is_voice: If True, adds org.matrix.msc3245.voice field to content.
    """
-    import nio
-    
-    # Build the source dict that nio events expose via .source
    content = {
        "msgtype": "m.audio",
        "body": body,
@@ -72,39 +71,35 @@ def _make_audio_event(
            "mimetype": mimetype,
        },
    }
-    
+
    if is_voice:
        content["org.matrix.msc3245.voice"] = {}
-    
-    # Create a real nio RoomMessageAudio-like object
-    # We use MagicMock but configure __class__ to pass isinstance check
-    event = MagicMock(spec=nio.RoomMessageAudio)
-    event.event_id = event_id
-    event.sender = sender
-    event.body = body
-    event.url = url
-    event.server_timestamp = timestamp
-    event.source = {
-        "type": "m.room.message",
-        "content": content,
-    }
-    # For MIME type extraction - needs to be a dict
-    event.content = content
-    
+
+    event = SimpleNamespace(
+        event_id=event_id,
+        sender=sender,
+        room_id=room_id,
+        timestamp=timestamp,
+        content=content,
+    )
    return event


-def _make_download_response(body: bytes = b"fake audio data"):
-    """Create a mock nio.MemoryDownloadResponse."""
-    import nio
-    resp = MagicMock()
-    resp.body = body
-    resp.__class__ = nio.MemoryDownloadResponse
-    return resp
+def _make_state_store(member_count: int = 2):
+    """Create a mock state store with get_members/get_member support."""
+    store = MagicMock()
+    # get_members returns a list of member user IDs
+    members = [MagicMock() for _ in range(member_count)]
+    store.get_members = AsyncMock(return_value=members)
+    # get_member returns a single member info object
+    member = MagicMock()
+    member.displayname = "Alice"
+    store.get_member = AsyncMock(return_value=member)
+    return store


 # ---------------------------------------------------------------------------
-# Tests: MSC3245 Voice Detection (RED -> GREEN)
+# Tests: MSC3245 Voice Detection
 # ---------------------------------------------------------------------------

 class TestMatrixVoiceMessageDetection:
@@ -118,27 +113,28 @@ class TestMatrixVoiceMessageDetection:
        self.adapter._message_handler = AsyncMock()
        # Mock _mxc_to_http to return a fake HTTP URL
        self.adapter._mxc_to_http = lambda url: f"https://matrix.example.org/_matrix/media/v3/download/{url[6:]}"
-        # Mock client for authenticated download
+        # Mock client for authenticated download — download_media returns bytes directly
        self.adapter._client = MagicMock()
-        self.adapter._client.download = AsyncMock(return_value=_make_download_response())
+        self.adapter._client.download_media = AsyncMock(return_value=b"fake audio data")
+        # State store for DM detection
+        self.adapter._client.state_store = _make_state_store()

    @pytest.mark.asyncio
    async def test_voice_message_has_type_voice(self):
        """Voice messages (with MSC3245 field) should be MessageType.VOICE."""
-        room = _make_room()
        event = _make_audio_event(is_voice=True)
-        
+
        # Capture the MessageEvent passed to handle_message
        captured_event = None
-        
+
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-        
+
        self.adapter.handle_message = capture
-        
-        await self.adapter._on_room_message_media(room, event)
-        
+
+        await self.adapter._on_room_message(event)
+
        assert captured_event is not None, "No event was captured"
        assert captured_event.message_type == MessageType.VOICE, \
            f"Expected MessageType.VOICE, got {captured_event.message_type}"
@@ -146,44 +142,43 @@ class TestMatrixVoiceMessageDetection:
    @pytest.mark.asyncio
    async def test_voice_message_has_local_path(self):
        """Voice messages should have a local cached path in media_urls."""
-        room = _make_room()
        event = _make_audio_event(is_voice=True)
-        
+
        captured_event = None
-        
+
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-        
+
        self.adapter.handle_message = capture
-        
-        await self.adapter._on_room_message_media(room, event)
-        
+
+        await self.adapter._on_room_message(event)
+
        assert captured_event is not None
        assert captured_event.media_urls is not None
        assert len(captured_event.media_urls) > 0
        # Should be a local path, not an HTTP URL
        assert not captured_event.media_urls[0].startswith("http"), \
            f"media_urls should contain local path, got {captured_event.media_urls[0]}"
-        self.adapter._client.download.assert_awaited_once_with(mxc=event.url)
+        # download_media is called with a ContentURI wrapping the mxc URL
+        self.adapter._client.download_media.assert_awaited_once()
        assert captured_event.media_types == ["audio/ogg"]

    @pytest.mark.asyncio
    async def test_audio_without_msc3245_stays_audio_type(self):
        """Regular audio uploads (no MSC3245 field) should remain MessageType.AUDIO."""
-        room = _make_room()
        event = _make_audio_event(is_voice=False)  # NOT a voice message
-        
+
        captured_event = None
-        
+
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-        
+
        self.adapter.handle_message = capture
-        
-        await self.adapter._on_room_message_media(room, event)
-        
+
+        await self.adapter._on_room_message(event)
+
        assert captured_event is not None
        assert captured_event.message_type == MessageType.AUDIO, \
            f"Expected MessageType.AUDIO for non-voice, got {captured_event.message_type}"
@@ -191,25 +186,24 @@ class TestMatrixVoiceMessageDetection:
    @pytest.mark.asyncio
    async def test_regular_audio_has_http_url(self):
        """Regular audio uploads should keep HTTP URL (not cached locally)."""
-        room = _make_room()
        event = _make_audio_event(is_voice=False)
-        
+
        captured_event = None
-        
+
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-        
+
        self.adapter.handle_message = capture
-        
-        await self.adapter._on_room_message_media(room, event)
-        
+
+        await self.adapter._on_room_message(event)
+
        assert captured_event is not None
        assert captured_event.media_urls is not None
        # Should be HTTP URL, not local path
        assert captured_event.media_urls[0].startswith("http"), \
            f"Non-voice audio should have HTTP URL, got {captured_event.media_urls[0]}"
-        self.adapter._client.download.assert_not_awaited()
+        self.adapter._client.download_media.assert_not_awaited()
        assert captured_event.media_types == ["audio/ogg"]


@@ -224,29 +218,26 @@ class TestMatrixVoiceCacheFallback:
        self.adapter._message_handler = AsyncMock()
        self.adapter._mxc_to_http = lambda url: f"https://matrix.example.org/_matrix/media/v3/download/{url[6:]}"
        self.adapter._client = MagicMock()
+        self.adapter._client.state_store = _make_state_store()

    @pytest.mark.asyncio
    async def test_voice_cache_failure_falls_back_to_http_url(self):
-        """If caching fails, voice message should still be delivered with HTTP URL."""
-        room = _make_room()
+        """If caching fails (download returns None), voice message should still be delivered with HTTP URL."""
        event = _make_audio_event(is_voice=True)
-        
-        # Make download fail
-        import nio
-        error_resp = MagicMock()
-        error_resp.__class__ = nio.DownloadError
-        self.adapter._client.download = AsyncMock(return_value=error_resp)
-        
+
+        # download_media returns None on failure
+        self.adapter._client.download_media = AsyncMock(return_value=None)
+
        captured_event = None
-        
+
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-        
+
        self.adapter.handle_message = capture
-        
-        await self.adapter._on_room_message_media(room, event)
-        
+
+        await self.adapter._on_room_message(event)
+
        assert captured_event is not None
        assert captured_event.media_urls is not None
        # Should fall back to HTTP URL
@@ -256,10 +247,9 @@ class TestMatrixVoiceCacheFallback:
    @pytest.mark.asyncio
    async def test_voice_cache_exception_falls_back_to_http_url(self):
        """Unexpected download exceptions should also fall back to HTTP URL."""
-        room = _make_room()
        event = _make_audio_event(is_voice=True)

-        self.adapter._client.download = AsyncMock(side_effect=RuntimeError("boom"))
+        self.adapter._client.download_media = AsyncMock(side_effect=RuntimeError("boom"))

        captured_event = None

@@ -269,7 +259,7 @@ class TestMatrixVoiceCacheFallback:

        self.adapter.handle_message = capture

-        await self.adapter._on_room_message_media(room, event)
+        await self.adapter._on_room_message(event)

        assert captured_event is not None
        assert captured_event.media_urls is not None
@@ -278,7 +268,7 @@ class TestMatrixVoiceCacheFallback:


 # ---------------------------------------------------------------------------
-# Tests: send_voice includes MSC3245 field (RED -> GREEN)
+# Tests: send_voice includes MSC3245 field
 # ---------------------------------------------------------------------------

 class TestMatrixSendVoiceMSC3245:
@@ -287,62 +277,52 @@ class TestMatrixSendVoiceMSC3245:
    def setup_method(self):
        self.adapter = _make_adapter()
        self.adapter._user_id = "@bot:example.org"
-        # Mock client with successful upload
+        # Mock client — upload_media returns a ContentURI string
        self.adapter._client = MagicMock()
        self.upload_call = None

-        async def mock_upload(*args, **kwargs):
-            self.upload_call = (args, kwargs)
-            import nio
-            resp = MagicMock()
-            resp.content_uri = "mxc://example.org/uploaded"
-            resp.__class__ = nio.UploadResponse
-            return resp, None
+        async def mock_upload_media(data, mime_type=None, filename=None, **kwargs):
+            self.upload_call = {"data": data, "mime_type": mime_type, "filename": filename}
+            return "mxc://example.org/uploaded"

-        self.adapter._client.upload = mock_upload
+        self.adapter._client.upload_media = mock_upload_media

    @pytest.mark.asyncio
-    async def test_send_voice_includes_msc3245_field(self):
+    @patch("mimetypes.guess_type", return_value=("audio/ogg", None))
+    async def test_send_voice_includes_msc3245_field(self, _mock_guess):
        """send_voice should include org.matrix.msc3245.voice in message content."""
-        import tempfile
-        import os
-        
        # Create a temp audio file
        with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as f:
            f.write(b"fake audio data")
            temp_path = f.name
-        
+
        try:
-            # Capture the message content sent to room_send
+            # Capture the message content sent via send_message_event
            sent_content = None
-            
-            async def mock_room_send(room_id, event_type, content):
+
+            async def mock_send_message_event(room_id, event_type, content):
                nonlocal sent_content
                sent_content = content
-                resp = MagicMock()
-                resp.event_id = "$sent_event"
-                import nio
-                resp.__class__ = nio.RoomSendResponse
-                return resp
-            
-            self.adapter._client.room_send = mock_room_send
-            
+                # send_message_event returns an EventID string
+                return "$sent_event"
+
+            self.adapter._client.send_message_event = mock_send_message_event
+
            await self.adapter.send_voice(
                chat_id="!room:example.org",
                audio_path=temp_path,
                caption="Test voice",
            )
-            
+
            assert sent_content is not None, "No message was sent"
            assert "org.matrix.msc3245.voice" in sent_content, \
                f"MSC3245 voice field missing from content: {sent_content.keys()}"
            assert sent_content["msgtype"] == "m.audio"
            assert sent_content["info"]["mimetype"] == "audio/ogg"
-            assert self.upload_call is not None, "Expected upload() to be called"
-            args, kwargs = self.upload_call
-            assert isinstance(args[0], io.BytesIO)
-            assert kwargs["content_type"] == "audio/ogg"
-            assert kwargs["filename"].endswith(".ogg")
+            assert self.upload_call is not None, "Expected upload_media() to be called"
+            assert isinstance(self.upload_call["data"], bytes)
+            assert self.upload_call["mime_type"] == "audio/ogg"
+            assert self.upload_call["filename"].endswith(".ogg")

        finally:
            os.unlink(temp_path)
@@ -0,0 +1,160 @@
+import asyncio
+import shutil
+import subprocess
+from unittest.mock import AsyncMock, MagicMock
+
+import pytest
+
+import gateway.run as gateway_run
+from gateway.platforms.base import MessageEvent, MessageType
+from gateway.restart import DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+from gateway.session import build_session_key
+from tests.gateway.restart_test_helpers import make_restart_runner, make_restart_source
+
+
+@pytest.mark.asyncio
+async def test_restart_command_while_busy_requests_drain_without_interrupt():
+    runner, _adapter = make_restart_runner()
+    runner.request_restart = MagicMock(return_value=True)
+    event = MessageEvent(
+        text="/restart",
+        message_type=MessageType.TEXT,
+        source=make_restart_source(),
+        message_id="m1",
+    )
+    session_key = build_session_key(event.source)
+    running_agent = MagicMock()
+    runner._running_agents[session_key] = running_agent
+
+    result = await runner._handle_message(event)
+
+    assert result == "⏳ Draining 1 active agent(s) before restart..."
+    running_agent.interrupt.assert_not_called()
+    runner.request_restart.assert_called_once_with(detached=True, via_service=False)
+
+
+@pytest.mark.asyncio
+async def test_drain_queue_mode_queues_follow_up_without_interrupt():
+    runner, adapter = make_restart_runner()
+    runner._draining = True
+    runner._restart_requested = True
+    runner._busy_input_mode = "queue"
+
+    event = MessageEvent(
+        text="follow up",
+        message_type=MessageType.TEXT,
+        source=make_restart_source(),
+        message_id="m2",
+    )
+    session_key = build_session_key(event.source)
+    adapter._active_sessions[session_key] = asyncio.Event()
+
+    await adapter.handle_message(event)
+
+    assert session_key in adapter._pending_messages
+    assert adapter._pending_messages[session_key].text == "follow up"
+    assert not adapter._active_sessions[session_key].is_set()
+    assert any("queued for the next turn" in message for message in adapter.sent)
+
+
+@pytest.mark.asyncio
+async def test_draining_rejects_new_session_messages():
+    runner, _adapter = make_restart_runner()
+    runner._draining = True
+    runner._restart_requested = True
+
+    event = MessageEvent(
+        text="hello",
+        message_type=MessageType.TEXT,
+        source=make_restart_source("fresh"),
+        message_id="m3",
+    )
+
+    result = await runner._handle_message(event)
+
+    assert result == "⏳ Gateway is restarting and is not accepting new work right now."
+
+
+def test_load_busy_input_mode_prefers_env_then_config_then_default(tmp_path, monkeypatch):
+    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
+    monkeypatch.delenv("HERMES_GATEWAY_BUSY_INPUT_MODE", raising=False)
+
+    assert gateway_run.GatewayRunner._load_busy_input_mode() == "interrupt"
+
+    (tmp_path / "config.yaml").write_text(
+        "display:\n  busy_input_mode: queue\n", encoding="utf-8"
+    )
+    assert gateway_run.GatewayRunner._load_busy_input_mode() == "queue"
+
+    monkeypatch.setenv("HERMES_GATEWAY_BUSY_INPUT_MODE", "interrupt")
+    assert gateway_run.GatewayRunner._load_busy_input_mode() == "interrupt"
+
+
+def test_load_restart_drain_timeout_prefers_env_then_config_then_default(
+    tmp_path, monkeypatch, caplog
+):
+    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
+    monkeypatch.delenv("HERMES_RESTART_DRAIN_TIMEOUT", raising=False)
+
+    assert (
+        gateway_run.GatewayRunner._load_restart_drain_timeout()
+        == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    )
+
+    (tmp_path / "config.yaml").write_text(
+        "agent:\n  restart_drain_timeout: 12\n", encoding="utf-8"
+    )
+    assert gateway_run.GatewayRunner._load_restart_drain_timeout() == 12.0
+
+    monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "7")
+    assert gateway_run.GatewayRunner._load_restart_drain_timeout() == 7.0
+
+    monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "invalid")
+    assert (
+        gateway_run.GatewayRunner._load_restart_drain_timeout()
+        == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    )
+    assert "Invalid restart_drain_timeout" in caplog.text
+
+
+@pytest.mark.asyncio
+async def test_request_restart_is_idempotent():
+    runner, _adapter = make_restart_runner()
+    runner.stop = AsyncMock()
+
+    assert runner.request_restart(detached=True, via_service=False) is True
+    first_task = next(iter(runner._background_tasks))
+    assert runner.request_restart(detached=True, via_service=False) is False
+
+    await first_task
+
+    runner.stop.assert_awaited_once_with(
+        restart=True, detached_restart=True, service_restart=False
+    )
+
+
+@pytest.mark.asyncio
+async def test_launch_detached_restart_command_uses_setsid(monkeypatch):
+    runner, _adapter = make_restart_runner()
+    popen_calls = []
+
+    monkeypatch.setattr(gateway_run, "_resolve_hermes_bin", lambda: ["/usr/bin/hermes"])
+    monkeypatch.setattr(gateway_run.os, "getpid", lambda: 321)
+    monkeypatch.setattr(shutil, "which", lambda cmd: "/usr/bin/setsid" if cmd == "setsid" else None)
+
+    def fake_popen(cmd, **kwargs):
+        popen_calls.append((cmd, kwargs))
+        return MagicMock()
+
+    monkeypatch.setattr(subprocess, "Popen", fake_popen)
+
+    await runner._launch_detached_restart_command()
+
+    assert len(popen_calls) == 1
+    cmd, kwargs = popen_calls[0]
+    assert cmd[:2] == ["/usr/bin/setsid", "bash"]
+    assert "gateway restart" in cmd[-1]
+    assert "kill -0 321" in cmd[-1]
+    assert kwargs["start_new_session"] is True
+    assert kwargs["stdout"] is subprocess.DEVNULL
+    assert kwargs["stderr"] is subprocess.DEVNULL
@@ -127,6 +127,16 @@ async def test_shutdown_fires_finalize_for_active_agents(mock_invoke_hook):
    runner._shutdown_event = MagicMock()
    runner.adapters = {}
    runner._exit_reason = "test"
+    runner._exit_code = None
+    runner._draining = False
+    runner._restart_requested = False
+    runner._restart_task_started = False
+    runner._restart_detached = False
+    runner._restart_via_service = False
+    runner._restart_drain_timeout = 0.0
+    runner._stop_task = None
+    runner._running_agents_ts = {}
+    runner._update_runtime_status = MagicMock()

    agent1 = MagicMock()
    agent1.session_id = "sess-a"
@@ -3,9 +3,15 @@ import os
 from gateway.config import Platform
 from gateway.run import GatewayRunner
 from gateway.session import SessionContext, SessionSource
+from gateway.session_context import (
+    get_session_env,
+    set_session_vars,
+    clear_session_vars,
+)


-def test_set_session_env_includes_thread_id(monkeypatch):
+def test_set_session_env_sets_contextvars(monkeypatch):
+    """_set_session_env should populate contextvars, not os.environ."""
    runner = object.__new__(GatewayRunner)
    source = SessionSource(
        platform=Platform.TELEGRAM,
@@ -21,25 +27,93 @@ def test_set_session_env_includes_thread_id(monkeypatch):
    monkeypatch.delenv("HERMES_SESSION_CHAT_NAME", raising=False)
    monkeypatch.delenv("HERMES_SESSION_THREAD_ID", raising=False)

-    runner._set_session_env(context)
+    tokens = runner._set_session_env(context)

-    assert os.getenv("HERMES_SESSION_PLATFORM") == "telegram"
-    assert os.getenv("HERMES_SESSION_CHAT_ID") == "-1001"
-    assert os.getenv("HERMES_SESSION_CHAT_NAME") == "Group"
-    assert os.getenv("HERMES_SESSION_THREAD_ID") == "17585"
+    # Values should be readable via get_session_env (contextvar path)
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram"
+    assert get_session_env("HERMES_SESSION_CHAT_ID") == "-1001"
+    assert get_session_env("HERMES_SESSION_CHAT_NAME") == "Group"
+    assert get_session_env("HERMES_SESSION_THREAD_ID") == "17585"
+
+    # os.environ should NOT be touched
+    assert os.getenv("HERMES_SESSION_PLATFORM") is None
+    assert os.getenv("HERMES_SESSION_THREAD_ID") is None
+
+    # Clean up
+    runner._clear_session_env(tokens)


-def test_clear_session_env_removes_thread_id(monkeypatch):
+def test_clear_session_env_restores_previous_state(monkeypatch):
+    """_clear_session_env should restore contextvars to their pre-handler values."""
    runner = object.__new__(GatewayRunner)

-    monkeypatch.setenv("HERMES_SESSION_PLATFORM", "telegram")
-    monkeypatch.setenv("HERMES_SESSION_CHAT_ID", "-1001")
-    monkeypatch.setenv("HERMES_SESSION_CHAT_NAME", "Group")
-    monkeypatch.setenv("HERMES_SESSION_THREAD_ID", "17585")
+    monkeypatch.delenv("HERMES_SESSION_PLATFORM", raising=False)
+    monkeypatch.delenv("HERMES_SESSION_CHAT_ID", raising=False)
+    monkeypatch.delenv("HERMES_SESSION_CHAT_NAME", raising=False)
+    monkeypatch.delenv("HERMES_SESSION_THREAD_ID", raising=False)

-    runner._clear_session_env()
+    source = SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id="-1001",
+        chat_name="Group",
+        chat_type="group",
+        thread_id="17585",
+    )
+    context = SessionContext(source=source, connected_platforms=[], home_channels={})

-    assert os.getenv("HERMES_SESSION_PLATFORM") is None
-    assert os.getenv("HERMES_SESSION_CHAT_ID") is None
-    assert os.getenv("HERMES_SESSION_CHAT_NAME") is None
-    assert os.getenv("HERMES_SESSION_THREAD_ID") is None
+    tokens = runner._set_session_env(context)
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram"
+
+    runner._clear_session_env(tokens)
+
+    # After clear, contextvars should return to defaults (empty)
+    assert get_session_env("HERMES_SESSION_PLATFORM") == ""
+    assert get_session_env("HERMES_SESSION_CHAT_ID") == ""
+    assert get_session_env("HERMES_SESSION_CHAT_NAME") == ""
+    assert get_session_env("HERMES_SESSION_THREAD_ID") == ""
+
+
+def test_get_session_env_falls_back_to_os_environ(monkeypatch):
+    """get_session_env should fall back to os.environ when contextvar is unset."""
+    monkeypatch.setenv("HERMES_SESSION_PLATFORM", "discord")
+
+    # No contextvar set — should read from os.environ
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "discord"
+
+    # Now set a contextvar — should prefer it
+    tokens = set_session_vars(platform="telegram")
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram"
+
+    # Restore — should fall back to os.environ again
+    clear_session_vars(tokens)
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "discord"
+
+
+def test_get_session_env_default_when_nothing_set(monkeypatch):
+    """get_session_env returns default when neither contextvar nor env is set."""
+    monkeypatch.delenv("HERMES_SESSION_PLATFORM", raising=False)
+
+    assert get_session_env("HERMES_SESSION_PLATFORM") == ""
+    assert get_session_env("HERMES_SESSION_PLATFORM", "fallback") == "fallback"
+
+
+def test_set_session_env_handles_missing_optional_fields():
+    """_set_session_env should handle None chat_name and thread_id gracefully."""
+    runner = object.__new__(GatewayRunner)
+    source = SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id="-1001",
+        chat_name=None,
+        chat_type="private",
+        thread_id=None,
+    )
+    context = SessionContext(source=source, connected_platforms=[], home_channels={})
+
+    tokens = runner._set_session_env(context)
+
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram"
+    assert get_session_env("HERMES_SESSION_CHAT_ID") == "-1001"
+    assert get_session_env("HERMES_SESSION_CHAT_NAME") == ""
+    assert get_session_env("HERMES_SESSION_THREAD_ID") == ""
+
+    runner._clear_session_env(tokens)
@@ -41,6 +41,15 @@ def _make_runner():
    runner._pending_approvals = {}
    runner._voice_mode = {}
    runner._background_tasks = set()
+    runner._draining = False
+    runner._restart_requested = False
+    runner._restart_task_started = False
+    runner._restart_detached = False
+    runner._restart_via_service = False
+    runner._restart_drain_timeout = 0.0
+    runner._stop_task = None
+    runner._exit_code = None
+    runner._update_runtime_status = MagicMock()
    runner._is_user_authorized = lambda _source: True
    runner.hooks = MagicMock()
    runner.hooks.emit = AsyncMock()
@@ -5,6 +5,10 @@ from pathlib import Path
 from types import SimpleNamespace

 import hermes_cli.gateway as gateway_cli
+from gateway.restart import (
+    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
+    GATEWAY_SERVICE_RESTART_EXIT_CODE,
+)


 class TestSystemdServiceRefresh:
@@ -74,7 +78,7 @@ class TestSystemdServiceRefresh:
        assert unit_path.read_text(encoding="utf-8") == "new unit\n"
        assert calls[:2] == [
            ["systemctl", "--user", "daemon-reload"],
-            ["systemctl", "--user", "restart", gateway_cli.get_service_name()],
+            ["systemctl", "--user", "reload-or-restart", gateway_cli.get_service_name()],
        ]


@@ -84,6 +88,8 @@ class TestGeneratedSystemdUnits:

        assert "ExecStart=" in unit
        assert "ExecStop=" not in unit
+        assert "ExecReload=/bin/kill -USR1 $MAINPID" in unit
+        assert f"RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}" in unit
        assert "TimeoutStopSec=60" in unit

    def test_user_unit_includes_resolved_node_directory_in_path(self, monkeypatch):
@@ -98,6 +104,8 @@ class TestGeneratedSystemdUnits:

        assert "ExecStart=" in unit
        assert "ExecStop=" not in unit
+        assert "ExecReload=/bin/kill -USR1 $MAINPID" in unit
+        assert f"RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}" in unit
        assert "TimeoutStopSec=60" in unit
        assert "WantedBy=multi-user.target" in unit

@@ -157,6 +165,31 @@ class TestGatewayStopCleanup:


 class TestLaunchdServiceRecovery:
+    def test_get_restart_drain_timeout_prefers_env_then_config_then_default(self, monkeypatch):
+        monkeypatch.delenv("HERMES_RESTART_DRAIN_TIMEOUT", raising=False)
+        monkeypatch.setattr(gateway_cli, "read_raw_config", lambda: {})
+
+        assert (
+            gateway_cli._get_restart_drain_timeout()
+            == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+        )
+
+        monkeypatch.setattr(
+            gateway_cli,
+            "read_raw_config",
+            lambda: {"agent": {"restart_drain_timeout": 14}},
+        )
+        assert gateway_cli._get_restart_drain_timeout() == 14.0
+
+        monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "9")
+        assert gateway_cli._get_restart_drain_timeout() == 9.0
+
+        monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "invalid")
+        assert (
+            gateway_cli._get_restart_drain_timeout()
+            == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+        )
+
    def test_launchd_install_repairs_outdated_plist_without_force(self, tmp_path, monkeypatch):
        plist_path = tmp_path / "ai.hermes.gateway.plist"
        plist_path.write_text("<plist>old content</plist>", encoding="utf-8")
@@ -234,6 +267,55 @@ class TestLaunchdServiceRecovery:
            ["launchctl", "kickstart", target],
        ]

+    def test_launchd_restart_drains_running_gateway_before_kickstart(self, monkeypatch):
+        calls = []
+        target = f"{gateway_cli._launchd_domain()}/{gateway_cli.get_launchd_label()}"
+
+        monkeypatch.setattr(gateway_cli, "_get_restart_drain_timeout", lambda: 12.0)
+        monkeypatch.setattr(gateway_cli, "_request_gateway_self_restart", lambda pid: False)
+        monkeypatch.setattr(gateway_cli, "_wait_for_gateway_exit", lambda timeout, force_after=None: True)
+        monkeypatch.setattr(gateway_cli, "terminate_pid", lambda pid, force=False: calls.append(("term", pid, force)))
+        monkeypatch.setattr(
+            "gateway.status.get_running_pid",
+            lambda: 321,
+        )
+
+        def fake_run(cmd, check=False, **kwargs):
+            calls.append(cmd)
+            return SimpleNamespace(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
+
+        gateway_cli.launchd_restart()
+
+        assert calls == [
+            ("term", 321, False),
+            ["launchctl", "kickstart", "-k", target],
+        ]
+
+    def test_launchd_restart_self_requests_graceful_restart_without_kickstart(self, monkeypatch, capsys):
+        calls = []
+
+        monkeypatch.setattr(
+            "gateway.status.get_running_pid",
+            lambda: 321,
+        )
+        monkeypatch.setattr(
+            gateway_cli,
+            "_request_gateway_self_restart",
+            lambda pid: calls.append(("self", pid)) or True,
+        )
+        monkeypatch.setattr(
+            gateway_cli.subprocess,
+            "run",
+            lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("launchctl should not run")),
+        )
+
+        gateway_cli.launchd_restart()
+
+        assert calls == [("self", 321)]
+        assert "restart requested" in capsys.readouterr().out.lower()
+
    def test_launchd_stop_uses_bootout_not_kill(self, monkeypatch):
        """launchd_stop must bootout the service so KeepAlive doesn't respawn it."""
        label = gateway_cli.get_launchd_label()
@@ -337,6 +419,31 @@ class TestGatewayServiceDetection:


 class TestGatewaySystemServiceRouting:
+    def test_systemd_restart_self_requests_graceful_restart_without_reload_or_restart(self, monkeypatch, capsys):
+        calls = []
+
+        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
+        monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: calls.append(("refresh", system)))
+        monkeypatch.setattr(
+            "gateway.status.get_running_pid",
+            lambda: 654,
+        )
+        monkeypatch.setattr(
+            gateway_cli,
+            "_request_gateway_self_restart",
+            lambda pid: calls.append(("self", pid)) or True,
+        )
+        monkeypatch.setattr(
+            gateway_cli.subprocess,
+            "run",
+            lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("systemctl should not run")),
+        )
+
+        gateway_cli.systemd_restart()
+
+        assert calls == [("refresh", False), ("self", 654)]
+        assert "restart requested" in capsys.readouterr().out.lower()
+
    def test_gateway_install_passes_system_flags(self, monkeypatch):
        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
@@ -0,0 +1,279 @@
+"""Tests for WSL detection and WSL-aware gateway behavior."""
+
+import io
+import subprocess
+import sys
+from types import SimpleNamespace
+from unittest.mock import patch, MagicMock, mock_open
+
+import pytest
+
+import hermes_cli.gateway as gateway
+import hermes_constants
+
+
+# =============================================================================
+# is_wsl() in hermes_constants
+# =============================================================================
+
+class TestIsWsl:
+    """Test the shared is_wsl() utility."""
+
+    def setup_method(self):
+        # Reset cached value between tests
+        hermes_constants._wsl_detected = None
+
+    def test_detects_wsl2(self):
+        fake_content = (
+            "Linux version 5.15.146.1-microsoft-standard-WSL2 "
+            "(gcc (GCC) 11.2.0) #1 SMP Thu Jan 11 04:09:03 UTC 2024\n"
+        )
+        with patch("builtins.open", mock_open(read_data=fake_content)):
+            assert hermes_constants.is_wsl() is True
+
+    def test_detects_wsl1(self):
+        fake_content = (
+            "Linux version 4.4.0-19041-Microsoft "
+            "(Microsoft@Microsoft.com) (gcc version 5.4.0) #1\n"
+        )
+        with patch("builtins.open", mock_open(read_data=fake_content)):
+            assert hermes_constants.is_wsl() is True
+
+    def test_native_linux(self):
+        fake_content = (
+            "Linux version 6.5.0-44-generic (buildd@lcy02-amd64-015) "
+            "(x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0) #44\n"
+        )
+        with patch("builtins.open", mock_open(read_data=fake_content)):
+            assert hermes_constants.is_wsl() is False
+
+    def test_no_proc_version(self):
+        with patch("builtins.open", side_effect=FileNotFoundError):
+            assert hermes_constants.is_wsl() is False
+
+    def test_result_is_cached(self):
+        """After first detection, subsequent calls return the cached value."""
+        hermes_constants._wsl_detected = True
+        # Even with open raising, cached value is returned
+        with patch("builtins.open", side_effect=FileNotFoundError):
+            assert hermes_constants.is_wsl() is True
+
+
+# =============================================================================
+# _wsl_systemd_operational() in gateway
+# =============================================================================
+
+class TestWslSystemdOperational:
+    """Test the WSL systemd check."""
+
+    def test_running(self, monkeypatch):
+        monkeypatch.setattr(
+            gateway.subprocess, "run",
+            lambda *a, **kw: SimpleNamespace(
+                returncode=0, stdout="running\n", stderr=""
+            ),
+        )
+        assert gateway._wsl_systemd_operational() is True
+
+    def test_degraded(self, monkeypatch):
+        monkeypatch.setattr(
+            gateway.subprocess, "run",
+            lambda *a, **kw: SimpleNamespace(
+                returncode=1, stdout="degraded\n", stderr=""
+            ),
+        )
+        assert gateway._wsl_systemd_operational() is True
+
+    def test_starting(self, monkeypatch):
+        monkeypatch.setattr(
+            gateway.subprocess, "run",
+            lambda *a, **kw: SimpleNamespace(
+                returncode=1, stdout="starting\n", stderr=""
+            ),
+        )
+        assert gateway._wsl_systemd_operational() is True
+
+    def test_offline_no_systemd(self, monkeypatch):
+        monkeypatch.setattr(
+            gateway.subprocess, "run",
+            lambda *a, **kw: SimpleNamespace(
+                returncode=1, stdout="offline\n", stderr=""
+            ),
+        )
+        assert gateway._wsl_systemd_operational() is False
+
+    def test_systemctl_not_found(self, monkeypatch):
+        monkeypatch.setattr(
+            gateway.subprocess, "run",
+            MagicMock(side_effect=FileNotFoundError),
+        )
+        assert gateway._wsl_systemd_operational() is False
+
+    def test_timeout(self, monkeypatch):
+        monkeypatch.setattr(
+            gateway.subprocess, "run",
+            MagicMock(side_effect=subprocess.TimeoutExpired("systemctl", 5)),
+        )
+        assert gateway._wsl_systemd_operational() is False
+
+
+# =============================================================================
+# supports_systemd_services() WSL integration
+# =============================================================================
+
+class TestSupportsSystemdServicesWSL:
+    """Test that supports_systemd_services() handles WSL correctly."""
+
+    def test_wsl_with_systemd(self, monkeypatch):
+        """WSL + working systemd → True."""
+        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
+        monkeypatch.setattr(gateway, "_wsl_systemd_operational", lambda: True)
+        assert gateway.supports_systemd_services() is True
+
+    def test_wsl_without_systemd(self, monkeypatch):
+        """WSL + no systemd → False."""
+        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
+        monkeypatch.setattr(gateway, "_wsl_systemd_operational", lambda: False)
+        assert gateway.supports_systemd_services() is False
+
+    def test_native_linux(self, monkeypatch):
+        """Native Linux (not WSL) → True without checking systemd."""
+        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway, "is_wsl", lambda: False)
+        assert gateway.supports_systemd_services() is True
+
+    def test_termux_still_excluded(self, monkeypatch):
+        """Termux → False regardless of WSL status."""
+        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: True)
+        assert gateway.supports_systemd_services() is False
+
+
+# =============================================================================
+# WSL messaging in gateway commands
+# =============================================================================
+
+class TestGatewayCommandWSLMessages:
+    """Test that WSL users see appropriate guidance."""
+
+    def test_install_wsl_no_systemd(self, monkeypatch, capsys):
+        """hermes gateway install on WSL without systemd shows guidance."""
+        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
+        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
+        monkeypatch.setattr(gateway, "is_macos", lambda: False)
+        monkeypatch.setattr(gateway, "is_managed", lambda: False)
+
+        args = SimpleNamespace(
+            gateway_command="install", force=False, system=False,
+            run_as_user=None,
+        )
+        with pytest.raises(SystemExit) as exc_info:
+            gateway.gateway_command(args)
+        assert exc_info.value.code == 1
+
+        out = capsys.readouterr().out
+        assert "WSL detected" in out
+        assert "systemd is not running" in out
+        assert "hermes gateway run" in out
+        assert "tmux" in out
+
+    def test_start_wsl_no_systemd(self, monkeypatch, capsys):
+        """hermes gateway start on WSL without systemd shows guidance."""
+        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
+        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
+        monkeypatch.setattr(gateway, "is_macos", lambda: False)
+
+        args = SimpleNamespace(gateway_command="start", system=False)
+        with pytest.raises(SystemExit) as exc_info:
+            gateway.gateway_command(args)
+        assert exc_info.value.code == 1
+
+        out = capsys.readouterr().out
+        assert "WSL detected" in out
+        assert "hermes gateway run" in out
+        assert "wsl.conf" in out
+
+    def test_install_wsl_with_systemd_warns(self, monkeypatch, capsys):
+        """hermes gateway install on WSL with systemd shows warning but proceeds."""
+        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
+        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway, "is_macos", lambda: False)
+        monkeypatch.setattr(gateway, "is_managed", lambda: False)
+
+        # Mock systemd_install to capture call
+        install_called = []
+        monkeypatch.setattr(
+            gateway, "systemd_install",
+            lambda **kwargs: install_called.append(kwargs),
+        )
+
+        args = SimpleNamespace(
+            gateway_command="install", force=False, system=False,
+            run_as_user=None,
+        )
+        gateway.gateway_command(args)
+
+        out = capsys.readouterr().out
+        assert "WSL detected" in out
+        assert "may not survive WSL restarts" in out
+        assert len(install_called) == 1  # install still proceeded
+
+    def test_status_wsl_running_manual(self, monkeypatch, capsys):
+        """hermes gateway status on WSL with manual process shows WSL note."""
+        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
+        monkeypatch.setattr(gateway, "is_macos", lambda: False)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
+        monkeypatch.setattr(gateway, "find_gateway_pids", lambda: [12345])
+        monkeypatch.setattr(gateway, "_runtime_health_lines", lambda: [])
+        # Stub out the systemd unit path check
+        monkeypatch.setattr(
+            gateway, "get_systemd_unit_path",
+            lambda system=False: SimpleNamespace(exists=lambda: False),
+        )
+        monkeypatch.setattr(
+            gateway, "get_launchd_plist_path",
+            lambda: SimpleNamespace(exists=lambda: False),
+        )
+
+        args = SimpleNamespace(gateway_command="status", deep=False, system=False)
+        gateway.gateway_command(args)
+
+        out = capsys.readouterr().out
+        assert "WSL note" in out
+        assert "tmux or screen" in out
+
+    def test_status_wsl_not_running(self, monkeypatch, capsys):
+        """hermes gateway status on WSL with no process shows WSL start advice."""
+        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
+        monkeypatch.setattr(gateway, "is_macos", lambda: False)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
+        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
+        monkeypatch.setattr(gateway, "find_gateway_pids", lambda: [])
+        monkeypatch.setattr(gateway, "_runtime_health_lines", lambda: [])
+        monkeypatch.setattr(
+            gateway, "get_systemd_unit_path",
+            lambda system=False: SimpleNamespace(exists=lambda: False),
+        )
+        monkeypatch.setattr(
+            gateway, "get_launchd_plist_path",
+            lambda: SimpleNamespace(exists=lambda: False),
+        )
+
+        args = SimpleNamespace(gateway_command="status", deep=False, system=False)
+        gateway.gateway_command(args)
+
+        out = capsys.readouterr().out
+        assert "hermes gateway run" in out
+        assert "tmux" in out
@@ -555,3 +555,103 @@ class TestPromptPluginEnvVars:

        # Should not crash, and not save anything
        mock_save.assert_not_called()
+
+
+# ── curses_radiolist ─────────────────────────────────────────────────────
+
+
+class TestCursesRadiolist:
+    """Test the curses_radiolist function (non-TTY fallback path)."""
+
+    def test_non_tty_returns_default(self):
+        from hermes_cli.curses_ui import curses_radiolist
+        with patch("sys.stdin") as mock_stdin:
+            mock_stdin.isatty.return_value = False
+            result = curses_radiolist("Pick one", ["a", "b", "c"], selected=1)
+            assert result == 1
+
+    def test_non_tty_returns_cancel_value(self):
+        from hermes_cli.curses_ui import curses_radiolist
+        with patch("sys.stdin") as mock_stdin:
+            mock_stdin.isatty.return_value = False
+            result = curses_radiolist("Pick", ["x", "y"], selected=0, cancel_returns=1)
+            assert result == 1
+
+
+# ── Provider discovery helpers ───────────────────────────────────────────
+
+
+class TestProviderDiscovery:
+    """Test provider plugin discovery and config helpers."""
+
+    def test_get_current_memory_provider_default(self, tmp_path, monkeypatch):
+        """Empty config returns empty string."""
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        config_file = tmp_path / "config.yaml"
+        config_file.write_text("memory:\n  provider: ''\n")
+        from hermes_cli.plugins_cmd import _get_current_memory_provider
+        result = _get_current_memory_provider()
+        assert result == ""
+
+    def test_get_current_context_engine_default(self, tmp_path, monkeypatch):
+        """Default config returns 'compressor'."""
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        config_file = tmp_path / "config.yaml"
+        config_file.write_text("context:\n  engine: compressor\n")
+        from hermes_cli.plugins_cmd import _get_current_context_engine
+        result = _get_current_context_engine()
+        assert result == "compressor"
+
+    def test_save_memory_provider(self, tmp_path, monkeypatch):
+        """Saving a memory provider persists to config.yaml."""
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        config_file = tmp_path / "config.yaml"
+        config_file.write_text("memory:\n  provider: ''\n")
+        from hermes_cli.plugins_cmd import _save_memory_provider
+        _save_memory_provider("honcho")
+        content = yaml.safe_load(config_file.read_text())
+        assert content["memory"]["provider"] == "honcho"
+
+    def test_save_context_engine(self, tmp_path, monkeypatch):
+        """Saving a context engine persists to config.yaml."""
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        config_file = tmp_path / "config.yaml"
+        config_file.write_text("context:\n  engine: compressor\n")
+        from hermes_cli.plugins_cmd import _save_context_engine
+        _save_context_engine("lcm")
+        content = yaml.safe_load(config_file.read_text())
+        assert content["context"]["engine"] == "lcm"
+
+    def test_discover_memory_providers_empty(self):
+        """Discovery returns empty list when import fails."""
+        with patch("plugins.memory.discover_memory_providers",
+                    side_effect=ImportError("no module")):
+            from hermes_cli.plugins_cmd import _discover_memory_providers
+            result = _discover_memory_providers()
+            assert result == []
+
+    def test_discover_context_engines_empty(self):
+        """Discovery returns empty list when import fails."""
+        with patch("plugins.context_engine.discover_context_engines",
+                    side_effect=ImportError("no module")):
+            from hermes_cli.plugins_cmd import _discover_context_engines
+            result = _discover_context_engines()
+            assert result == []
+
+
+# ── Auto-activation fix ──────────────────────────────────────────────────
+
+
+class TestNoAutoActivation:
+    """Verify that plugin engines don't auto-activate when config says 'compressor'."""
+
+    def test_compressor_default_ignores_plugin(self):
+        """When context.engine is 'compressor', a plugin-registered engine should NOT
+        be used — only explicit config triggers plugin engines."""
+        # This tests the run_agent.py logic indirectly by checking that the
+        # code path for default config doesn't call get_plugin_context_engine.
+        import run_agent as ra_module
+        source = open(ra_module.__file__).read()
+        # The old code had: "Even with default config, check if a plugin registered one"
+        # The fix removes this. Verify it's gone.
+        assert "Even with default config, check if a plugin registered one" not in source
@@ -22,7 +22,7 @@ def _parse_setup_imports():
 class TestSetupShutilImport:
    def test_shutil_imported_at_module_level(self):
        """shutil must be imported at module level so setup_gateway can use it
-        for the matrix-nio auto-install path (line ~2126)."""
+        for the mautrix auto-install path."""
        names = _parse_setup_imports()
        assert "shutil" in names, (
            "shutil is not imported at the top of hermes_cli/setup.py. "
@@ -1823,6 +1823,111 @@ class TestRunConversation:
        assert result["final_response"] == "Here is the actual answer."
        assert result["api_calls"] == 2  # 1 original + 1 nudge retry

+    def test_empty_response_triggers_fallback_provider(self, agent):
+        """After 3 empty retries, fallback provider is activated and produces content."""
+        self._setup_agent(agent)
+        agent.base_url = "http://127.0.0.1:1234/v1"
+        # Configure a fallback chain
+        agent._fallback_chain = [{"provider": "openrouter", "model": "anthropic/claude-sonnet-4"}]
+        agent._fallback_index = 0
+        agent._fallback_activated = False
+
+        empty_resp = _mock_response(content=None, finish_reason="stop")
+        content_resp = _mock_response(content="Fallback answer.", finish_reason="stop")
+        # 4 empty (1 orig + 3 retries), then fallback model answers
+        agent.client.chat.completions.create.side_effect = [
+            empty_resp, empty_resp, empty_resp, empty_resp, content_resp,
+        ]
+
+        fallback_called = {"called": False}
+
+        def _mock_fallback():
+            fallback_called["called"] = True
+            # Simulate what _try_activate_fallback does: just advance the
+            # index and set the flag (the client is already mocked).
+            agent._fallback_index = 1
+            agent._fallback_activated = True
+            agent.model = "anthropic/claude-sonnet-4"
+            agent.provider = "openrouter"
+            return True
+
+        with (
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+            patch.object(agent, "_try_activate_fallback", side_effect=_mock_fallback),
+        ):
+            result = agent.run_conversation("answer me")
+        assert fallback_called["called"], "Fallback should have been triggered"
+        assert result["completed"] is True
+        assert result["final_response"] == "Fallback answer."
+
+    def test_empty_response_fallback_also_empty_returns_empty(self, agent):
+        """If fallback also returns empty, final response is (empty)."""
+        self._setup_agent(agent)
+        agent.base_url = "http://127.0.0.1:1234/v1"
+        agent._fallback_chain = [{"provider": "openrouter", "model": "anthropic/claude-sonnet-4"}]
+        agent._fallback_index = 0
+        agent._fallback_activated = False
+
+        empty_resp = _mock_response(content=None, finish_reason="stop")
+        # 4 empty from primary (1 + 3 retries), fallback activated,
+        # then 4 more empty from fallback (1 + 3 retries), no more fallbacks
+        agent.client.chat.completions.create.side_effect = [
+            empty_resp, empty_resp, empty_resp, empty_resp,  # primary exhausted
+            empty_resp, empty_resp, empty_resp, empty_resp,  # fallback exhausted
+        ]
+
+        def _mock_fallback():
+            if agent._fallback_index >= len(agent._fallback_chain):
+                return False
+            agent._fallback_index += 1
+            agent._fallback_activated = True
+            agent.model = "anthropic/claude-sonnet-4"
+            agent.provider = "openrouter"
+            return True
+
+        with (
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+            patch.object(agent, "_try_activate_fallback", side_effect=_mock_fallback),
+        ):
+            result = agent.run_conversation("answer me")
+        assert result["completed"] is True
+        assert result["final_response"] == "(empty)"
+
+    def test_empty_response_emits_status_for_gateway(self, agent):
+        """_emit_status is called during empty retries so gateway users see feedback."""
+        self._setup_agent(agent)
+        agent.base_url = "http://127.0.0.1:1234/v1"
+
+        empty_resp = _mock_response(content=None, finish_reason="stop")
+        # 4 empty: 1 original + 3 retries, all empty, no fallback
+        agent.client.chat.completions.create.side_effect = [
+            empty_resp, empty_resp, empty_resp, empty_resp,
+        ]
+
+        status_messages = []
+
+        def _capture_status(msg):
+            status_messages.append(msg)
+
+        with (
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+            patch.object(agent, "_emit_status", side_effect=_capture_status),
+        ):
+            result = agent.run_conversation("answer me")
+
+        assert result["final_response"] == "(empty)"
+        # Should have emitted retry statuses (3 retries) + final failure
+        retry_msgs = [m for m in status_messages if "retrying" in m.lower()]
+        assert len(retry_msgs) == 3, f"Expected 3 retry status messages, got {len(retry_msgs)}: {status_messages}"
+        failure_msgs = [m for m in status_messages if "no content" in m.lower() or "no fallback" in m.lower()]
+        assert len(failure_msgs) >= 1, f"Expected at least 1 failure status, got: {status_messages}"
+
    def test_nous_401_refreshes_after_remint_and_retries(self, agent):
        self._setup_agent(agent)
        agent.provider = "nous"
@@ -2125,6 +2230,28 @@ class TestRetryExhaustion:
        assert "error" in result
        assert "rate limited" in result["error"]

+    def test_build_api_kwargs_error_no_unbound_local(self, agent):
+        """When _build_api_kwargs raises, except handler must not crash with UnboundLocalError.
+
+        Regression: _dump_api_request_debug(api_kwargs, ...) in the except block
+        referenced api_kwargs before it was assigned when _build_api_kwargs threw.
+        """
+        self._setup_agent(agent)
+        with (
+            patch.object(agent, "_build_api_kwargs", side_effect=ValueError("bad messages")),
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+            patch("run_agent.time", self._make_fast_time_mock()),
+        ):
+            result = agent.run_conversation("hello")
+        # Must surface the real error, not UnboundLocalError
+        assert result.get("completed") is False
+        assert result.get("failed") is True
+        assert "error" in result
+        assert "UnboundLocalError" not in result.get("error", "")
+        assert "bad messages" in result["error"]
+

 # ---------------------------------------------------------------------------
 # Flush sentinel leak
@@ -11,12 +11,19 @@ def _load_optional_dependencies():
    return project["optional-dependencies"]


-def test_matrix_extra_exists_but_excluded_from_all():
-    """matrix-nio[e2e] depends on python-olm which is upstream-broken on modern
-    macOS (archived libolm, C++ errors with Clang 21+).  The [matrix] extra is
-    kept for opt-in install but deliberately excluded from [all] so one broken
-    upstream dep doesn't nuke every other extra during ``hermes update``."""
+def test_matrix_extra_linux_only_in_all():
+    """mautrix[encryption] depends on python-olm which is upstream-broken on
+    modern macOS (archived libolm, C++ errors with Clang 21+).  The [matrix]
+    extra is included in [all] but gated to Linux via a platform marker so
+    that ``hermes update`` doesn't fail on macOS."""
    optional_dependencies = _load_optional_dependencies()

    assert "matrix" in optional_dependencies
+    # Must NOT be unconditional — python-olm has no macOS wheels.
    assert "hermes-agent[matrix]" not in optional_dependencies["all"]
+    # Must be present with a Linux platform marker.
+    linux_gated = [
+        dep for dep in optional_dependencies["all"]
+        if "matrix" in dep and "linux" in dep
+    ]
+    assert linux_gated, "expected hermes-agent[matrix] with sys_platform=='linux' marker in [all]"
@@ -205,9 +205,9 @@ class TestMacosOsascript:

 class TestIsWsl:
    def setup_method(self):
-        # Reset cached value before each test
-        import hermes_cli.clipboard as cb
-        cb._wsl_detected = None
+        # _is_wsl is now hermes_constants.is_wsl — reset its cache
+        import hermes_constants
+        hermes_constants._wsl_detected = None

    def test_wsl2_detected(self):
        content = "Linux version 5.15.0 (microsoft-standard-WSL2)"
@@ -229,6 +229,7 @@ class TestIsWsl:
            assert _is_wsl() is False

    def test_result_is_cached(self):
+        import hermes_constants
        content = "Linux version 5.15.0 (microsoft-standard-WSL2)"
        with patch("builtins.open", mock_open(read_data=content)) as m:
            assert _is_wsl() is True
@@ -1210,5 +1210,73 @@ class TestDelegateHeartbeat(unittest.TestCase):
            f"Heartbeat should include last_activity_desc: {touch_calls}")


+class TestDelegationReasoningEffort(unittest.TestCase):
+    """Tests for delegation.reasoning_effort config override."""
+
+    @patch("tools.delegate_tool._load_config")
+    @patch("run_agent.AIAgent")
+    def test_inherits_parent_reasoning_when_no_override(self, MockAgent, mock_cfg):
+        """With no delegation.reasoning_effort, child inherits parent's config."""
+        mock_cfg.return_value = {"max_iterations": 50, "reasoning_effort": ""}
+        MockAgent.return_value = MagicMock()
+        parent = _make_mock_parent()
+        parent.reasoning_config = {"enabled": True, "effort": "xhigh"}
+
+        _build_child_agent(
+            task_index=0, goal="test", context=None, toolsets=None,
+            model=None, max_iterations=50, parent_agent=parent,
+        )
+        call_kwargs = MockAgent.call_args[1]
+        self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "xhigh"})
+
+    @patch("tools.delegate_tool._load_config")
+    @patch("run_agent.AIAgent")
+    def test_override_reasoning_effort_from_config(self, MockAgent, mock_cfg):
+        """delegation.reasoning_effort overrides the parent's level."""
+        mock_cfg.return_value = {"max_iterations": 50, "reasoning_effort": "low"}
+        MockAgent.return_value = MagicMock()
+        parent = _make_mock_parent()
+        parent.reasoning_config = {"enabled": True, "effort": "xhigh"}
+
+        _build_child_agent(
+            task_index=0, goal="test", context=None, toolsets=None,
+            model=None, max_iterations=50, parent_agent=parent,
+        )
+        call_kwargs = MockAgent.call_args[1]
+        self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "low"})
+
+    @patch("tools.delegate_tool._load_config")
+    @patch("run_agent.AIAgent")
+    def test_override_reasoning_effort_none_disables(self, MockAgent, mock_cfg):
+        """delegation.reasoning_effort: 'none' disables thinking for subagents."""
+        mock_cfg.return_value = {"max_iterations": 50, "reasoning_effort": "none"}
+        MockAgent.return_value = MagicMock()
+        parent = _make_mock_parent()
+        parent.reasoning_config = {"enabled": True, "effort": "high"}
+
+        _build_child_agent(
+            task_index=0, goal="test", context=None, toolsets=None,
+            model=None, max_iterations=50, parent_agent=parent,
+        )
+        call_kwargs = MockAgent.call_args[1]
+        self.assertEqual(call_kwargs["reasoning_config"], {"enabled": False})
+
+    @patch("tools.delegate_tool._load_config")
+    @patch("run_agent.AIAgent")
+    def test_invalid_reasoning_effort_falls_back_to_parent(self, MockAgent, mock_cfg):
+        """Invalid delegation.reasoning_effort falls back to parent's config."""
+        mock_cfg.return_value = {"max_iterations": 50, "reasoning_effort": "banana"}
+        MockAgent.return_value = MagicMock()
+        parent = _make_mock_parent()
+        parent.reasoning_config = {"enabled": True, "effort": "medium"}
+
+        _build_child_agent(
+            task_index=0, goal="test", context=None, toolsets=None,
+            model=None, max_iterations=50, parent_agent=parent,
+        )
+        call_kwargs = MockAgent.call_args[1]
+        self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "medium"})
+
+
 if __name__ == "__main__":
    unittest.main()
@@ -333,3 +333,25 @@ class TestShellFileOpsWriteDenied:
        result = file_ops.patch_replace("~/.ssh/authorized_keys", "old", "new")
        assert result.error is not None
        assert "denied" in result.error.lower()
+
+    def test_delete_file_denied_path(self, file_ops):
+        result = file_ops.delete_file("~/.ssh/authorized_keys")
+        assert result.error is not None
+        assert "denied" in result.error.lower()
+
+    def test_move_file_src_denied(self, file_ops):
+        result = file_ops.move_file("~/.ssh/id_rsa", "/tmp/dest.txt")
+        assert result.error is not None
+        assert "denied" in result.error.lower()
+
+    def test_move_file_dst_denied(self, file_ops):
+        result = file_ops.move_file("/tmp/src.txt", "~/.aws/credentials")
+        assert result.error is not None
+        assert "denied" in result.error.lower()
+
+    def test_move_file_failure_path(self, mock_env):
+        mock_env.execute.return_value = {"output": "No such file or directory", "returncode": 1}
+        ops = ShellFileOperations(mock_env)
+        result = ops.move_file("/tmp/nonexistent.txt", "/tmp/dest.txt")
+        assert result.error is not None
+        assert "Failed to move" in result.error
@@ -0,0 +1,148 @@
+"""Tests for edge cases in tools/file_operations.py.
+
+Covers:
+- ``_is_likely_binary()`` content-analysis branch (dead-code removal regression guard)
+- ``_check_lint()`` robustness against file paths containing curly braces
+"""
+
+import pytest
+from unittest.mock import MagicMock, patch
+
+from tools.file_operations import ShellFileOperations
+
+
+# =========================================================================
+# _is_likely_binary edge cases
+# =========================================================================
+
+
+class TestIsLikelyBinary:
+    """Verify content-analysis logic after dead-code removal."""
+
+    @pytest.fixture()
+    def ops(self):
+        return ShellFileOperations.__new__(ShellFileOperations)
+
+    def test_binary_extension_returns_true(self, ops):
+        """Known binary extensions should short-circuit without content analysis."""
+        assert ops._is_likely_binary("image.png") is True
+        assert ops._is_likely_binary("archive.tar.gz", content_sample="hello") is True
+
+    def test_text_content_returns_false(self, ops):
+        """Normal printable text should not be classified as binary."""
+        sample = "Hello, world!\nThis is a normal text file.\n"
+        assert ops._is_likely_binary("unknown.xyz", content_sample=sample) is False
+
+    def test_binary_content_returns_true(self, ops):
+        """Content with >30% non-printable characters should be classified as binary."""
+        # 500 NUL bytes + 500 printable = 50% non-printable → binary
+        # Use .xyz extension (not in BINARY_EXTENSIONS) to ensure content analysis runs
+        sample = "\x00" * 500 + "a" * 500
+        assert ops._is_likely_binary("data.xyz", content_sample=sample) is True
+
+    def test_no_content_sample_returns_false(self, ops):
+        """When no content sample is provided and extension is unknown → not binary."""
+        assert ops._is_likely_binary("mystery_file") is False
+
+    def test_none_content_sample_returns_false(self, ops):
+        """Explicit ``None`` content_sample should behave the same as missing."""
+        assert ops._is_likely_binary("mystery_file", content_sample=None) is False
+
+    def test_empty_string_content_sample_returns_false(self, ops):
+        """Empty string is falsy, so content analysis should be skipped → not binary."""
+        assert ops._is_likely_binary("mystery_file", content_sample="") is False
+
+    def test_threshold_boundary(self, ops):
+        """Exactly 30% non-printable should NOT trigger binary classification (> 0.30, not >=)."""
+        # 300 NUL bytes + 700 printable = 30.0% → should be False (uses strict >)
+        sample = "\x00" * 300 + "a" * 700
+        assert ops._is_likely_binary("data.xyz", content_sample=sample) is False
+
+    def test_just_above_threshold(self, ops):
+        """301/1000 = 30.1% non-printable → should be binary."""
+        sample = "\x00" * 301 + "a" * 699
+        assert ops._is_likely_binary("data.xyz", content_sample=sample) is True
+
+    def test_tabs_and_newlines_excluded(self, ops):
+        """Tabs, carriage returns, and newlines should not count as non-printable."""
+        sample = "\t" * 400 + "\n" * 300 + "\r" * 200 + "a" * 100
+        assert ops._is_likely_binary("file.txt", content_sample=sample) is False
+
+    def test_content_sample_longer_than_1000(self, ops):
+        """Only the first 1000 characters should be analysed."""
+        # First 1000 chars: 200 NUL + 800 printable = 20% → not binary
+        # Remaining 1000 chars: all NUL → ignored by [:1000] slice
+        sample = "\x00" * 200 + "a" * 800 + "\x00" * 1000
+        assert ops._is_likely_binary("file.xyz", content_sample=sample) is False
+
+
+# =========================================================================
+# _check_lint edge cases
+# =========================================================================
+
+
+class TestCheckLintBracePaths:
+    """Verify _check_lint handles file paths with curly braces safely."""
+
+    @pytest.fixture()
+    def ops(self):
+        obj = ShellFileOperations.__new__(ShellFileOperations)
+        obj._command_cache = {}
+        return obj
+
+    def test_normal_path(self, ops):
+        """Normal path without braces should work as before."""
+        with patch.object(ops, "_has_command", return_value=True), \
+             patch.object(ops, "_exec") as mock_exec:
+            mock_exec.return_value = MagicMock(exit_code=0, stdout="")
+            result = ops._check_lint("/tmp/test_file.py")
+
+        assert result.success is True
+        # Verify the command was built correctly
+        cmd_arg = mock_exec.call_args[0][0]
+        assert "'/tmp/test_file.py'" in cmd_arg
+
+    def test_path_with_curly_braces(self, ops):
+        """Path containing ``{`` and ``}`` must not raise KeyError/ValueError."""
+        with patch.object(ops, "_has_command", return_value=True), \
+             patch.object(ops, "_exec") as mock_exec:
+            mock_exec.return_value = MagicMock(exit_code=0, stdout="")
+            # This would raise KeyError with .format() but works with .replace()
+            result = ops._check_lint("/tmp/{test}_file.py")
+
+        assert result.success is True
+        cmd_arg = mock_exec.call_args[0][0]
+        assert "{test}" in cmd_arg
+
+    def test_path_with_nested_braces(self, ops):
+        """Path with complex brace patterns like ``{{var}}`` should be safe."""
+        with patch.object(ops, "_has_command", return_value=True), \
+             patch.object(ops, "_exec") as mock_exec:
+            mock_exec.return_value = MagicMock(exit_code=0, stdout="")
+            result = ops._check_lint("/tmp/{{var}}.py")
+
+        assert result.success is True
+
+    def test_unsupported_extension_skipped(self, ops):
+        """Extensions without a linter should return a skipped result."""
+        result = ops._check_lint("/tmp/file.unknown_ext")
+        assert result.skipped is True
+
+    def test_missing_linter_skipped(self, ops):
+        """When the linter binary is not installed, skip gracefully."""
+        with patch.object(ops, "_has_command", return_value=False):
+            result = ops._check_lint("/tmp/test.py")
+        assert result.skipped is True
+
+    def test_lint_failure_returns_output(self, ops):
+        """When the linter exits non-zero, result should capture output."""
+        with patch.object(ops, "_has_command", return_value=True), \
+             patch.object(ops, "_exec") as mock_exec:
+            mock_exec.return_value = MagicMock(
+                exit_code=1,
+                stdout="SyntaxError: invalid syntax",
+            )
+            result = ops._check_lint("/tmp/bad.py")
+
+        assert result.success is False
+        assert "SyntaxError" in result.output
@@ -255,3 +255,57 @@ class TestEdgeCases:

        mgr.sync(force=True)
        upload.assert_not_called()  # _file_mtime_key returns None, skipped
+
+
+class TestBulkUpload:
+    """Tests for the optional bulk_upload_fn callback."""
+
+    def test_bulk_upload_used_when_provided(self, tmp_files):
+        """When bulk_upload_fn is set, it's called instead of per-file upload_fn."""
+        upload = MagicMock()
+        bulk_upload = MagicMock()
+        mgr = FileSyncManager(
+            get_files_fn=_make_get_files(tmp_files),
+            upload_fn=upload,
+            delete_fn=MagicMock(),
+            bulk_upload_fn=bulk_upload,
+        )
+
+        mgr.sync(force=True)
+        upload.assert_not_called()
+        bulk_upload.assert_called_once()
+        # All 3 files passed as a list of (host, remote) tuples
+        files_arg = bulk_upload.call_args[0][0]
+        assert len(files_arg) == 3
+
+    def test_fallback_to_upload_fn_when_no_bulk(self, tmp_files):
+        """Without bulk_upload_fn, per-file upload_fn is used (backwards compat)."""
+        upload = MagicMock()
+        mgr = FileSyncManager(
+            get_files_fn=_make_get_files(tmp_files),
+            upload_fn=upload,
+            delete_fn=MagicMock(),
+            bulk_upload_fn=None,
+        )
+
+        mgr.sync(force=True)
+        assert upload.call_count == 3
+
+    def test_bulk_upload_rollback_on_failure(self, tmp_files):
+        """Bulk upload failure rolls back synced state so next sync retries."""
+        bulk_upload = MagicMock(side_effect=RuntimeError("upload failed"))
+        mgr = FileSyncManager(
+            get_files_fn=_make_get_files(tmp_files),
+            upload_fn=MagicMock(),
+            delete_fn=MagicMock(),
+            bulk_upload_fn=bulk_upload,
+        )
+
+        mgr.sync(force=True)  # fails, should rollback
+
+        # State rolled back: next sync should retry all files
+        bulk_upload.side_effect = None
+        bulk_upload.reset_mock()
+        mgr.sync(force=True)
+        bulk_upload.assert_called_once()
+        assert len(bulk_upload.call_args[0][0]) == 3
@@ -6,31 +6,31 @@ from tools.fuzzy_match import fuzzy_find_and_replace
 class TestExactMatch:
    def test_single_replacement(self):
        content = "hello world"
-        new, count, err = fuzzy_find_and_replace(content, "hello", "hi")
+        new, count, _, err = fuzzy_find_and_replace(content, "hello", "hi")
        assert err is None
        assert count == 1
        assert new == "hi world"

    def test_no_match(self):
        content = "hello world"
-        new, count, err = fuzzy_find_and_replace(content, "xyz", "abc")
+        new, count, _, err = fuzzy_find_and_replace(content, "xyz", "abc")
        assert count == 0
        assert err is not None
        assert new == content

    def test_empty_old_string(self):
-        new, count, err = fuzzy_find_and_replace("abc", "", "x")
+        new, count, _, err = fuzzy_find_and_replace("abc", "", "x")
        assert count == 0
        assert err is not None

    def test_identical_strings(self):
-        new, count, err = fuzzy_find_and_replace("abc", "abc", "abc")
+        new, count, _, err = fuzzy_find_and_replace("abc", "abc", "abc")
        assert count == 0
        assert "identical" in err

    def test_multiline_exact(self):
        content = "line1\nline2\nline3"
-        new, count, err = fuzzy_find_and_replace(content, "line1\nline2", "replaced")
+        new, count, _, err = fuzzy_find_and_replace(content, "line1\nline2", "replaced")
        assert err is None
        assert count == 1
        assert new == "replaced\nline3"
@@ -39,7 +39,7 @@ class TestExactMatch:
 class TestWhitespaceDifference:
    def test_extra_spaces_match(self):
        content = "def  foo(  x,  y  ):"
-        new, count, err = fuzzy_find_and_replace(content, "def foo( x, y ):", "def bar(x, y):")
+        new, count, _, err = fuzzy_find_and_replace(content, "def foo( x, y ):", "def bar(x, y):")
        assert count == 1
        assert "bar" in new

@@ -47,7 +47,7 @@ class TestWhitespaceDifference:
 class TestIndentDifference:
    def test_different_indentation(self):
        content = "    def foo():\n        pass"
-        new, count, err = fuzzy_find_and_replace(content, "def foo():\n    pass", "def bar():\n    return 1")
+        new, count, _, err = fuzzy_find_and_replace(content, "def foo():\n    pass", "def bar():\n    return 1")
        assert count == 1
        assert "bar" in new

@@ -55,13 +55,96 @@ class TestIndentDifference:
 class TestReplaceAll:
    def test_multiple_matches_without_flag_errors(self):
        content = "aaa bbb aaa"
-        new, count, err = fuzzy_find_and_replace(content, "aaa", "ccc", replace_all=False)
+        new, count, _, err = fuzzy_find_and_replace(content, "aaa", "ccc", replace_all=False)
        assert count == 0
        assert "Found 2 matches" in err

    def test_multiple_matches_with_flag(self):
        content = "aaa bbb aaa"
-        new, count, err = fuzzy_find_and_replace(content, "aaa", "ccc", replace_all=True)
+        new, count, _, err = fuzzy_find_and_replace(content, "aaa", "ccc", replace_all=True)
        assert err is None
        assert count == 2
        assert new == "ccc bbb ccc"
+
+
+class TestUnicodeNormalized:
+    """Tests for the unicode_normalized strategy (Bug 5)."""
+
+    def test_em_dash_matched(self):
+        """Em-dash in content should match ASCII '--' in pattern."""
+        content = "return value\u2014fallback"
+        new, count, strategy, err = fuzzy_find_and_replace(
+            content, "return value--fallback", "return value or fallback"
+        )
+        assert count == 1, f"Expected match via unicode_normalized, got err={err}"
+        assert strategy == "unicode_normalized"
+        assert "return value or fallback" in new
+
+    def test_smart_quotes_matched(self):
+        """Smart double quotes in content should match straight quotes in pattern."""
+        content = 'print(\u201chello\u201d)'
+        new, count, strategy, err = fuzzy_find_and_replace(
+            content, 'print("hello")', 'print("world")'
+        )
+        assert count == 1, f"Expected match via unicode_normalized, got err={err}"
+        assert "world" in new
+
+    def test_no_unicode_skips_strategy(self):
+        """When content and pattern have no Unicode variants, strategy is skipped."""
+        content = "hello world"
+        # Should match via exact, not unicode_normalized
+        new, count, strategy, err = fuzzy_find_and_replace(content, "hello", "hi")
+        assert count == 1
+        assert strategy == "exact"
+
+
+class TestBlockAnchorThreshold:
+    """Tests for the raised block_anchor threshold (Bug 4)."""
+
+    def test_high_similarity_matches(self):
+        """A block with >50% middle similarity should match."""
+        content = "def foo():\n    x = 1\n    y = 2\n    return x + y\n"
+        pattern = "def foo():\n    x = 1\n    y = 9\n    return x + y"
+        new, count, strategy, err = fuzzy_find_and_replace(content, pattern, "def foo():\n    return 0\n")
+        # Should match via block_anchor or earlier strategy
+        assert count == 1
+
+    def test_completely_different_middle_does_not_match(self):
+        """A block where only first+last lines match but middle is completely different
+        should NOT match under the raised 0.50 threshold."""
+        content = (
+            "class Foo:\n"
+            "    completely = 'unrelated'\n"
+            "    content = 'here'\n"
+            "    nothing = 'in common'\n"
+            "    pass\n"
+        )
+        # Pattern has same first/last lines but completely different middle
+        pattern = (
+            "class Foo:\n"
+            "    x = 1\n"
+            "    y = 2\n"
+            "    z = 3\n"
+            "    pass"
+        )
+        new, count, strategy, err = fuzzy_find_and_replace(content, pattern, "replaced")
+        # With threshold=0.50, this near-zero-similarity middle should not match
+        assert count == 0, (
+            f"Block with unrelated middle should not match under threshold=0.50, "
+            f"but matched via strategy={strategy}"
+        )
+
+
+class TestStrategyNameSurfaced:
+    """Tests for the strategy name in the 4-tuple return (Bug 6)."""
+
+    def test_exact_strategy_name(self):
+        new, count, strategy, err = fuzzy_find_and_replace("hello", "hello", "world")
+        assert strategy == "exact"
+        assert count == 1
+
+    def test_failed_match_returns_none_strategy(self):
+        new, count, strategy, err = fuzzy_find_and_replace("hello", "xyz", "world")
+        assert count == 0
+        assert strategy is None
+        assert err is not None
@@ -104,6 +104,45 @@ class TestStdioPidTracking:
        with _lock:
            assert fake_pid not in _stdio_pids

+    def test_kill_orphaned_uses_sigkill_when_available(self, monkeypatch):
+        """Unix-like platforms should keep using SIGKILL for orphan cleanup."""
+        from tools.mcp_tool import _kill_orphaned_mcp_children, _stdio_pids, _lock
+
+        fake_pid = 424242
+        with _lock:
+            _stdio_pids.clear()
+            _stdio_pids.add(fake_pid)
+
+        fake_sigkill = 9
+        monkeypatch.setattr(signal, "SIGKILL", fake_sigkill, raising=False)
+
+        with patch("tools.mcp_tool.os.kill") as mock_kill:
+            _kill_orphaned_mcp_children()
+
+        mock_kill.assert_called_once_with(fake_pid, fake_sigkill)
+
+        with _lock:
+            assert fake_pid not in _stdio_pids
+
+    def test_kill_orphaned_falls_back_without_sigkill(self, monkeypatch):
+        """Windows-like signal modules without SIGKILL should fall back to SIGTERM."""
+        from tools.mcp_tool import _kill_orphaned_mcp_children, _stdio_pids, _lock
+
+        fake_pid = 434343
+        with _lock:
+            _stdio_pids.clear()
+            _stdio_pids.add(fake_pid)
+
+        monkeypatch.delattr(signal, "SIGKILL", raising=False)
+
+        with patch("tools.mcp_tool.os.kill") as mock_kill:
+            _kill_orphaned_mcp_children()
+
+        mock_kill.assert_called_once_with(fake_pid, signal.SIGTERM)
+
+        with _lock:
+            assert fake_pid not in _stdio_pids
+

 # ---------------------------------------------------------------------------
 # Fix 3: MCP reload timeout (cli.py)
@@ -159,7 +159,7 @@ class TestApplyUpdate:
            def __init__(self):
                self.written = None

-            def read_file(self, path, offset=1, limit=500):
+            def read_file_raw(self, path):
                return SimpleNamespace(
                    content=(
                        'def run():\n'
@@ -211,7 +211,7 @@ class TestAdditionOnlyHunks:
        # Apply to a file that contains the context hint
        class FakeFileOps:
            written = None
-            def read_file(self, path, **kw):
+            def read_file_raw(self, path):
                return SimpleNamespace(
                    content="def main():\n    pass\n",
                    error=None,
@@ -239,7 +239,7 @@ class TestAdditionOnlyHunks:

        class FakeFileOps:
            written = None
-            def read_file(self, path, **kw):
+            def read_file_raw(self, path):
                return SimpleNamespace(
                    content="existing = True\n",
                    error=None,
@@ -253,3 +253,259 @@ class TestAdditionOnlyHunks:
        assert result.success is True
        assert file_ops.written.endswith("def new_func():\n    return True\n")
        assert "existing = True" in file_ops.written
+
+
+class TestReadFileRaw:
+    """Bug 1 regression tests — files > 2000 lines and lines > 2000 chars."""
+
+    def test_apply_update_file_over_2000_lines(self):
+        """A hunk targeting line 2200 must not truncate the file to 2000 lines."""
+        patch = """\
+*** Begin Patch
+*** Update File: big.py
+@@ marker_at_2200 @@
+ line_2200
+-old_value
+new_value
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        # Build a 2500-line file; the hunk targets a region at line 2200
+        lines = [f"line_{i}" for i in range(1, 2501)]
+        lines[2199] = "line_2200"   # index 2199 = line 2200
+        lines[2200] = "old_value"
+        file_content = "\n".join(lines)
+
+        class FakeFileOps:
+            written = None
+            def read_file_raw(self, path):
+                return SimpleNamespace(content=file_content, error=None)
+            def write_file(self, path, content):
+                self.written = content
+                return SimpleNamespace(error=None)
+
+        file_ops = FakeFileOps()
+        result = apply_v4a_operations(ops, file_ops)
+        assert result.success is True
+        written_lines = file_ops.written.split("\n")
+        assert len(written_lines) == 2500, (
+            f"Expected 2500 lines, got {len(written_lines)}"
+        )
+        assert "new_value" in file_ops.written
+        assert "old_value" not in file_ops.written
+
+    def test_apply_update_preserves_long_lines(self):
+        """A line > 2000 chars must be preserved verbatim after an unrelated hunk."""
+        long_line = "x" * 3000
+        patch = """\
+*** Begin Patch
+*** Update File: wide.py
+@@ short_func @@
+ def short_func():
+-    return 1
+    return 2
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        file_content = f"def short_func():\n    return 1\n{long_line}\n"
+
+        class FakeFileOps:
+            written = None
+            def read_file_raw(self, path):
+                return SimpleNamespace(content=file_content, error=None)
+            def write_file(self, path, content):
+                self.written = content
+                return SimpleNamespace(error=None)
+
+        file_ops = FakeFileOps()
+        result = apply_v4a_operations(ops, file_ops)
+        assert result.success is True
+        assert long_line in file_ops.written, "Long line was truncated"
+        assert "... [truncated]" not in file_ops.written
+
+
+class TestValidationPhase:
+    """Bug 2 regression tests — validation prevents partial apply."""
+
+    def test_validation_failure_writes_nothing(self):
+        """If one hunk is invalid, no files should be written."""
+        patch = """\
+*** Begin Patch
+*** Update File: a.py
+ def good():
+-    return 1
+    return 2
+*** Update File: b.py
+ THIS LINE DOES NOT EXIST
+-    old
+    new
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        written = {}
+
+        class FakeFileOps:
+            def read_file_raw(self, path):
+                files = {
+                    "a.py": "def good():\n    return 1\n",
+                    "b.py": "completely different content\n",
+                }
+                content = files.get(path)
+                if content is None:
+                    return SimpleNamespace(content=None, error=f"File not found: {path}")
+                return SimpleNamespace(content=content, error=None)
+
+            def write_file(self, path, content):
+                written[path] = content
+                return SimpleNamespace(error=None)
+
+        result = apply_v4a_operations(ops, FakeFileOps())
+        assert result.success is False
+        assert written == {}, f"No files should have been written, got: {list(written.keys())}"
+        assert "validation failed" in result.error.lower()
+
+    def test_all_valid_operations_applied(self):
+        """When all operations are valid, all files are written."""
+        patch = """\
+*** Begin Patch
+*** Update File: a.py
+ def foo():
+-    return 1
+    return 2
+*** Update File: b.py
+ def bar():
+-    pass
+    return True
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        written = {}
+
+        class FakeFileOps:
+            def read_file_raw(self, path):
+                files = {
+                    "a.py": "def foo():\n    return 1\n",
+                    "b.py": "def bar():\n    pass\n",
+                }
+                return SimpleNamespace(content=files[path], error=None)
+
+            def write_file(self, path, content):
+                written[path] = content
+                return SimpleNamespace(error=None)
+
+        result = apply_v4a_operations(ops, FakeFileOps())
+        assert result.success is True
+        assert set(written.keys()) == {"a.py", "b.py"}
+
+
+class TestApplyDelete:
+    """Tests for _apply_delete producing a real unified diff."""
+
+    def test_delete_diff_contains_removed_lines(self):
+        """_apply_delete must embed the actual file content in the diff, not a placeholder."""
+        patch = """\
+*** Begin Patch
+*** Delete File: old/stuff.py
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        class FakeFileOps:
+            deleted = False
+
+            def read_file_raw(self, path):
+                return SimpleNamespace(
+                    content="def old_func():\n    return 42\n",
+                    error=None,
+                )
+
+            def delete_file(self, path):
+                self.deleted = True
+                return SimpleNamespace(error=None)
+
+        file_ops = FakeFileOps()
+        result = apply_v4a_operations(ops, file_ops)
+
+        assert result.success is True
+        assert file_ops.deleted is True
+        # Diff must contain the actual removed lines, not a bare comment
+        assert "-def old_func():" in result.diff
+        assert "-    return 42" in result.diff
+        assert "/dev/null" in result.diff
+
+    def test_delete_diff_fallback_on_empty_file(self):
+        """An empty file should produce the fallback comment diff."""
+        patch = """\
+*** Begin Patch
+*** Delete File: empty.py
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        class FakeFileOps:
+            def read_file_raw(self, path):
+                return SimpleNamespace(content="", error=None)
+
+            def delete_file(self, path):
+                return SimpleNamespace(error=None)
+
+        result = apply_v4a_operations(ops, FakeFileOps())
+        assert result.success is True
+        # unified_diff produces nothing for two empty inputs — fallback comment expected
+        assert "Deleted" in result.diff or result.diff.strip() == ""
+
+
+class TestCountOccurrences:
+    def test_basic(self):
+        from tools.patch_parser import _count_occurrences
+        assert _count_occurrences("aaa", "a") == 3
+        assert _count_occurrences("aaa", "aa") == 2
+        assert _count_occurrences("hello world", "xyz") == 0
+        assert _count_occurrences("", "x") == 0
+
+
+class TestParseErrorSignalling:
+    """Bug 3 regression tests — parse_v4a_patch must signal errors, not swallow them."""
+
+    def test_update_with_no_hunks_returns_error(self):
+        """An UPDATE with no hunk lines is a malformed patch and should error."""
+        patch = """\
+*** Begin Patch
+*** Update File: foo.py
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is not None, "Expected a parse error for hunk-less UPDATE"
+        assert ops == []
+
+    def test_move_without_destination_returns_error(self):
+        """A MOVE without '->' syntax should not silently produce a broken operation."""
+        # The move regex requires '->' so this will be treated as an unrecognised
+        # line and the op is never created.  Confirm nothing crashes and ops is empty.
+        patch = """\
+*** Begin Patch
+*** Move File: src/foo.py
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        # Either parse sees zero ops (fine) or returns an error (also fine).
+        # What is NOT acceptable is ops=[MOVE op with empty new_path] + err=None.
+        if ops:
+            assert err is not None, (
+                "MOVE with missing destination must either produce empty ops or an error"
+            )
+
+    def test_valid_patch_returns_no_error(self):
+        """A well-formed patch must still return err=None."""
+        patch = """\
+*** Begin Patch
+*** Update File: f.py
+ ctx
+-old
+new
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+        assert len(ops) == 1
@@ -0,0 +1,274 @@
+"""Tests for zombie process cleanup — verifies processes spawned by tools
+are properly reaped when agent sessions end.
+
+Reproduction for issue #7131: zombie process accumulation on long-running
+gateway deployments.
+"""
+
+import os
+import signal
+import subprocess
+import sys
+import time
+import threading
+
+import pytest
+
+
+def _spawn_sleep(seconds: float = 60) -> subprocess.Popen:
+    """Spawn a portable long-lived Python sleep process (no shell wrapper)."""
+    return subprocess.Popen(
+        [sys.executable, "-c", f"import time; time.sleep({seconds})"],
+    )
+
+
+def _pid_alive(pid: int) -> bool:
+    """Return True if a process with the given PID is still running."""
+    try:
+        os.kill(pid, 0)
+        return True
+    except (ProcessLookupError, PermissionError):
+        return False
+
+
+class TestZombieReproduction:
+    """Demonstrate that subprocesses survive when cleanup is not called."""
+
+    def test_orphaned_processes_survive_without_cleanup(self):
+        """REPRODUCTION: processes spawned directly survive if no one kills
+        them — this models the gap that causes zombie accumulation when
+        the gateway drops agent references without calling close()."""
+        pids = []
+
+        try:
+            for _ in range(3):
+                proc = _spawn_sleep(60)
+                pids.append(proc.pid)
+
+            for pid in pids:
+                assert _pid_alive(pid), f"PID {pid} should be alive after spawn"
+
+            # Simulate "session end" by just dropping the reference
+            del proc  # noqa: F821
+
+            # BUG: processes are still alive after reference is dropped
+            for pid in pids:
+                assert _pid_alive(pid), (
+                    f"PID {pid} died after ref drop — "
+                    f"expected it to survive (demonstrating the bug)"
+                )
+        finally:
+            for pid in pids:
+                try:
+                    os.kill(pid, signal.SIGKILL)
+                except (ProcessLookupError, PermissionError):
+                    pass
+
+    def test_explicit_terminate_reaps_processes(self):
+        """Explicitly terminating+waiting on Popen handles works.
+        This models what ProcessRegistry.kill_process does internally."""
+        procs = []
+
+        try:
+            for _ in range(3):
+                proc = _spawn_sleep(60)
+                procs.append(proc)
+
+            for proc in procs:
+                assert _pid_alive(proc.pid)
+
+            for proc in procs:
+                proc.terminate()
+                proc.wait(timeout=5)
+
+            for proc in procs:
+                assert proc.returncode is not None, (
+                    f"PID {proc.pid} should have exited after terminate+wait"
+                )
+        finally:
+            for proc in procs:
+                try:
+                    proc.kill()
+                    proc.wait(timeout=1)
+                except Exception:
+                    pass
+
+
+class TestAgentCloseMethod:
+    """Verify AIAgent.close() exists, is idempotent, and calls cleanup."""
+
+    def test_close_calls_cleanup_functions(self):
+        """close() should call kill_all, cleanup_vm, cleanup_browser."""
+        from unittest.mock import patch
+
+        with patch("run_agent.AIAgent.__init__", return_value=None):
+            from run_agent import AIAgent
+            agent = AIAgent.__new__(AIAgent)
+            agent.session_id = "test-close-cleanup"
+            agent._active_children = []
+            agent._active_children_lock = threading.Lock()
+            agent.client = None
+
+            with patch("tools.process_registry.process_registry") as mock_registry, \
+                 patch("tools.terminal_tool.cleanup_vm") as mock_cleanup_vm, \
+                 patch("tools.browser_tool.cleanup_browser") as mock_cleanup_browser:
+                agent.close()
+
+                mock_registry.kill_all.assert_called_once_with(
+                    task_id="test-close-cleanup"
+                )
+                mock_cleanup_vm.assert_called_once_with("test-close-cleanup")
+                mock_cleanup_browser.assert_called_once_with("test-close-cleanup")
+
+    def test_close_is_idempotent(self):
+        """close() can be called multiple times without error."""
+        from unittest.mock import patch
+
+        with patch("run_agent.AIAgent.__init__", return_value=None):
+            from run_agent import AIAgent
+            agent = AIAgent.__new__(AIAgent)
+            agent.session_id = "test-close-idempotent"
+            agent._active_children = []
+            agent._active_children_lock = threading.Lock()
+            agent.client = None
+
+            agent.close()
+            agent.close()
+            agent.close()
+
+    def test_close_propagates_to_children(self):
+        """close() should call close() on all active child agents."""
+        from unittest.mock import MagicMock, patch
+
+        with patch("run_agent.AIAgent.__init__", return_value=None):
+            from run_agent import AIAgent
+            agent = AIAgent.__new__(AIAgent)
+            agent.session_id = "test-close-children"
+            agent._active_children_lock = threading.Lock()
+            agent.client = None
+
+            child_1 = MagicMock()
+            child_2 = MagicMock()
+            agent._active_children = [child_1, child_2]
+
+            agent.close()
+
+            child_1.close.assert_called_once()
+            child_2.close.assert_called_once()
+            assert agent._active_children == []
+
+    def test_close_survives_partial_failures(self):
+        """close() continues cleanup even if one step fails."""
+        from unittest.mock import patch
+
+        with patch("run_agent.AIAgent.__init__", return_value=None):
+            from run_agent import AIAgent
+            agent = AIAgent.__new__(AIAgent)
+            agent.session_id = "test-close-partial"
+            agent._active_children = []
+            agent._active_children_lock = threading.Lock()
+            agent.client = None
+
+            with patch(
+                "tools.process_registry.process_registry"
+            ) as mock_reg, patch(
+                "tools.terminal_tool.cleanup_vm"
+            ) as mock_vm, patch(
+                "tools.browser_tool.cleanup_browser"
+            ) as mock_browser:
+                mock_reg.kill_all.side_effect = RuntimeError("boom")
+
+                agent.close()
+
+                mock_vm.assert_called_once()
+                mock_browser.assert_called_once()
+
+
+class TestGatewayCleanupWiring:
+    """Verify gateway lifecycle calls close() on agents."""
+
+    def test_gateway_stop_calls_close(self):
+        """gateway stop() should call close() on all running agents."""
+        import asyncio
+        from unittest.mock import MagicMock, patch
+
+        runner = MagicMock()
+        runner._running = True
+        runner._running_agents = {}
+        runner.adapters = {}
+        runner._background_tasks = set()
+        runner._pending_messages = {}
+        runner._pending_approvals = {}
+        runner._shutdown_event = asyncio.Event()
+        runner._exit_reason = None
+
+        mock_agent_1 = MagicMock()
+        mock_agent_2 = MagicMock()
+        runner._running_agents = {
+            "session-1": mock_agent_1,
+            "session-2": mock_agent_2,
+        }
+
+        from gateway.run import GatewayRunner
+
+        loop = asyncio.new_event_loop()
+        try:
+            with patch("gateway.status.remove_pid_file"), \
+                 patch("gateway.status.write_runtime_status"), \
+                 patch("tools.terminal_tool.cleanup_all_environments"), \
+                 patch("tools.browser_tool.cleanup_all_browsers"):
+                loop.run_until_complete(GatewayRunner.stop(runner))
+        finally:
+            loop.close()
+
+        mock_agent_1.close.assert_called()
+        mock_agent_2.close.assert_called()
+
+    def test_evict_does_not_call_close(self):
+        """_evict_cached_agent() should NOT call close() — it's also used
+        for non-destructive refreshes (model switch, branch, fallback)."""
+        import threading
+        from unittest.mock import MagicMock
+
+        from gateway.run import GatewayRunner
+
+        runner = object.__new__(GatewayRunner)
+        runner._agent_cache_lock = threading.Lock()
+
+        mock_agent = MagicMock()
+        runner._agent_cache = {"session-key": (mock_agent, 12345)}
+
+        GatewayRunner._evict_cached_agent(runner, "session-key")
+
+        mock_agent.close.assert_not_called()
+        assert "session-key" not in runner._agent_cache
+
+
+class TestDelegationCleanup:
+    """Verify subagent delegation cleans up child agents."""
+
+    def test_run_single_child_calls_close(self):
+        """_run_single_child finally block should call close() on child."""
+        from unittest.mock import MagicMock
+        from tools.delegate_tool import _run_single_child
+
+        parent = MagicMock()
+        parent._active_children = []
+        parent._active_children_lock = threading.Lock()
+
+        child = MagicMock()
+        child._delegate_saved_tool_names = ["tool1"]
+        child.run_conversation.side_effect = RuntimeError("test abort")
+
+        parent._active_children.append(child)
+
+        result = _run_single_child(
+            task_index=0,
+            goal="test goal",
+            child=child,
+            parent_agent=parent,
+        )
+
+        child.close.assert_called_once()
+        assert child not in parent._active_children
+        assert result["status"] == "error"
@@ -64,14 +64,15 @@ def _scan_cron_prompt(prompt: str) -> str:


 def _origin_from_env() -> Optional[Dict[str, str]]:
-    origin_platform = os.getenv("HERMES_SESSION_PLATFORM")
-    origin_chat_id = os.getenv("HERMES_SESSION_CHAT_ID")
+    from gateway.session_context import get_session_env
+    origin_platform = get_session_env("HERMES_SESSION_PLATFORM")
+    origin_chat_id = get_session_env("HERMES_SESSION_CHAT_ID")
    if origin_platform and origin_chat_id:
        return {
            "platform": origin_platform,
            "chat_id": origin_chat_id,
-            "chat_name": os.getenv("HERMES_SESSION_CHAT_NAME"),
-            "thread_id": os.getenv("HERMES_SESSION_THREAD_ID"),
+            "chat_name": get_session_env("HERMES_SESSION_CHAT_NAME") or None,
+            "thread_id": get_session_env("HERMES_SESSION_THREAD_ID") or None,
        }
    return None

@@ -312,6 +312,25 @@ def _build_child_agent(
    effective_acp_command = override_acp_command or getattr(parent_agent, "acp_command", None)
    effective_acp_args = list(override_acp_args if override_acp_args is not None else (getattr(parent_agent, "acp_args", []) or []))

+    # Resolve reasoning config: delegation override > parent inherit
+    parent_reasoning = getattr(parent_agent, "reasoning_config", None)
+    child_reasoning = parent_reasoning
+    try:
+        delegation_cfg = _load_config()
+        delegation_effort = str(delegation_cfg.get("reasoning_effort") or "").strip()
+        if delegation_effort:
+            from hermes_constants import parse_reasoning_effort
+            parsed = parse_reasoning_effort(delegation_effort)
+            if parsed is not None:
+                child_reasoning = parsed
+            else:
+                logger.warning(
+                    "Unknown delegation.reasoning_effort '%s', inheriting parent level",
+                    delegation_effort,
+                )
+    except Exception as exc:
+        logger.debug("Could not load delegation reasoning_effort: %s", exc)
+
    child = AIAgent(
        base_url=effective_base_url,
        api_key=effective_api_key,
@@ -322,7 +341,7 @@ def _build_child_agent(
        acp_args=effective_acp_args,
        max_iterations=max_iterations,
        max_tokens=getattr(parent_agent, "max_tokens", None),
-        reasoning_config=getattr(parent_agent, "reasoning_config", None),
+        reasoning_config=child_reasoning,
        prefill_messages=getattr(parent_agent, "prefill_messages", None),
        enabled_toolsets=child_toolsets,
        quiet_mode=True,
@@ -578,6 +597,15 @@ def _run_single_child(
            except (ValueError, UnboundLocalError) as e:
                logger.debug("Could not remove child from active_children: %s", e)

+        # Close tool resources (terminal sandboxes, browser daemons,
+        # background processes, httpx clients) so subagent subprocesses
+        # don't outlive the delegation.
+        try:
+            if hasattr(child, 'close'):
+                child.close()
+        except Exception:
+            logger.debug("Failed to close child agent after delegation")
+
 def delegate_task(
    goal: Optional[str] = None,
    context: Optional[str] = None,
@@ -9,7 +9,6 @@ import logging
 import math
 import shlex
 import threading
-import warnings
 from pathlib import Path

 from tools.environments.base import (
@@ -63,10 +62,9 @@ class DaytonaEnvironment(BaseEnvironment):
        memory_gib = max(1, math.ceil(memory / 1024))
        disk_gib = max(1, math.ceil(disk / 1024))
        if disk_gib > 10:
-            warnings.warn(
-                f"Daytona: requested disk ({disk_gib}GB) exceeds platform limit (10GB). "
-                f"Capping to 10GB.",
-                stacklevel=2,
+            logger.warning(
+                "Daytona: requested disk (%dGB) exceeds platform limit (10GB). "
+                "Capping to 10GB.", disk_gib,
            )
            disk_gib = 10
        resources = Resources(cpu=cpu, memory=memory_gib, disk=disk_gib)
@@ -129,6 +127,7 @@ class DaytonaEnvironment(BaseEnvironment):
            get_files_fn=lambda: iter_sync_files(f"{self._remote_home}/.hermes"),
            upload_fn=self._daytona_upload,
            delete_fn=self._daytona_delete,
+            bulk_upload_fn=self._daytona_bulk_upload,
        )
        self._sync_manager.sync(force=True)
        self.init_session()
@@ -139,6 +138,30 @@ class DaytonaEnvironment(BaseEnvironment):
        self._sandbox.process.exec(f"mkdir -p {parent}")
        self._sandbox.fs.upload_file(host_path, remote_path)

+    def _daytona_bulk_upload(self, files: list[tuple[str, str]]) -> None:
+        """Upload many files in a single HTTP call via Daytona SDK.
+
+        Uses ``sandbox.fs.upload_files()`` which batches all files into one
+        multipart POST, avoiding per-file TLS/HTTP overhead (~580 files
+        goes from ~5 min to <2 s).
+        """
+        from daytona.common.filesystem import FileUpload
+
+        if not files:
+            return
+
+        # Pre-create all unique parent directories in one shell call
+        parents = sorted({str(Path(remote).parent) for _, remote in files})
+        if parents:
+            mkdir_cmd = "mkdir -p " + " ".join(shlex.quote(p) for p in parents)
+            self._sandbox.process.exec(mkdir_cmd)
+
+        uploads = [
+            FileUpload(source=host_path, destination=remote_path)
+            for host_path, remote_path in files
+        ]
+        self._sandbox.fs.upload_files(uploads)
+
    def _daytona_delete(self, remote_paths: list[str]) -> None:
        """Batch-delete remote files via SDK exec."""
        self._sandbox.process.exec(quoted_rm_command(remote_paths))
@@ -409,11 +409,12 @@ class DockerEnvironment(BaseEnvironment):
        container_name = f"hermes-{uuid.uuid4().hex[:8]}"
        run_cmd = [
            self._docker_exe, "run", "-d",
+            "--init",           # tini/catatonit as PID 1 — reaps zombie children
            "--name", container_name,
            "-w", cwd,
            *all_run_args,
            image,
-            "sleep", "2h",
+            "sleep", "infinity",  # no fixed lifetime — idle reaper handles cleanup
        ]
        logger.debug(f"Starting container: {' '.join(run_cmd)}")
        result = subprocess.run(
@@ -21,6 +21,7 @@ _FORCE_SYNC_ENV = "HERMES_FORCE_FILE_SYNC"

 # Transport callbacks provided by each backend
 UploadFn = Callable[[str, str], None]  # (host_path, remote_path) -> raises on failure
+BulkUploadFn = Callable[[list[tuple[str, str]]], None]  # [(host_path, remote_path), ...] -> raises on failure
 DeleteFn = Callable[[list[str]], None]  # (remote_paths) -> raises on failure
 GetFilesFn = Callable[[], list[tuple[str, str]]]  # () -> [(host_path, remote_path), ...]

@@ -76,9 +77,11 @@ class FileSyncManager:
        upload_fn: UploadFn,
        delete_fn: DeleteFn,
        sync_interval: float = _SYNC_INTERVAL_SECONDS,
+        bulk_upload_fn: BulkUploadFn | None = None,
    ):
        self._get_files_fn = get_files_fn
        self._upload_fn = upload_fn
+        self._bulk_upload_fn = bulk_upload_fn
        self._delete_fn = delete_fn
        self._synced_files: dict[str, tuple[float, int]] = {}  # remote_path -> (mtime, size)
        self._last_sync_time: float = 0.0  # monotonic; 0 ensures first sync runs
@@ -129,9 +132,13 @@ class FileSyncManager:
            logger.debug("file_sync: deleting %d stale remote file(s)", len(to_delete))

        try:
-            for host_path, remote_path in to_upload:
-                self._upload_fn(host_path, remote_path)
-                logger.debug("file_sync: uploaded %s -> %s", host_path, remote_path)
+            if to_upload and self._bulk_upload_fn is not None:
+                self._bulk_upload_fn(to_upload)
+                logger.debug("file_sync: bulk-uploaded %d file(s)", len(to_upload))
+            else:
+                for host_path, remote_path in to_upload:
+                    self._upload_fn(host_path, remote_path)
+                    logger.debug("file_sync: uploaded %s -> %s", host_path, remote_path)

            if to_delete:
                self._delete_fn(to_delete)
@@ -252,23 +252,43 @@ class FileOperations(ABC):
    def read_file(self, path: str, offset: int = 1, limit: int = 500) -> ReadResult:
        """Read a file with pagination support."""
        ...
-    
+
+    @abstractmethod
+    def read_file_raw(self, path: str) -> ReadResult:
+        """Read the complete file content as a plain string.
+
+        No pagination, no line-number prefixes, no per-line truncation.
+        Returns ReadResult with .content = full file text, .error set on
+        failure. Always reads to EOF regardless of file size.
+        """
+        ...
+
    @abstractmethod
    def write_file(self, path: str, content: str) -> WriteResult:
        """Write content to a file, creating directories as needed."""
        ...
-    
+
    @abstractmethod
-    def patch_replace(self, path: str, old_string: str, new_string: str, 
+    def patch_replace(self, path: str, old_string: str, new_string: str,
                      replace_all: bool = False) -> PatchResult:
        """Replace text in a file using fuzzy matching."""
        ...
-    
+
    @abstractmethod
    def patch_v4a(self, patch_content: str) -> PatchResult:
        """Apply a V4A format patch."""
        ...
-    
+
+    @abstractmethod
+    def delete_file(self, path: str) -> WriteResult:
+        """Delete a file. Returns WriteResult with .error set on failure."""
+        ...
+
+    @abstractmethod
+    def move_file(self, src: str, dst: str) -> WriteResult:
+        """Move/rename a file from src to dst. Returns WriteResult with .error set on failure."""
+        ...
+
    @abstractmethod
    def search(self, pattern: str, path: str = ".", target: str = "content",
               file_glob: Optional[str] = None, limit: int = 50, offset: int = 0,
@@ -366,9 +386,7 @@ class ShellFileOperations(FileOperations):
        
        # Content analysis: >30% non-printable chars = binary
        if content_sample:
-            if not content_sample:
-                return False
-            non_printable = sum(1 for c in content_sample[:1000] 
+            non_printable = sum(1 for c in content_sample[:1000]
                               if ord(c) < 32 and c not in '\n\r\t')
            return non_printable / min(len(content_sample), 1000) > 0.30
        
@@ -561,10 +579,62 @@ class ShellFileOperations(FileOperations):
            similar_files=similar[:5]  # Limit to 5 suggestions
        )
    
+    def read_file_raw(self, path: str) -> ReadResult:
+        """Read the complete file content as a plain string.
+
+        No pagination, no line-number prefixes, no per-line truncation.
+        Uses cat so the full file is returned regardless of size.
+        """
+        path = self._expand_path(path)
+        stat_cmd = f"wc -c < {self._escape_shell_arg(path)} 2>/dev/null"
+        stat_result = self._exec(stat_cmd)
+        if stat_result.exit_code != 0:
+            return self._suggest_similar_files(path)
+        try:
+            file_size = int(stat_result.stdout.strip())
+        except ValueError:
+            file_size = 0
+        if self._is_image(path):
+            return ReadResult(is_image=True, is_binary=True, file_size=file_size)
+        sample_result = self._exec(f"head -c 1000 {self._escape_shell_arg(path)} 2>/dev/null")
+        if self._is_likely_binary(path, sample_result.stdout):
+            return ReadResult(
+                is_binary=True, file_size=file_size,
+                error="Binary file — cannot display as text."
+            )
+        cat_result = self._exec(f"cat {self._escape_shell_arg(path)}")
+        if cat_result.exit_code != 0:
+            return ReadResult(error=f"Failed to read file: {cat_result.stdout}")
+        return ReadResult(content=cat_result.stdout, file_size=file_size)
+
+    def delete_file(self, path: str) -> WriteResult:
+        """Delete a file via rm."""
+        path = self._expand_path(path)
+        if _is_write_denied(path):
+            return WriteResult(error=f"Delete denied: {path} is a protected path")
+        result = self._exec(f"rm -f {self._escape_shell_arg(path)}")
+        if result.exit_code != 0:
+            return WriteResult(error=f"Failed to delete {path}: {result.stdout}")
+        return WriteResult()
+
+    def move_file(self, src: str, dst: str) -> WriteResult:
+        """Move a file via mv."""
+        src = self._expand_path(src)
+        dst = self._expand_path(dst)
+        for p in (src, dst):
+            if _is_write_denied(p):
+                return WriteResult(error=f"Move denied: {p} is a protected path")
+        result = self._exec(
+            f"mv {self._escape_shell_arg(src)} {self._escape_shell_arg(dst)}"
+        )
+        if result.exit_code != 0:
+            return WriteResult(error=f"Failed to move {src} -> {dst}: {result.stdout}")
+        return WriteResult()
+
    # =========================================================================
    # WRITE Implementation
    # =========================================================================
-    
+
    def write_file(self, path: str, content: str) -> WriteResult:
        """
        Write content to a file, creating parent directories as needed.
@@ -656,7 +726,7 @@ class ShellFileOperations(FileOperations):
        # Import and use fuzzy matching
        from tools.fuzzy_match import fuzzy_find_and_replace
        
-        new_content, match_count, error = fuzzy_find_and_replace(
+        new_content, match_count, _strategy, error = fuzzy_find_and_replace(
            content, old_string, new_string, replace_all
        )
        
@@ -738,7 +808,7 @@ class ShellFileOperations(FileOperations):
            return LintResult(skipped=True, message=f"{base_cmd} not available")
        
        # Run linter
-        cmd = linter_cmd.format(file=self._escape_shell_arg(path))
+        cmd = linter_cmd.replace("{file}", self._escape_shell_arg(path))
        result = self._exec(cmd, timeout=30)
        
        return LintResult(
@@ -21,7 +21,7 @@ Multi-occurrence matching is handled via the replace_all flag.
 Usage:
    from tools.fuzzy_match import fuzzy_find_and_replace
    
-    new_content, match_count, error = fuzzy_find_and_replace(
+    new_content, match_count, strategy, error = fuzzy_find_and_replace(
        content="def foo():\\n    pass",
        old_string="def foo():",
        new_string="def bar():",
@@ -48,27 +48,27 @@ def _unicode_normalize(text: str) -> str:


 def fuzzy_find_and_replace(content: str, old_string: str, new_string: str,
-                            replace_all: bool = False) -> Tuple[str, int, Optional[str]]:
+                            replace_all: bool = False) -> Tuple[str, int, Optional[str], Optional[str]]:
    """
    Find and replace text using a chain of increasingly fuzzy matching strategies.
-    
+
    Args:
        content: The file content to search in
        old_string: The text to find
        new_string: The replacement text
        replace_all: If True, replace all occurrences; if False, require uniqueness
-    
+
    Returns:
-        Tuple of (new_content, match_count, error_message)
-        - If successful: (modified_content, number_of_replacements, None)
-        - If failed: (original_content, 0, error_description)
+        Tuple of (new_content, match_count, strategy_name, error_message)
+        - If successful: (modified_content, number_of_replacements, strategy_used, None)
+        - If failed: (original_content, 0, None, error_description)
    """
    if not old_string:
-        return content, 0, "old_string cannot be empty"
-    
+        return content, 0, None, "old_string cannot be empty"
+
    if old_string == new_string:
-        return content, 0, "old_string and new_string are identical"
-    
+        return content, 0, None, "old_string and new_string are identical"
+
    # Try each matching strategy in order
    strategies: List[Tuple[str, Callable]] = [
        ("exact", _strategy_exact),
@@ -77,27 +77,28 @@ def fuzzy_find_and_replace(content: str, old_string: str, new_string: str,
        ("indentation_flexible", _strategy_indentation_flexible),
        ("escape_normalized", _strategy_escape_normalized),
        ("trimmed_boundary", _strategy_trimmed_boundary),
+        ("unicode_normalized", _strategy_unicode_normalized),
        ("block_anchor", _strategy_block_anchor),
        ("context_aware", _strategy_context_aware),
    ]
-    
-    for _strategy_name, strategy_fn in strategies:
+
+    for strategy_name, strategy_fn in strategies:
        matches = strategy_fn(content, old_string)
-        
+
        if matches:
            # Found matches with this strategy
            if len(matches) > 1 and not replace_all:
-                return content, 0, (
+                return content, 0, None, (
                    f"Found {len(matches)} matches for old_string. "
                    f"Provide more context to make it unique, or use replace_all=True."
                )
-            
+
            # Perform replacement
            new_content = _apply_replacements(content, matches, new_string)
-            return new_content, len(matches), None
-    
+            return new_content, len(matches), strategy_name, None
+
    # No strategy found a match
-    return content, 0, "Could not find a match for old_string in the file"
+    return content, 0, None, "Could not find a match for old_string in the file"


 def _apply_replacements(content: str, matches: List[Tuple[int, int]], new_string: str) -> str:
@@ -258,9 +259,90 @@ def _strategy_trimmed_boundary(content: str, pattern: str) -> List[Tuple[int, in
    return matches


+def _build_orig_to_norm_map(original: str) -> List[int]:
+    """Build a list mapping each original character index to its normalized index.
+
+    Because UNICODE_MAP replacements may expand characters (e.g. em-dash → '--',
+    ellipsis → '...'), the normalised string can be longer than the original.
+    This map lets us convert positions in the normalised string back to the
+    corresponding positions in the original string.
+
+    Returns a list of length ``len(original) + 1``; entry ``i`` is the
+    normalised index that character ``i`` maps to.
+    """
+    result: List[int] = []
+    norm_pos = 0
+    for char in original:
+        result.append(norm_pos)
+        repl = UNICODE_MAP.get(char)
+        norm_pos += len(repl) if repl is not None else 1
+    result.append(norm_pos)  # sentinel: one past the last character
+    return result
+
+
+def _map_positions_norm_to_orig(
+    orig_to_norm: List[int],
+    norm_matches: List[Tuple[int, int]],
+) -> List[Tuple[int, int]]:
+    """Convert (start, end) positions in the normalised string to original positions."""
+    # Invert the map: norm_pos -> first original position with that norm_pos
+    norm_to_orig_start: dict[int, int] = {}
+    for orig_pos, norm_pos in enumerate(orig_to_norm[:-1]):
+        if norm_pos not in norm_to_orig_start:
+            norm_to_orig_start[norm_pos] = orig_pos
+
+    results: List[Tuple[int, int]] = []
+    orig_len = len(orig_to_norm) - 1  # number of original characters
+
+    for norm_start, norm_end in norm_matches:
+        if norm_start not in norm_to_orig_start:
+            continue
+        orig_start = norm_to_orig_start[norm_start]
+
+        # Walk forward until orig_to_norm[orig_end] >= norm_end
+        orig_end = orig_start
+        while orig_end < orig_len and orig_to_norm[orig_end] < norm_end:
+            orig_end += 1
+
+        results.append((orig_start, orig_end))
+
+    return results
+
+
+def _strategy_unicode_normalized(content: str, pattern: str) -> List[Tuple[int, int]]:
+    """Strategy 7: Unicode normalisation.
+
+    Normalises smart quotes, em/en-dashes, ellipsis, and non-breaking spaces
+    to their ASCII equivalents in both *content* and *pattern*, then runs
+    exact and line_trimmed matching on the normalised copies.
+
+    Positions are mapped back to the *original* string via
+    ``_build_orig_to_norm_map`` — necessary because some UNICODE_MAP
+    replacements expand a single character into multiple ASCII characters,
+    making a naïve position copy incorrect.
+    """
+    # Normalize both sides. Either the content or the pattern (or both) may
+    # carry unicode variants — e.g. content has an em-dash that should match
+    # the LLM's ASCII '--', or vice-versa.  Skip only when neither changes.
+    norm_pattern = _unicode_normalize(pattern)
+    norm_content = _unicode_normalize(content)
+    if norm_content == content and norm_pattern == pattern:
+        return []
+
+    norm_matches = _strategy_exact(norm_content, norm_pattern)
+    if not norm_matches:
+        norm_matches = _strategy_line_trimmed(norm_content, norm_pattern)
+
+    if not norm_matches:
+        return []
+
+    orig_to_norm = _build_orig_to_norm_map(content)
+    return _map_positions_norm_to_orig(orig_to_norm, norm_matches)
+
+
 def _strategy_block_anchor(content: str, pattern: str) -> List[Tuple[int, int]]:
    """
-    Strategy 7: Match by anchoring on first and last lines.
+    Strategy 8: Match by anchoring on first and last lines.
    Adjusted with permissive thresholds and unicode normalization.
    """
    # Normalize both strings for comparison while keeping original content for offset calculation
@@ -290,8 +372,10 @@ def _strategy_block_anchor(content: str, pattern: str) -> List[Tuple[int, int]]:
    matches = []
    candidate_count = len(potential_matches)
    
-    # Thresholding logic: 0.10 for unique matches (max flexibility), 0.30 for multiple candidates
-    threshold = 0.10 if candidate_count == 1 else 0.30
+    # Thresholding logic: 0.50 for unique matches, 0.70 for multiple candidates.
+    # Previous values (0.10 / 0.30) were dangerously loose — a 10% middle-section
+    # similarity could match completely unrelated blocks.
+    threshold = 0.50 if candidate_count == 1 else 0.70

    for i in potential_matches:
        if pattern_line_count <= 2:
@@ -314,7 +398,7 @@ def _strategy_block_anchor(content: str, pattern: str) -> List[Tuple[int, int]]:

 def _strategy_context_aware(content: str, pattern: str) -> List[Tuple[int, int]]:
    """
-    Strategy 8: Line-by-line similarity with 50% threshold.
+    Strategy 9: Line-by-line similarity with 50% threshold.
    
    Finds blocks where at least 50% of lines have high similarity.
    """
@@ -2160,6 +2160,7 @@ def _kill_orphaned_mcp_children() -> None:
    Only kills PIDs tracked in ``_stdio_pids`` — never arbitrary children.
    """
    import signal as _signal
+    kill_signal = getattr(_signal, "SIGKILL", _signal.SIGTERM)

    with _lock:
        pids = list(_stdio_pids)
@@ -2167,7 +2168,7 @@ def _kill_orphaned_mcp_children() -> None:

    for pid in pids:
        try:
-            os.kill(pid, _signal.SIGKILL)
+            os.kill(pid, kill_signal)
            logger.debug("Force-killed orphaned MCP stdio process %d", pid)
        except (ProcessLookupError, PermissionError, OSError):
            pass  # Already exited or inaccessible
@@ -28,6 +28,7 @@ Usage:
        result = apply_v4a_operations(operations, file_ops)
 """

+import difflib
 import re
 from dataclasses import dataclass, field
 from typing import List, Optional, Tuple, Any
@@ -202,31 +203,162 @@ def parse_v4a_patch(patch_content: str) -> Tuple[List[PatchOperation], Optional[
        if current_hunk and current_hunk.lines:
            current_op.hunks.append(current_hunk)
        operations.append(current_op)
-    
+
+    # Validate the parsed result
+    if not operations:
+        # Empty patch is not an error — callers get [] and can decide
+        return operations, None
+
+    parse_errors: List[str] = []
+    for op in operations:
+        if not op.file_path:
+            parse_errors.append("Operation with empty file path")
+        if op.operation == OperationType.UPDATE and not op.hunks:
+            parse_errors.append(f"UPDATE {op.file_path!r}: no hunks found")
+        if op.operation == OperationType.MOVE and not op.new_path:
+            parse_errors.append(f"MOVE {op.file_path!r}: missing destination path (expected 'src -> dst')")
+
+    if parse_errors:
+        return [], "Parse error: " + "; ".join(parse_errors)
+
    return operations, None


-def apply_v4a_operations(operations: List[PatchOperation], 
-                          file_ops: Any) -> 'PatchResult':
+def _count_occurrences(text: str, pattern: str) -> int:
+    """Count non-overlapping occurrences of *pattern* in *text*."""
+    count = 0
+    start = 0
+    while True:
+        pos = text.find(pattern, start)
+        if pos == -1:
+            break
+        count += 1
+        start = pos + 1
+    return count
+
+
+def _validate_operations(
+    operations: List[PatchOperation],
+    file_ops: Any,
+) -> List[str]:
+    """Validate all operations without writing any files.
+
+    Returns a list of error strings; an empty list means all operations
+    are valid and the apply phase can proceed safely.
+
+    For UPDATE operations, hunks are simulated in order so that later
+    hunks validate against post-earlier-hunk content (matching apply order).
    """
-    Apply V4A patch operations using a file operations interface.
-    
+    # Deferred import: breaks the patch_parser ↔ fuzzy_match circular dependency
+    from tools.fuzzy_match import fuzzy_find_and_replace
+
+    errors: List[str] = []
+
+    for op in operations:
+        if op.operation == OperationType.UPDATE:
+            read_result = file_ops.read_file_raw(op.file_path)
+            if read_result.error:
+                errors.append(f"{op.file_path}: {read_result.error}")
+                continue
+
+            simulated = read_result.content
+            for hunk in op.hunks:
+                search_lines = [l.content for l in hunk.lines if l.prefix in (' ', '-')]
+                if not search_lines:
+                    # Addition-only hunk: validate context hint uniqueness
+                    if hunk.context_hint:
+                        occurrences = _count_occurrences(simulated, hunk.context_hint)
+                        if occurrences == 0:
+                            errors.append(
+                                f"{op.file_path}: addition-only hunk context hint "
+                                f"'{hunk.context_hint}' not found"
+                            )
+                        elif occurrences > 1:
+                            errors.append(
+                                f"{op.file_path}: addition-only hunk context hint "
+                                f"'{hunk.context_hint}' is ambiguous "
+                                f"({occurrences} occurrences)"
+                            )
+                    continue
+
+                search_pattern = '\n'.join(search_lines)
+                replace_lines = [l.content for l in hunk.lines if l.prefix in (' ', '+')]
+                replacement = '\n'.join(replace_lines)
+
+                new_simulated, count, _strategy, match_error = fuzzy_find_and_replace(
+                    simulated, search_pattern, replacement, replace_all=False
+                )
+                if count == 0:
+                    label = f"'{hunk.context_hint}'" if hunk.context_hint else "(no hint)"
+                    errors.append(
+                        f"{op.file_path}: hunk {label} not found"
+                        + (f" — {match_error}" if match_error else "")
+                    )
+                else:
+                    # Advance simulation so subsequent hunks validate correctly.
+                    # Reuse the result from the call above — no second fuzzy run.
+                    simulated = new_simulated
+
+        elif op.operation == OperationType.DELETE:
+            read_result = file_ops.read_file_raw(op.file_path)
+            if read_result.error:
+                errors.append(f"{op.file_path}: file not found for deletion")
+
+        elif op.operation == OperationType.MOVE:
+            if not op.new_path:
+                errors.append(f"{op.file_path}: MOVE operation missing destination path")
+                continue
+            src_result = file_ops.read_file_raw(op.file_path)
+            if src_result.error:
+                errors.append(f"{op.file_path}: source file not found for move")
+            dst_result = file_ops.read_file_raw(op.new_path)
+            if not dst_result.error:
+                errors.append(
+                    f"{op.new_path}: destination already exists — move would overwrite"
+                )
+
+        # ADD: parent directory creation handled by write_file; no pre-check needed.
+
+    return errors
+
+
+def apply_v4a_operations(operations: List[PatchOperation],
+                          file_ops: Any) -> 'PatchResult':
+    """Apply V4A patch operations using a file operations interface.
+
+    Uses a two-phase validate-then-apply approach:
+    - Phase 1: validate all operations against current file contents without
+      writing anything. If any validation error is found, return immediately
+      with no filesystem changes.
+    - Phase 2: apply all operations. A failure here (e.g. a race between
+      validation and apply) is reported with a note to run ``git diff``.
+
    Args:
        operations: List of PatchOperation from parse_v4a_patch
-        file_ops: Object with read_file, write_file methods
-    
+        file_ops: Object with read_file_raw, write_file methods
+
    Returns:
        PatchResult with results of all operations
    """
    # Import here to avoid circular imports
    from tools.file_operations import PatchResult
-    
+
+    # ---- Phase 1: validate ----
+    validation_errors = _validate_operations(operations, file_ops)
+    if validation_errors:
+        return PatchResult(
+            success=False,
+            error="Patch validation failed (no files were modified):\n"
+                  + "\n".join(f"  • {e}" for e in validation_errors),
+        )
+
+    # ---- Phase 2: apply ----
    files_modified = []
    files_created = []
    files_deleted = []
    all_diffs = []
    errors = []
-    
+
    for op in operations:
        try:
            if op.operation == OperationType.ADD:
@@ -236,7 +368,7 @@ def apply_v4a_operations(operations: List[PatchOperation],
                    all_diffs.append(result[1])
                else:
                    errors.append(f"Failed to add {op.file_path}: {result[1]}")
-                    
+
            elif op.operation == OperationType.DELETE:
                result = _apply_delete(op, file_ops)
                if result[0]:
@@ -244,7 +376,7 @@ def apply_v4a_operations(operations: List[PatchOperation],
                    all_diffs.append(result[1])
                else:
                    errors.append(f"Failed to delete {op.file_path}: {result[1]}")
-                    
+
            elif op.operation == OperationType.MOVE:
                result = _apply_move(op, file_ops)
                if result[0]:
@@ -252,7 +384,7 @@ def apply_v4a_operations(operations: List[PatchOperation],
                    all_diffs.append(result[1])
                else:
                    errors.append(f"Failed to move {op.file_path}: {result[1]}")
-                    
+
            elif op.operation == OperationType.UPDATE:
                result = _apply_update(op, file_ops)
                if result[0]:
@@ -260,19 +392,19 @@ def apply_v4a_operations(operations: List[PatchOperation],
                    all_diffs.append(result[1])
                else:
                    errors.append(f"Failed to update {op.file_path}: {result[1]}")
-                    
+
        except Exception as e:
            errors.append(f"Error processing {op.file_path}: {str(e)}")
-    
+
    # Run lint on all modified/created files
    lint_results = {}
    for f in files_modified + files_created:
        if hasattr(file_ops, '_check_lint'):
            lint_result = file_ops._check_lint(f)
            lint_results[f] = lint_result.to_dict()
-    
+
    combined_diff = '\n'.join(all_diffs)
-    
+
    if errors:
        return PatchResult(
            success=False,
@@ -281,16 +413,17 @@ def apply_v4a_operations(operations: List[PatchOperation],
            files_created=files_created,
            files_deleted=files_deleted,
            lint=lint_results if lint_results else None,
-            error='; '.join(errors)
+            error="Apply phase failed (state may be inconsistent — run `git diff` to assess):\n"
+                  + "\n".join(f"  • {e}" for e in errors),
        )
-    
+
    return PatchResult(
        success=True,
        diff=combined_diff,
        files_modified=files_modified,
        files_created=files_created,
        files_deleted=files_deleted,
-        lint=lint_results if lint_results else None
+        lint=lint_results if lint_results else None,
    )


@@ -317,68 +450,56 @@ def _apply_add(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:

 def _apply_delete(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
    """Apply a delete file operation."""
-    # Read file first for diff
-    read_result = file_ops.read_file(op.file_path)
-    
-    if read_result.error and "not found" in read_result.error.lower():
-        # File doesn't exist, nothing to delete
-        return True, f"# {op.file_path} already deleted or doesn't exist"
-    
-    # Delete directly via shell command using the underlying environment
-    rm_result = file_ops._exec(f"rm -f {file_ops._escape_shell_arg(op.file_path)}")
-    
-    if rm_result.exit_code != 0:
-        return False, rm_result.stdout
-    
-    diff = f"--- a/{op.file_path}\n+++ /dev/null\n# File deleted"
-    return True, diff
+    # Read before deleting so we can produce a real unified diff.
+    # Validation already confirmed existence; this guards against races.
+    read_result = file_ops.read_file_raw(op.file_path)
+    if read_result.error:
+        return False, f"Cannot delete {op.file_path}: file not found"
+
+    result = file_ops.delete_file(op.file_path)
+    if result.error:
+        return False, result.error
+
+    removed_lines = read_result.content.splitlines(keepends=True)
+    diff = ''.join(difflib.unified_diff(
+        removed_lines, [],
+        fromfile=f"a/{op.file_path}",
+        tofile="/dev/null",
+    ))
+    return True, diff or f"# Deleted: {op.file_path}"


 def _apply_move(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
    """Apply a move file operation."""
-    # Use shell mv command
-    mv_result = file_ops._exec(
-        f"mv {file_ops._escape_shell_arg(op.file_path)} {file_ops._escape_shell_arg(op.new_path)}"
-    )
-    
-    if mv_result.exit_code != 0:
-        return False, mv_result.stdout
-    
+    result = file_ops.move_file(op.file_path, op.new_path)
+    if result.error:
+        return False, result.error
+
    diff = f"# Moved: {op.file_path} -> {op.new_path}"
    return True, diff


 def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
    """Apply an update file operation."""
-    # Read current content
-    read_result = file_ops.read_file(op.file_path, limit=10000)
-    
+    # Deferred import: breaks the patch_parser ↔ fuzzy_match circular dependency
+    from tools.fuzzy_match import fuzzy_find_and_replace
+
+    # Read current content — raw so no line-number prefixes or per-line truncation
+    read_result = file_ops.read_file_raw(op.file_path)
+
    if read_result.error:
        return False, f"Cannot read file: {read_result.error}"
-    
-    # Parse content (remove line numbers)
-    current_lines = []
-    for line in read_result.content.split('\n'):
-        if re.match(r'^\s*\d+\|', line):
-            # Line format: "    123|content"
-            parts = line.split('|', 1)
-            if len(parts) == 2:
-                current_lines.append(parts[1])
-            else:
-                current_lines.append(line)
-        else:
-            current_lines.append(line)
-    
-    current_content = '\n'.join(current_lines)
-    
+
+    current_content = read_result.content
+
    # Apply each hunk
    new_content = current_content
-    
+
    for hunk in op.hunks:
        # Build search pattern from context and removed lines
        search_lines = []
        replace_lines = []
-        
+
        for line in hunk.lines:
            if line.prefix == ' ':
                search_lines.append(line.content)
@@ -387,17 +508,15 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
                search_lines.append(line.content)
            elif line.prefix == '+':
                replace_lines.append(line.content)
-        
+
        if search_lines:
            search_pattern = '\n'.join(search_lines)
            replacement = '\n'.join(replace_lines)
-            
-            # Use fuzzy matching
-            from tools.fuzzy_match import fuzzy_find_and_replace
-            new_content, count, error = fuzzy_find_and_replace(
+
+            new_content, count, _strategy, error = fuzzy_find_and_replace(
                new_content, search_pattern, replacement, replace_all=False
            )
-            
+
            if error and count == 0:
                # Try with context hint if available
                if hunk.context_hint:
@@ -408,8 +527,8 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
                        window_start = max(0, hint_pos - 500)
                        window_end = min(len(new_content), hint_pos + 2000)
                        window = new_content[window_start:window_end]
-                        
-                        window_new, count, error = fuzzy_find_and_replace(
+
+                        window_new, count, _strategy, error = fuzzy_find_and_replace(
                            window, search_pattern, replacement, replace_all=False
                        )
                        
@@ -424,16 +543,23 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
            # Insert at the location indicated by the context hint, or at end of file.
            insert_text = '\n'.join(replace_lines)
            if hunk.context_hint:
-                hint_pos = new_content.find(hunk.context_hint)
-                if hint_pos != -1:
+                occurrences = _count_occurrences(new_content, hunk.context_hint)
+                if occurrences == 0:
+                    # Hint not found — append at end as a safe fallback
+                    new_content = new_content.rstrip('\n') + '\n' + insert_text + '\n'
+                elif occurrences > 1:
+                    return False, (
+                        f"Addition-only hunk: context hint '{hunk.context_hint}' is ambiguous "
+                        f"({occurrences} occurrences) — provide a more unique hint"
+                    )
+                else:
+                    hint_pos = new_content.find(hunk.context_hint)
                    # Insert after the line containing the context hint
                    eol = new_content.find('\n', hint_pos)
                    if eol != -1:
                        new_content = new_content[:eol + 1] + insert_text + '\n' + new_content[eol + 1:]
                    else:
                        new_content = new_content + '\n' + insert_text
-                else:
-                    new_content = new_content.rstrip('\n') + '\n' + insert_text + '\n'
            else:
                new_content = new_content.rstrip('\n') + '\n' + insert_text + '\n'
    
@@ -443,7 +569,6 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
        return False, write_result.error
    
    # Generate diff
-    import difflib
    diff_lines = difflib.unified_diff(
        current_content.splitlines(keepends=True),
        new_content.splitlines(keepends=True),
@@ -585,7 +585,10 @@ class ProcessRegistry:
        from tools.ansi_strip import strip_ansi
        from tools.terminal_tool import _interrupt_event

-        default_timeout = int(os.getenv("TERMINAL_TIMEOUT", "180"))
+        try:
+            default_timeout = int(os.getenv("TERMINAL_TIMEOUT", "180"))
+        except (ValueError, TypeError):
+            default_timeout = 180
        max_timeout = default_timeout
        requested_timeout = timeout
        timeout_note = None
@@ -212,7 +212,8 @@ def _handle_send(args):
        if isinstance(result, dict) and result.get("success") and mirror_text:
            try:
                from gateway.mirror import mirror_to_session
-                source_label = os.getenv("HERMES_SESSION_PLATFORM", "cli")
+                from gateway.session_context import get_session_env
+                source_label = get_session_env("HERMES_SESSION_PLATFORM", "cli")
                if mirror_to_session(platform_name, chat_id, mirror_text, source_label=source_label, thread_id=thread_id):
                    result["mirrored"] = True
            except Exception:
@@ -689,7 +690,10 @@ async def _send_email(extra, chat_id, message):
    address = extra.get("address") or os.getenv("EMAIL_ADDRESS", "")
    password = os.getenv("EMAIL_PASSWORD", "")
    smtp_host = extra.get("smtp_host") or os.getenv("EMAIL_SMTP_HOST", "")
-    smtp_port = int(os.getenv("EMAIL_SMTP_PORT", "587"))
+    try:
+        smtp_port = int(os.getenv("EMAIL_SMTP_PORT", "587"))
+    except (ValueError, TypeError):
+        smtp_port = 587

    if not all([address, password, smtp_host]):
        return {"error": "Email not configured (EMAIL_ADDRESS, EMAIL_PASSWORD, EMAIL_SMTP_HOST required)"}
@@ -1020,7 +1024,8 @@ async def _send_feishu(pconfig, chat_id, message, media_files=None, thread_id=No

 def _check_send_message():
    """Gate send_message on gateway running (always available on messaging platforms)."""
-    platform = os.getenv("HERMES_SESSION_PLATFORM", "")
+    from gateway.session_context import get_session_env
+    platform = get_session_env("HERMES_SESSION_PLATFORM", "")
    if platform and platform != "local":
        return True
    try:
@@ -426,7 +426,7 @@ def _patch_skill(
    # from exact-match failures on minor formatting mismatches.
    from tools.fuzzy_match import fuzzy_find_and_replace

-    new_content, match_count, match_error = fuzzy_find_and_replace(
+    new_content, match_count, _strategy, match_error = fuzzy_find_and_replace(
        content, old_string, new_string, replace_all
    )
    if match_error:
@@ -1788,7 +1788,10 @@ class ClawHubSource(SkillSource):
                    follow_redirects=True,
                )
                if resp.status_code == 429:
-                    retry_after = int(resp.headers.get("retry-after", "5"))
+                    try:
+                        retry_after = int(resp.headers.get("retry-after", "5"))
+                    except (ValueError, TypeError):
+                        retry_after = 5
                    retry_after = min(retry_after, 15)  # Cap wait time
                    logger.debug(
                        "ClawHub download rate-limited for %s, retrying in %ds (attempt %d/%d)",
@@ -347,7 +347,8 @@ def _capture_required_environment_variables(
 def _is_gateway_surface() -> bool:
    if os.getenv("HERMES_GATEWAY_SESSION"):
        return True
-    return bool(os.getenv("HERMES_SESSION_PLATFORM"))
+    from gateway.session_context import get_session_env
+    return bool(get_session_env("HERMES_SESSION_PLATFORM"))


 def _get_terminal_backend_name() -> str:
@@ -1420,10 +1420,11 @@ def terminal_tool(
                    # In gateway mode, auto-register a fast watcher so the
                    # gateway can detect completion and trigger a new agent
                    # turn.  CLI mode uses the completion_queue directly.
-                    _gw_platform = os.getenv("HERMES_SESSION_PLATFORM", "")
+                    from gateway.session_context import get_session_env as _gse
+                    _gw_platform = _gse("HERMES_SESSION_PLATFORM", "")
                    if _gw_platform and not check_interval:
-                        _gw_chat_id = os.getenv("HERMES_SESSION_CHAT_ID", "")
-                        _gw_thread_id = os.getenv("HERMES_SESSION_THREAD_ID", "")
+                        _gw_chat_id = _gse("HERMES_SESSION_CHAT_ID", "")
+                        _gw_thread_id = _gse("HERMES_SESSION_THREAD_ID", "")
                        proc_session.watcher_platform = _gw_platform
                        proc_session.watcher_chat_id = _gw_chat_id
                        proc_session.watcher_thread_id = _gw_thread_id
@@ -1445,9 +1446,10 @@ def terminal_tool(
                        result_data["check_interval_note"] = (
                            f"Requested {check_interval}s raised to minimum 30s"
                        )
-                    watcher_platform = os.getenv("HERMES_SESSION_PLATFORM", "")
-                    watcher_chat_id = os.getenv("HERMES_SESSION_CHAT_ID", "")
-                    watcher_thread_id = os.getenv("HERMES_SESSION_THREAD_ID", "")
+                    from gateway.session_context import get_session_env as _gse2
+                    watcher_platform = _gse2("HERMES_SESSION_PLATFORM", "")
+                    watcher_chat_id = _gse2("HERMES_SESSION_CHAT_ID", "")
+                    watcher_thread_id = _gse2("HERMES_SESSION_THREAD_ID", "")

                    # Store on session for checkpoint persistence
                    proc_session.watcher_platform = watcher_platform
@@ -480,7 +480,8 @@ def text_to_speech_tool(
    # Telegram voice bubbles require Opus (.ogg); OpenAI and ElevenLabs can
    # produce Opus natively (no ffmpeg needed).  Edge TTS always outputs MP3
    # and needs ffmpeg for conversion.
-    platform = os.getenv("HERMES_SESSION_PLATFORM", "").lower()
+    from gateway.session_context import get_session_env
+    platform = get_session_env("HERMES_SESSION_PLATFORM", "").lower()
    want_opus = (platform == "telegram")

    # Determine output path
@@ -152,19 +152,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/1a/99/84ba7273339d0f3dfa57901b846489d2e5c2cd731470167757f1935fffbd/aiohttp_retry-2.9.1-py3-none-any.whl", hash = "sha256:66d2759d1921838256a05a3f80ad7e724936f083e35be5abb5e16eed6be6dc54", size = 9981, upload-time = "2024-11-06T10:44:52.917Z" },
 ]

-[[package]]
-name = "aiohttp-socks"
-version = "0.11.0"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "aiohttp" },
-    { name = "python-socks" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/1f/cc/e5bbd54f76bd56291522251e47267b645dac76327b2657ade9545e30522c/aiohttp_socks-0.11.0.tar.gz", hash = "sha256:0afe51638527c79077e4bd6e57052c87c4824233d6e20bb061c53766421b10f0", size = 11196, upload-time = "2025-12-09T13:35:52.564Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/bf/7d/4b633d709b8901d59444d2e512b93e72fe62d2b492a040097c3f7ba017bb/aiohttp_socks-0.11.0-py3-none-any.whl", hash = "sha256:9aacce57c931b8fbf8f6d333cf3cafe4c35b971b35430309e167a35a8aab9ec1", size = 10556, upload-time = "2025-12-09T13:35:50.18Z" },
-]
-
 [[package]]
 name = "aiosignal"
 version = "1.4.0"
@@ -253,12 +240,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/38/0e/27be9fdef66e72d64c0cdc3cc2823101b80585f8119b5c112c2e8f5f7dab/anyio-4.12.1-py3-none-any.whl", hash = "sha256:d405828884fc140aa80a3c667b8beed277f1dfedec42ba031bd6ac3db606ab6c", size = 113592, upload-time = "2026-01-06T11:45:19.497Z" },
 ]

-[[package]]
-name = "atomicwrites"
-version = "1.4.1"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/87/c6/53da25344e3e3a9c01095a89f16dbcda021c609ddb42dd6d7c0528236fb2/atomicwrites-1.4.1.tar.gz", hash = "sha256:81b2c9071a49367a7f770170e5eec8cb66567cfbbc8c73d20ce5ca4a8d71cf11", size = 14227, upload-time = "2022-07-08T18:31:40.459Z" }
-
 [[package]]
 name = "atroposlib"
 version = "0.4.0"
@@ -376,6 +357,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/41/0a/0896b829a39b5669a2d811e1a79598de661693685cd62b31f11d0c18e65b/av-17.0.0-cp314-cp314t-win_arm64.whl", hash = "sha256:dba98603fc4665b4f750de86fbaf6c0cfaece970671a9b529e0e3d1711e8367e", size = 22071058, upload-time = "2026-03-14T14:38:43.663Z" },
 ]

+[[package]]
+name = "base58"
+version = "2.1.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/7f/45/8ae61209bb9015f516102fa559a2914178da1d5868428bd86a1b4421141d/base58-2.1.1.tar.gz", hash = "sha256:c5d0cb3f5b6e81e8e35da5754388ddcc6d0d14b6c6a132cb93d69ed580a7278c", size = 6528, upload-time = "2021-10-30T22:12:17.858Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/4a/45/ec96b29162a402fc4c1c5512d114d7b3787b9d1c2ec241d9568b4816ee23/base58-2.1.1-py3-none-any.whl", hash = "sha256:11a36f4d3ce51dfc1043f3218591ac4eb1ceb172919cebe05b52a5bcc8d245c2", size = 5621, upload-time = "2021-10-30T22:12:16.658Z" },
+]
+
 [[package]]
 name = "blinker"
 version = "1.9.0"
@@ -1661,7 +1651,7 @@ dependencies = [
    { name = "fal-client" },
    { name = "fire" },
    { name = "firecrawl-py" },
-    { name = "httpx" },
+    { name = "httpx", extra = ["socks"] },
    { name = "jinja2" },
    { name = "openai" },
    { name = "parallel-web" },
@@ -1691,6 +1681,8 @@ all = [
    { name = "faster-whisper" },
    { name = "honcho-ai" },
    { name = "lark-oapi" },
+    { name = "markdown", marker = "sys_platform == 'linux'" },
+    { name = "mautrix", extra = ["encryption"], marker = "sys_platform == 'linux'" },
    { name = "mcp" },
    { name = "mistralai" },
    { name = "modal" },
@@ -1736,7 +1728,7 @@ honcho = [
 ]
 matrix = [
    { name = "markdown" },
-    { name = "matrix-nio", extra = ["e2e"] },
+    { name = "mautrix", extra = ["encryption"] },
 ]
 mcp = [
    { name = "mcp" },
@@ -1827,6 +1819,7 @@ requires-dist = [
    { name = "hermes-agent", extras = ["homeassistant"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["honcho"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["honcho"], marker = "extra == 'termux'" },
+    { name = "hermes-agent", extras = ["matrix"], marker = "sys_platform == 'linux' and extra == 'all'" },
    { name = "hermes-agent", extras = ["mcp"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["mcp"], marker = "extra == 'termux'" },
    { name = "hermes-agent", extras = ["messaging"], marker = "extra == 'all'" },
@@ -1839,11 +1832,11 @@ requires-dist = [
    { name = "hermes-agent", extras = ["tts-premium"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["voice"], marker = "extra == 'all'" },
    { name = "honcho-ai", marker = "extra == 'honcho'", specifier = ">=2.0.1,<3" },
-    { name = "httpx", specifier = ">=0.28.1,<1" },
+    { name = "httpx", extras = ["socks"], specifier = ">=0.28.1,<1" },
    { name = "jinja2", specifier = ">=3.1.5,<4" },
    { name = "lark-oapi", marker = "extra == 'feishu'", specifier = ">=1.5.3,<2" },
    { name = "markdown", marker = "extra == 'matrix'", specifier = ">=3.6,<4" },
-    { name = "matrix-nio", extras = ["e2e"], marker = "extra == 'matrix'", specifier = ">=0.24.0,<1" },
+    { name = "mautrix", extras = ["encryption"], marker = "extra == 'matrix'", specifier = ">=0.20,<1" },
    { name = "mcp", marker = "extra == 'dev'", specifier = ">=1.2.0,<2" },
    { name = "mcp", marker = "extra == 'mcp'", specifier = ">=1.2.0,<2" },
    { name = "mistralai", marker = "extra == 'mistral'", specifier = ">=2.3.0,<3" },
@@ -2033,6 +2026,9 @@ wheels = [
 http2 = [
    { name = "h2" },
 ]
+socks = [
+    { name = "socksio" },
+]

 [[package]]
 name = "httpx-sse"
@@ -2595,30 +2591,25 @@ wheels = [
 ]

 [[package]]
-name = "matrix-nio"
-version = "0.25.2"
+name = "mautrix"
+version = "0.21.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
-    { name = "aiofiles" },
    { name = "aiohttp" },
-    { name = "aiohttp-socks" },
-    { name = "h11" },
-    { name = "h2" },
-    { name = "jsonschema" },
-    { name = "pycryptodome" },
-    { name = "unpaddedbase64" },
+    { name = "attrs" },
+    { name = "yarl" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/33/50/c20129fd6f0e1aad3510feefd3229427fc8163a111f3911ed834e414116b/matrix_nio-0.25.2.tar.gz", hash = "sha256:8ef8180c374e12368e5c83a692abfb3bab8d71efcd17c5560b5c40c9b6f2f600", size = 155480, upload-time = "2024-10-04T07:51:41.62Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/74/a7/8d6d0589e211ecf3a72ce4b28cc32c857c4043d1a6963d63ac9f726af653/mautrix-0.21.0.tar.gz", hash = "sha256:a14e0582e114cb241f282f9e717014608f36c03f1dc59afcd71b4e81780ffe2e", size = 254726, upload-time = "2025-11-17T13:53:09.996Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/7b/0f/8b958d46e23ed4f69d2cffd63b46bb097a1155524e2e7f5c4279c8691c4a/matrix_nio-0.25.2-py3-none-any.whl", hash = "sha256:9c2880004b0e475db874456c0f79b7dd2b6285073a7663bcaca29e0754a67495", size = 181982, upload-time = "2024-10-04T07:51:39.451Z" },
+    { url = "https://files.pythonhosted.org/packages/8c/d6/d4b3ae380dacdc9fb07bc3eb7dd17f43b8a7ce391465a184d1094acb66c1/mautrix-0.21.0-py3-none-any.whl", hash = "sha256:1cba30d69f46351918a3b8bc4e5657465cac8470d42ddd2287a742653cab7194", size = 334131, upload-time = "2025-11-17T13:53:08.117Z" },
 ]

 [package.optional-dependencies]
-e2e = [
-    { name = "atomicwrites" },
-    { name = "cachetools" },
-    { name = "peewee" },
+encryption = [
+    { name = "base58" },
+    { name = "pycryptodome" },
    { name = "python-olm" },
+    { name = "unpaddedbase64" },
 ]

 [[package]]
@@ -3331,15 +3322,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/a0/3e/2218fa29637781b8e7ac35a928108ff2614ddd40879389d3af2caa725af5/parallel_web-0.4.2-py3-none-any.whl", hash = "sha256:aa3a4a9aecc08972c5ce9303271d4917903373dff4dd277d9a3e30f9cff53346", size = 144012, upload-time = "2026-03-09T22:24:33.979Z" },
 ]

-[[package]]
-name = "peewee"
-version = "3.19.0"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/88/b0/79462b42e89764998756e0557f2b58a15610a5b4512fbbcccae58fba7237/peewee-3.19.0.tar.gz", hash = "sha256:f88292a6f0d7b906cb26bca9c8599b8f4d8920ebd36124400d0cbaaaf915511f", size = 974035, upload-time = "2026-01-07T17:24:59.597Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/1a/41/19c65578ef9a54b3083253c68a607f099642747168fe00f3a2bceb7c3a34/peewee-3.19.0-py3-none-any.whl", hash = "sha256:de220b94766e6008c466e00ce4ba5299b9a832117d9eb36d45d0062f3cfd7417", size = 411885, upload-time = "2026-01-07T17:24:58.33Z" },
-]
-
 [[package]]
 name = "pillow"
 version = "12.1.1"
@@ -4002,15 +3984,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/79/93/f6729f10149305262194774d6c8b438c0b084740cf239f48ab97b4df02fa/python_olm-3.2.16-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:10a5e68a2f4b5a2bfa5fdb5dbfa22396a551730df6c4a572235acaa96e997d3f", size = 297000, upload-time = "2023-11-28T19:25:31.045Z" },
 ]

-[[package]]
-name = "python-socks"
-version = "2.8.1"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/36/0b/cd77011c1bc01b76404f7aba07fca18aca02a19c7626e329b40201217624/python_socks-2.8.1.tar.gz", hash = "sha256:698daa9616d46dddaffe65b87db222f2902177a2d2b2c0b9a9361df607ab3687", size = 38909, upload-time = "2026-02-16T05:24:00.745Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/15/fe/9a58cb6eec633ff6afae150ca53c16f8cc8b65862ccb3d088051efdfceb7/python_socks-2.8.1-py3-none-any.whl", hash = "sha256:28232739c4988064e725cdbcd15be194743dd23f1c910f784163365b9d7be035", size = 55087, upload-time = "2026-02-16T05:23:59.147Z" },
-]
-
 [[package]]
 name = "python-telegram-bot"
 version = "22.6"
@@ -4500,6 +4473,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
 ]

+[[package]]
+name = "socksio"
+version = "1.0.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f8/5c/48a7d9495be3d1c651198fd99dbb6ce190e2274d0f28b9051307bdec6b85/socksio-1.0.0.tar.gz", hash = "sha256:f88beb3da5b5c38b9890469de67d0cb0f9d494b78b106ca1845f96c10b91c4ac", size = 19055, upload-time = "2020-04-17T15:50:34.664Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/37/c3/6eeb6034408dac0fa653d126c9204ade96b819c936e136c5e8a6897eee9c/socksio-1.0.0-py3-none-any.whl", hash = "sha256:95dc1f15f9b34e8d7b16f06d74b8ccf48f609af32ab33c608d08761c5dcbb1f3", size = 12763, upload-time = "2020-04-17T15:50:31.878Z" },
+]
+
 [[package]]
 name = "sounddevice"
 version = "0.5.5"
@@ -226,7 +226,8 @@ After each turn:
 |------|---------|
 | `run_agent.py` | AIAgent class — the complete agent loop (~9,200 lines) |
 | `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality |
-| `agent/context_compressor.py` | Conversation compression algorithm |
+| `agent/context_engine.py` | ContextEngine ABC — pluggable context management |
+| `agent/context_compressor.py` | Default engine — lossy summarization algorithm |
 | `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics |
 | `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) |
 | `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch |
@@ -62,7 +62,8 @@ hermes-agent/
 │
 ├── agent/                    # Agent internals
 │   ├── prompt_builder.py     # System prompt assembly
-│   ├── context_compressor.py # Conversation compression algorithm
+│   ├── context_engine.py     # ContextEngine ABC (pluggable)
+│   ├── context_compressor.py # Default engine — lossy summarization
 │   ├── prompt_caching.py     # Anthropic prompt caching
 │   ├── auxiliary_client.py   # Auxiliary LLM for side tasks (vision, summarization)
 │   ├── model_metadata.py     # Model context lengths, token estimation
@@ -123,6 +124,7 @@ hermes-agent/
 ├── acp_adapter/              # ACP server (VS Code / Zed / JetBrains)
 ├── cron/                     # Scheduler (jobs.py, scheduler.py)
 ├── plugins/memory/           # Memory provider plugins
+├── plugins/context_engine/   # Context engine plugins
 ├── environments/             # RL training environments (Atropos)
 ├── skills/                   # Bundled skills (always available)
 ├── optional-skills/          # Official optional skills (install explicitly)
@@ -227,7 +229,7 @@ Long-running process with 14 platform adapters, unified session routing, user au

 ### Plugin System

-Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Memory providers are a specialized plugin type under `plugins/memory/`.
+Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Two specialized plugin types exist: memory providers (`plugins/memory/`) and context engines (`plugins/context_engine/`). Both are single-select — only one of each can be active at a time, configured via `hermes plugins` or `config.yaml`.

 → [Plugin Guide](/docs/guides/build-a-hermes-plugin), [Memory Provider Plugin](./memory-provider-plugin.md)

@@ -3,10 +3,37 @@
 Hermes Agent uses a dual compression system and Anthropic prompt caching to
 manage context window usage efficiently across long conversations.

-Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
-`gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)
+Source files: `agent/context_engine.py` (ABC), `agent/context_compressor.py` (default engine),
+`agent/prompt_caching.py`, `gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)


+## Pluggable Context Engine
+
+Context management is built on the `ContextEngine` ABC (`agent/context_engine.py`). The built-in `ContextCompressor` is the default implementation, but plugins can replace it with alternative engines (e.g., Lossless Context Management).
+
+```yaml
+context:
+  engine: "compressor"    # default — built-in lossy summarization
+  engine: "lcm"           # example — plugin providing lossless context
+```
+
+The engine is responsible for:
+- Deciding when compaction should fire (`should_compress()`)
+- Performing compaction (`compress()`)
+- Optionally exposing tools the agent can call (e.g., `lcm_grep`)
+- Tracking token usage from API responses
+
+Selection is config-driven via `context.engine` in `config.yaml`. The resolution order:
+1. Check `plugins/context_engine/<name>/` directory
+2. Check general plugin system (`register_context_engine()`)
+3. Fall back to built-in `ContextCompressor`
+
+Plugin engines are **never auto-activated** — the user must explicitly set `context.engine` to the plugin's name. The default `"compressor"` always uses the built-in.
+
+Configure via `hermes plugins` → Provider Plugins → Context Engine, or edit `config.yaml` directly.
+
+For building a context engine plugin, see [Context Engine Plugins](/docs/developer-guide/context-engine-plugin).
+
 ## Dual Compression System

 Hermes has two separate compression layers that operate independently:
@@ -0,0 +1,189 @@
+---
+sidebar_position: 9
+title: "Context Engine Plugins"
+description: "How to build a context engine plugin that replaces the built-in ContextCompressor"
+---
+
+# Building a Context Engine Plugin
+
+Context engine plugins replace the built-in `ContextCompressor` with an alternative strategy for managing conversation context. For example, a Lossless Context Management (LCM) engine that builds a knowledge DAG instead of lossy summarization.
+
+## How it works
+
+The agent's context management is built on the `ContextEngine` ABC (`agent/context_engine.py`). The built-in `ContextCompressor` is the default implementation. Plugin engines must implement the same interface.
+
+Only **one** context engine can be active at a time. Selection is config-driven:
+
+```yaml
+# config.yaml
+context:
+  engine: "compressor"    # default built-in
+  engine: "lcm"           # activates a plugin engine named "lcm"
+```
+
+Plugin engines are **never auto-activated** — the user must explicitly set `context.engine` to the plugin's name.
+
+## Directory structure
+
+Each context engine lives in `plugins/context_engine/<name>/`:
+
+```
+plugins/context_engine/lcm/
+├── __init__.py      # exports the ContextEngine subclass
+├── plugin.yaml      # metadata (name, description, version)
+└── ...              # any other modules your engine needs
+```
+
+## The ContextEngine ABC
+
+Your engine must implement these **required** methods:
+
+```python
+from agent.context_engine import ContextEngine
+
+class LCMEngine(ContextEngine):
+
+    @property
+    def name(self) -> str:
+        """Short identifier, e.g. 'lcm'. Must match config.yaml value."""
+        return "lcm"
+
+    def update_from_response(self, usage: dict) -> None:
+        """Called after every LLM call with the usage dict.
+
+        Update self.last_prompt_tokens, self.last_completion_tokens,
+        self.last_total_tokens from the response.
+        """
+
+    def should_compress(self, prompt_tokens: int = None) -> bool:
+        """Return True if compaction should fire this turn."""
+
+    def compress(self, messages: list, current_tokens: int = None) -> list:
+        """Compact the message list and return a new (possibly shorter) list.
+
+        The returned list must be a valid OpenAI-format message sequence.
+        """
+```
+
+### Class attributes your engine must maintain
+
+The agent reads these directly for display and logging:
+
+```python
+last_prompt_tokens: int = 0
+last_completion_tokens: int = 0
+last_total_tokens: int = 0
+threshold_tokens: int = 0        # when compression triggers
+context_length: int = 0          # model's full context window
+compression_count: int = 0       # how many times compress() has run
+```
+
+### Optional methods
+
+These have sensible defaults in the ABC. Override as needed:
+
+| Method | Default | Override when |
+|--------|---------|--------------|
+| `on_session_start(session_id, **kwargs)` | No-op | You need to load persisted state (DAG, DB) |
+| `on_session_end(session_id, messages)` | No-op | You need to flush state, close connections |
+| `on_session_reset()` | Resets token counters | You have per-session state to clear |
+| `update_model(model, context_length, ...)` | Updates context_length + threshold | You need to recalculate budgets on model switch |
+| `get_tool_schemas()` | Returns `[]` | Your engine provides agent-callable tools (e.g., `lcm_grep`) |
+| `handle_tool_call(name, args, **kwargs)` | Returns error JSON | You implement tool handlers |
+| `should_compress_preflight(messages)` | Returns `False` | You can do a cheap pre-API-call estimate |
+| `get_status()` | Standard token/threshold dict | You have custom metrics to expose |
+
+## Engine tools
+
+Context engines can expose tools the agent calls directly. Return schemas from `get_tool_schemas()` and handle calls in `handle_tool_call()`:
+
+```python
+def get_tool_schemas(self):
+    return [{
+        "name": "lcm_grep",
+        "description": "Search the context knowledge graph",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "query": {"type": "string", "description": "Search query"}
+            },
+            "required": ["query"],
+        },
+    }]
+
+def handle_tool_call(self, name, args, **kwargs):
+    if name == "lcm_grep":
+        results = self._search_dag(args["query"])
+        return json.dumps({"results": results})
+    return json.dumps({"error": f"Unknown tool: {name}"})
+```
+
+Engine tools are injected into the agent's tool list at startup and dispatched automatically — no registry registration needed.
+
+## Registration
+
+### Via directory (recommended)
+
+Place your engine in `plugins/context_engine/<name>/`. The `__init__.py` must export a `ContextEngine` subclass. The discovery system finds and instantiates it automatically.
+
+### Via general plugin system
+
+A general plugin can also register a context engine:
+
+```python
+def register(ctx):
+    engine = LCMEngine(context_length=200000)
+    ctx.register_context_engine(engine)
+```
+
+Only one engine can be registered. A second plugin attempting to register is rejected with a warning.
+
+## Lifecycle
+
+```
+1. Engine instantiated (plugin load or directory discovery)
+2. on_session_start() — conversation begins
+3. update_from_response() — after each API call
+4. should_compress() — checked each turn
+5. compress() — called when should_compress() returns True
+6. on_session_end() — session boundary (CLI exit, /reset, gateway expiry)
+```
+
+`on_session_reset()` is called on `/new` or `/reset` to clear per-session state without a full shutdown.
+
+## Configuration
+
+Users select your engine via `hermes plugins` → Provider Plugins → Context Engine, or by editing `config.yaml`:
+
+```yaml
+context:
+  engine: "lcm"   # must match your engine's name property
+```
+
+The `compression` config block (`compression.threshold`, `compression.protect_last_n`, etc.) is specific to the built-in `ContextCompressor`. Your engine should define its own config format if needed, reading from `config.yaml` during initialization.
+
+## Testing
+
+```python
+from agent.context_engine import ContextEngine
+
+def test_engine_satisfies_abc():
+    engine = YourEngine(context_length=200000)
+    assert isinstance(engine, ContextEngine)
+    assert engine.name == "your-name"
+
+def test_compress_returns_valid_messages():
+    engine = YourEngine(context_length=200000)
+    msgs = [{"role": "user", "content": "hello"}]
+    result = engine.compress(msgs)
+    assert isinstance(result, list)
+    assert all("role" in m for m in result)
+```
+
+See `tests/agent/test_context_engine.py` for the full ABC contract test suite.
+
+## See also
+
+- [Context Compression and Caching](/docs/developer-guide/context-compression-and-caching) — how the built-in compressor works
+- [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) — analogous single-select plugin system for memory
+- [Plugins](/docs/user-guide/features/plugins) — general plugin system overview
@@ -153,7 +153,7 @@ gateway/platforms/
 ├── slack.py             # Slack Socket Mode
 ├── whatsapp.py          # WhatsApp Business Cloud API
 ├── signal.py            # Signal via signal-cli REST API
-├── matrix.py            # Matrix via matrix-nio (optional E2EE)
+├── matrix.py            # Matrix via mautrix (optional E2EE)
 ├── mattermost.py        # Mattermost WebSocket API
 ├── email.py             # Email via IMAP/SMTP
 ├── sms.py               # SMS via Twilio
@@ -8,6 +8,10 @@ description: "How to build a memory provider plugin for Hermes Agent"

 Memory provider plugins give Hermes Agent persistent, cross-session knowledge beyond the built-in MEMORY.md and USER.md. This guide covers how to build one.

+:::tip
+Memory providers are one of two **provider plugin** types. The other is [Context Engine Plugins](/docs/developer-guide/context-engine-plugin), which replace the built-in context compressor. Both follow the same pattern: single-select, config-driven, managed via `hermes plugins`.
+:::
+
 ## Directory Structure

 Each memory provider lives in `plugins/memory/<name>/`:
@@ -547,6 +547,12 @@ After registration, users can run `hermes my-plugin status`, `hermes my-plugin c

 **Active-provider gating:** Memory plugin CLI commands only appear when their provider is the active `memory.provider` in config. If a user hasn't set up your provider, your CLI commands won't clutter the help output.

+:::tip
+This guide covers **general plugins** (tools, hooks, CLI commands). For specialized plugin types, see:
+- [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) — cross-session knowledge backends
+- [Context Engine Plugins](/docs/developer-guide/context-engine-plugin) — alternative context management strategies
+:::
+
 ### Distribute via pip

 For sharing plugins publicly, add an entry point to your Python package:
@@ -140,15 +140,19 @@ Subcommands:

 | Subcommand | Description |
 |------------|-------------|
-| `run` | Run the gateway in the foreground. |
-| `start` | Start the installed gateway service. |
-| `stop` | Stop the service. |
+| `run` | Run the gateway in the foreground. Recommended for WSL, Docker, and Termux. |
+| `start` | Start the installed systemd/launchd background service. |
+| `stop` | Stop the service (or foreground process). |
 | `restart` | Restart the service. |
 | `status` | Show service status. |
-| `install` | Install as a user service (`systemd` on Linux, `launchd` on macOS). |
+| `install` | Install as a systemd (Linux) or launchd (macOS) background service. |
 | `uninstall` | Remove the installed service. |
 | `setup` | Interactive messaging-platform setup. |

+:::tip WSL users
+Use `hermes gateway run` instead of `hermes gateway start` — WSL's systemd support is unreliable. Wrap it in tmux for persistence: `tmux new -s hermes 'hermes gateway run'`. See [WSL FAQ](/docs/reference/faq#wsl-gateway-keeps-disconnecting-or-hermes-gateway-start-fails) for details.
+:::
+
 ## `hermes setup`

 ```bash
@@ -586,11 +590,14 @@ See [MCP Config Reference](./mcp-config-reference.md), [Use MCP with Hermes](../
 hermes plugins [subcommand]
 ```

-Manage Hermes Agent plugins. Running `hermes plugins` with no subcommand launches an interactive curses checklist to enable/disable installed plugins.
+Unified plugin management — general plugins, memory providers, and context engines in one place. Running `hermes plugins` with no subcommand opens a composite interactive screen with two sections:
+
+- **General Plugins** — multi-select checkboxes to enable/disable installed plugins
+- **Provider Plugins** — single-select configuration for Memory Provider and Context Engine. Press ENTER on a category to open a radio picker.

 | Subcommand | Description |
 |------------|-------------|
-| *(none)* | Interactive toggle UI — enable/disable plugins with arrow keys and space. |
+| *(none)* | Composite interactive UI — general plugin toggles + provider plugin configuration. |
 | `install <identifier> [--force]` | Install a plugin from a Git URL or `owner/repo`. |
 | `update <name>` | Pull latest changes for an installed plugin. |
 | `remove <name>` (aliases: `rm`, `uninstall`) | Remove an installed plugin. |
@@ -598,7 +605,11 @@ Manage Hermes Agent plugins. Running `hermes plugins` with no subcommand launche
 | `disable <name>` | Disable a plugin without removing it. |
 | `list` (alias: `ls`) | List installed plugins with enabled/disabled status. |

-Disabled plugins are stored in `config.yaml` under `plugins.disabled` and skipped during loading.
+Provider plugin selections are saved to `config.yaml`:
+- `memory.provider` — active memory provider (empty = built-in only)
+- `context.engine` — active context engine (`"compressor"` = built-in default)
+
+General plugin disabled list is stored in `config.yaml` under `plugins.disabled`.

 See [Plugins](../user-guide/features/plugins.md) and [Build a Hermes Plugin](../guides/build-a-hermes-plugin.md).

--- a/Show More
+++ b/Show More