fix: restore agent.close() cleanup and correct /restart category

- Add agent.close() call to _finalize_shutdown_agents() to prevent zombie processes (terminal sandboxes, browser daemons, httpx clients) - Global cleanup (process_registry, environments, browsers) preserved in _stop_impl() during conflict resolution - Move /restart CommandDef from 'Info' to 'Session' category to match /stop and /status
fix(gateway): address restart review feedback
2026-04-10 18:55:28 -07:00 · 2026-04-10 18:54:48 -07:00 · 2026-04-10 18:54:48 -07:00 · 2026-04-10 18:54:48 -07:00 · 2026-04-10 18:54:48 -07:00 · 2026-04-11 05:59:56 +05:30
58 changed files with 3041 additions and 507 deletions
@@ -857,7 +857,7 @@ def _read_main_provider() -> str:
    return ""


-def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str]]:
+def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Resolve the active custom/main endpoint the same way the main CLI does.

    This covers both env-driven OPENAI_BASE_URL setups and config-saved custom
@@ -870,18 +870,29 @@ def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str]]:
        runtime = resolve_runtime_provider(requested="custom")
    except Exception as exc:
        logger.debug("Auxiliary client: custom runtime resolution failed: %s", exc)
-        return None, None
+        runtime = None
+
+    if not isinstance(runtime, dict):
+        openai_base = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
+        openai_key = os.getenv("OPENAI_API_KEY", "").strip()
+        if not openai_base:
+            return None, None, None
+        runtime = {
+            "base_url": openai_base,
+            "api_key": openai_key,
+        }

    custom_base = runtime.get("base_url")
    custom_key = runtime.get("api_key")
+    custom_mode = runtime.get("api_mode")
    if not isinstance(custom_base, str) or not custom_base.strip():
-        return None, None
+        return None, None, None

    custom_base = custom_base.strip().rstrip("/")
    if "openrouter.ai" in custom_base.lower():
        # requested='custom' falls back to OpenRouter when no custom endpoint is
        # configured. Treat that as "no custom endpoint" for auxiliary routing.
-        return None, None
+        return None, None, None

    # Local servers (Ollama, llama.cpp, vLLM, LM Studio) don't require auth.
    # Use a placeholder key — the OpenAI SDK requires a non-empty string but
@@ -890,20 +901,33 @@ def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str]]:
    if not isinstance(custom_key, str) or not custom_key.strip():
        custom_key = "no-key-required"

-    return custom_base, custom_key.strip()
+    if not isinstance(custom_mode, str) or not custom_mode.strip():
+        custom_mode = None
+
+    return custom_base, custom_key.strip(), custom_mode


 def _current_custom_base_url() -> str:
-    custom_base, _ = _resolve_custom_runtime()
+    custom_base, _, _ = _resolve_custom_runtime()
    return custom_base or ""


 def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
-    custom_base, custom_key = _resolve_custom_runtime()
+    runtime = _resolve_custom_runtime()
+    if len(runtime) == 2:
+        custom_base, custom_key = runtime
+        custom_mode = None
+    else:
+        custom_base, custom_key, custom_mode = runtime
    if not custom_base or not custom_key:
        return None, None
+    if custom_base.lower().startswith(_CODEX_AUX_BASE_URL.lower()):
+        return None, None
    model = _read_main_model() or "gpt-4o-mini"
-    logger.debug("Auxiliary client: custom endpoint (%s)", model)
+    logger.debug("Auxiliary client: custom endpoint (%s, api_mode=%s)", model, custom_mode or "chat_completions")
+    if custom_mode == "codex_responses":
+        real_client = OpenAI(api_key=custom_key, base_url=custom_base)
+        return CodexAuxiliaryClient(real_client, model), model
    return OpenAI(api_key=custom_key, base_url=custom_base), model


@@ -487,7 +487,7 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
    (True, {}, "") to err on the side of showing the skill.
    """
    try:
-        raw = skill_file.read_text(encoding="utf-8")[:2000]
+        raw = skill_file.read_text(encoding="utf-8")
        frontmatter, _ = parse_frontmatter(raw)

        if not skill_matches_platform(frontmatter):
@@ -495,7 +495,7 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:

        return True, frontmatter, extract_skill_description(frontmatter)
    except Exception as e:
-        logger.debug("Failed to parse skill file %s: %s", skill_file, e)
+        logger.warning("Failed to parse skill file %s: %s", skill_file, e)
        return True, {}, ""


@@ -558,9 +558,10 @@ def build_skills_system_prompt(
    # ── Layer 1: in-process LRU cache ─────────────────────────────────
    # Include the resolved platform so per-platform disabled-skill lists
    # produce distinct cache entries (gateway serves multiple platforms).
+    from gateway.session_context import get_session_env
    _platform_hint = (
        os.environ.get("HERMES_PLATFORM")
-        or os.environ.get("HERMES_SESSION_PLATFORM")
+        or get_session_env("HERMES_SESSION_PLATFORM")
        or ""
    )
    cache_key = (
@@ -145,10 +145,11 @@ def get_disabled_skill_names(platform: str | None = None) -> Set[str]:
    if not isinstance(skills_cfg, dict):
        return set()

+    from gateway.session_context import get_session_env
    resolved_platform = (
        platform
        or os.getenv("HERMES_PLATFORM")
-        or os.getenv("HERMES_SESSION_PLATFORM")
+        or get_session_env("HERMES_SESSION_PLATFORM")
    )
    if resolved_platform:
        platform_disabled = (skills_cfg.get("platform_disabled") or {}).get(
@@ -480,6 +480,12 @@ agent:
  # Fires once per run when inactivity reaches this threshold (seconds).
  # Set to 0 to disable the warning.
  # gateway_timeout_warning: 900
+
+  # Graceful drain timeout for gateway stop/restart (seconds).
+  # The gateway stops accepting new work, waits for in-flight agents to
+  # finish, then interrupts anything still running after this timeout.
+  # 0 = no drain, interrupt immediately.
+  # restart_drain_timeout: 60
  
  # Enable verbose logging
  verbose: false
@@ -31,7 +31,7 @@ except ImportError:
 # Configuration
 # =============================================================================

-HERMES_DIR = get_hermes_home()
+HERMES_DIR = get_hermes_home().resolve()
 CRON_DIR = HERMES_DIR / "cron"
 JOBS_FILE = CRON_DIR / "jobs.json"
 OUTPUT_DIR = CRON_DIR / "output"
@@ -338,10 +338,12 @@ def load_jobs() -> List[Dict[str, Any]]:
                    save_jobs(jobs)
                    logger.warning("Auto-repaired jobs.json (had invalid control characters)")
                return jobs
-        except Exception:
-            return []
-    except IOError:
-        return []
+        except Exception as e:
+            logger.error("Failed to auto-repair jobs.json: %s", e)
+            raise RuntimeError(f"Cron database corrupted and unrepairable: {e}") from e
+    except IOError as e:
+        logger.error("IOError reading jobs.json: %s", e)
+        raise RuntimeError(f"Failed to read cron database: {e}") from e


 def save_jobs(jobs: List[Dict[str, Any]]):
@@ -452,6 +454,7 @@ def create_job(
        "last_run_at": None,
        "last_status": None,
        "last_error": None,
+        "last_delivery_error": None,
        # Delivery configuration
        "deliver": deliver,
        "origin": origin,  # Tracks where job was created for "origin" delivery
@@ -620,8 +623,8 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,

            save_jobs(jobs)
            return
-    
-    save_jobs(jobs)
+
+    logger.warning("mark_job_run: job_id %s not found, skipping save", job_id)


 def advance_next_run(job_id: str) -> bool:
@@ -769,7 +769,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            _cron_pool.shutdown(wait=False, cancel_futures=True)
            raise
        finally:
-            _cron_pool.shutdown(wait=False)
+            _cron_pool.shutdown(wait=False, cancel_futures=True)

        if _inactivity_timeout:
            # Build diagnostic summary from the agent's activity tracker.
@@ -12,7 +12,7 @@ INSTALL_DIR="/opt/hermes"
 # The "home/" subdirectory is a per-profile HOME for subprocesses (git,
 # ssh, gh, npm …).  Without it those tools write to /root which is
 # ephemeral and shared across profiles.  See issue #4426.
-mkdir -p "$HERMES_HOME"/{cron,sessions,logs,hooks,memories,skills,home}
+mkdir -p "$HERMES_HOME"/{cron,sessions,logs,hooks,memories,skills,skins,plans,workspace,home}

 # .env
 if [ ! -f "$HERMES_HOME/.env" ]; then
@@ -76,10 +76,15 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
        except Exception as e:
            logger.warning("Channel directory: failed to build %s: %s", platform.value, e)

-    # Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
-    for plat_name in ("telegram", "whatsapp", "signal", "weixin", "email", "sms", "bluebubbles"):
-        if plat_name not in platforms:
-            platforms[plat_name] = _build_from_sessions(plat_name)
+    # Platforms that don't support direct channel enumeration get session-based
+    # discovery automatically.  Skip infrastructure entries that aren't messaging
+    # platforms — everything else falls through to _build_from_sessions().
+    _SKIP_SESSION_DISCOVERY = frozenset({"local", "api_server", "webhook"})
+    for plat in Platform:
+        plat_name = plat.value
+        if plat_name in _SKIP_SESSION_DISCOVERY or plat_name in platforms:
+            continue
+        platforms[plat_name] = _build_from_sessions(plat_name)

    directory = {
        "updated_at": datetime.now().isoformat(),
@@ -642,6 +642,8 @@ def load_gateway_config() -> GatewayConfig:
                    os.environ["MATRIX_FREE_RESPONSE_ROOMS"] = str(frc)
                if "auto_thread" in matrix_cfg and not os.getenv("MATRIX_AUTO_THREAD"):
                    os.environ["MATRIX_AUTO_THREAD"] = str(matrix_cfg["auto_thread"]).lower()
+                if "dm_mention_threads" in matrix_cfg and not os.getenv("MATRIX_DM_MENTION_THREADS"):
+                    os.environ["MATRIX_DM_MENTION_THREADS"] = str(matrix_cfg["dm_mention_threads"]).lower()

    except Exception as e:
        logger.warning(
@@ -55,7 +55,6 @@ MAX_STORED_RESPONSES = 100
 MAX_REQUEST_BYTES = 1_000_000  # 1 MB default limit for POST bodies


-
 def check_api_server_requirements() -> bool:
    """Check if API server dependencies are available."""
    return AIOHTTP_AVAILABLE
@@ -410,7 +409,7 @@ class APIServerAdapter(BasePlatformAdapter):

        Returns None if auth is OK, or a 401 web.Response on failure.
        If no API key is configured, all requests are allowed (only when API
-        server is local)
+        server is local).
        """
        if not self._api_key:
            return None  # No key configured — allow all (local-only use)
@@ -673,6 +673,32 @@ class SendResult:
    retryable: bool = False  # True for transient connection errors — base will retry automatically


+def merge_pending_message_event(
+    pending_messages: Dict[str, MessageEvent],
+    session_key: str,
+    event: MessageEvent,
+) -> None:
+    """Store or merge a pending event for a session.
+
+    Photo bursts/albums often arrive as multiple near-simultaneous PHOTO
+    events. Merge those into the existing queued event so the next turn sees
+    the whole burst, while non-photo follow-ups still replace the pending
+    event normally.
+    """
+    existing = pending_messages.get(session_key)
+    if (
+        existing
+        and getattr(existing, "message_type", None) == MessageType.PHOTO
+        and event.message_type == MessageType.PHOTO
+    ):
+        existing.media_urls.extend(event.media_urls)
+        existing.media_types.extend(event.media_types)
+        if event.text:
+            existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
+        return
+    pending_messages[session_key] = event
+
+
 # Error substrings that indicate a transient *connection* failure worth retrying.
 # "timeout" / "timed out" / "readtimeout" / "writetimeout" are intentionally
 # excluded: a read/write timeout on a non-idempotent call (e.g. send_message)
@@ -727,6 +753,7 @@ class BasePlatformAdapter(ABC):
        # working on a task after --replace or manual restarts.
        self._background_tasks: set[asyncio.Task] = set()
        self._expected_cancelled_tasks: set[asyncio.Task] = set()
+        self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
        # Chats where auto-TTS on voice input is disabled (set by /voice off)
        self._auto_tts_disabled_chats: set = set()
        # Chats where typing indicator is paused (e.g. during approval waits).
@@ -815,6 +842,10 @@ class BasePlatformAdapter(ABC):
        an optional response string.
        """
        self._message_handler = handler
+
+    def set_busy_session_handler(self, handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]]) -> None:
+        """Set an optional handler for messages arriving during active sessions."""
+        self._busy_session_handler = handler
    
    def set_session_store(self, session_store: Any) -> None:
        """
@@ -1396,7 +1427,7 @@ class BasePlatformAdapter(ABC):
            # session lifecycle and its cleanup races with the running task
            # (see PR #4926).
            cmd = event.get_command()
-            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background"):
+            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background", "restart"):
                logger.debug(
                    "[%s] Command '/%s' bypassing active-session guard for %s",
                    self.name, cmd, session_key,
@@ -1415,19 +1446,19 @@ class BasePlatformAdapter(ABC):
                    logger.error("[%s] Command '/%s' dispatch failed: %s", self.name, cmd, e, exc_info=True)
                return

+            if self._busy_session_handler is not None:
+                try:
+                    if await self._busy_session_handler(event, session_key):
+                        return
+                except Exception as e:
+                    logger.error("[%s] Busy-session handler failed: %s", self.name, e, exc_info=True)
+
            # Special case: photo bursts/albums frequently arrive as multiple near-
            # simultaneous messages. Queue them without interrupting the active run,
            # then process them immediately after the current task finishes.
            if event.message_type == MessageType.PHOTO:
                logger.debug("[%s] Queuing photo follow-up for session %s without interrupt", self.name, session_key)
-                existing = self._pending_messages.get(session_key)
-                if existing and existing.message_type == MessageType.PHOTO:
-                    existing.media_urls.extend(event.media_urls)
-                    existing.media_types.extend(event.media_types)
-                    if event.text:
-                        existing.text = self._merge_caption(existing.text, event.text)
-                else:
-                    self._pending_messages[session_key] = event
+                merge_pending_message_event(self._pending_messages, session_key, event)
                return  # Don't interrupt now - will run after current task completes

            # Default behavior for non-photo follow-ups: interrupt the running agent
@@ -1190,6 +1190,8 @@ class FeishuAdapter(BasePlatformAdapter):
                lambda data: self._on_reaction_event("im.message.reaction.deleted_v1", data)
            )
            .register_p2_card_action_trigger(self._on_card_action_trigger)
+            .register_p2_im_chat_member_bot_added_v1(self._on_bot_added_to_chat)
+            .register_p2_im_chat_member_bot_deleted_v1(self._on_bot_removed_from_chat)
            .build()
        )

@@ -18,6 +18,7 @@ Environment variables:
    MATRIX_REQUIRE_MENTION      Require @mention in rooms (default: true)
    MATRIX_FREE_RESPONSE_ROOMS  Comma-separated room IDs exempt from mention requirement
    MATRIX_AUTO_THREAD          Auto-create threads for room messages (default: true)
+    MATRIX_DM_MENTION_THREADS   Create a thread when bot is @mentioned in a DM (default: false)
 """

 from __future__ import annotations
@@ -1043,6 +1044,13 @@ class MatrixAdapter(BasePlatformAdapter):
                if not self._is_bot_mentioned(body, formatted_body):
                    return

+        # DM mention-thread: when enabled, @mentioning bot in a DM creates a thread.
+        if is_dm and not thread_id:
+            dm_mention_threads = os.getenv("MATRIX_DM_MENTION_THREADS", "false").lower() in ("true", "1", "yes")
+            if dm_mention_threads and self._is_bot_mentioned(body, source_content.get("formatted_body")):
+                thread_id = event.event_id
+                self._track_thread(thread_id)
+
        # Strip mention from body when present (including in DMs).
        if self._is_bot_mentioned(body, source_content.get("formatted_body")):
            body = self._strip_mention(body)
@@ -1360,6 +1368,13 @@ class MatrixAdapter(BasePlatformAdapter):
                if not self._is_bot_mentioned(body, formatted_body):
                    return

+        # DM mention-thread: when enabled, @mentioning bot in a DM creates a thread.
+        if is_dm and not thread_id:
+            dm_mention_threads = os.getenv("MATRIX_DM_MENTION_THREADS", "false").lower() in ("true", "1", "yes")
+            if dm_mention_threads and self._is_bot_mentioned(body, source_content.get("formatted_body")):
+                thread_id = event.event_id
+                self._track_thread(thread_id)
+
        # Strip mention from body when present (including in DMs).
        if self._is_bot_mentioned(body, source_content.get("formatted_body")):
            body = self._strip_mention(body)
@@ -0,0 +1,20 @@
+"""Shared gateway restart constants and parsing helpers."""
+
+from hermes_cli.config import DEFAULT_CONFIG
+
+# EX_TEMPFAIL from sysexits.h — used to ask the service manager to restart
+# the gateway after a graceful drain/reload path completes.
+GATEWAY_SERVICE_RESTART_EXIT_CODE = 75
+
+DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT = float(
+    DEFAULT_CONFIG["agent"]["restart_drain_timeout"]
+)
+
+
+def parse_restart_drain_timeout(raw: object) -> float:
+    """Parse a configured drain timeout, falling back to the shared default."""
+    try:
+        value = float(raw) if str(raw or "").strip() else DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    except (TypeError, ValueError):
+        return DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    return max(0.0, value)
@@ -186,6 +186,12 @@ if _config_path.exists():
                os.environ["HERMES_AGENT_TIMEOUT"] = str(_agent_cfg["gateway_timeout"])
            if "gateway_timeout_warning" in _agent_cfg and "HERMES_AGENT_TIMEOUT_WARNING" not in os.environ:
                os.environ["HERMES_AGENT_TIMEOUT_WARNING"] = str(_agent_cfg["gateway_timeout_warning"])
+            if "restart_drain_timeout" in _agent_cfg and "HERMES_RESTART_DRAIN_TIMEOUT" not in os.environ:
+                os.environ["HERMES_RESTART_DRAIN_TIMEOUT"] = str(_agent_cfg["restart_drain_timeout"])
+        _display_cfg = _cfg.get("display", {})
+        if _display_cfg and isinstance(_display_cfg, dict):
+            if "busy_input_mode" in _display_cfg and "HERMES_GATEWAY_BUSY_INPUT_MODE" not in os.environ:
+                os.environ["HERMES_GATEWAY_BUSY_INPUT_MODE"] = str(_display_cfg["busy_input_mode"])
        # Timezone: bridge config.yaml → HERMES_TIMEZONE env var.
        # HERMES_TIMEZONE from .env takes precedence (already in os.environ).
        _tz_cfg = _cfg.get("timezone", "")
@@ -235,7 +241,17 @@ from gateway.session import (
    build_session_key,
 )
 from gateway.delivery import DeliveryRouter
-from gateway.platforms.base import BasePlatformAdapter, MessageEvent, MessageType
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    merge_pending_message_event,
+)
+from gateway.restart import (
+    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
+    GATEWAY_SERVICE_RESTART_EXIT_CODE,
+    parse_restart_drain_timeout,
+)


 def _normalize_whatsapp_identifier(value: str) -> str:
@@ -471,6 +487,16 @@ class GatewayRunner:
    # Class-level defaults so partial construction in tests doesn't
    # blow up on attribute access.
    _running_agents_ts: Dict[str, float] = {}
+    _busy_input_mode: str = "interrupt"
+    _restart_drain_timeout: float = DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    _exit_code: Optional[int] = None
+    _draining: bool = False
+    _restart_requested: bool = False
+    _restart_task_started: bool = False
+    _restart_detached: bool = False
+    _restart_via_service: bool = False
+    _stop_task: Optional[asyncio.Task] = None
+    _session_model_overrides: Dict[str, Dict[str, str]] = {}
    
    def __init__(self, config: Optional[GatewayConfig] = None):
        self.config = config or load_gateway_config()
@@ -483,6 +509,8 @@ class GatewayRunner:
        self._reasoning_config = self._load_reasoning_config()
        self._service_tier = self._load_service_tier()
        self._show_reasoning = self._load_show_reasoning()
+        self._busy_input_mode = self._load_busy_input_mode()
+        self._restart_drain_timeout = self._load_restart_drain_timeout()
        self._provider_routing = self._load_provider_routing()
        self._fallback_model = self._load_fallback_model()
        self._smart_model_routing = self._load_smart_model_routing()
@@ -499,6 +527,13 @@ class GatewayRunner:
        self._exit_cleanly = False
        self._exit_with_failure = False
        self._exit_reason: Optional[str] = None
+        self._exit_code: Optional[int] = None
+        self._draining = False
+        self._restart_requested = False
+        self._restart_task_started = False
+        self._restart_detached = False
+        self._restart_via_service = False
+        self._stop_task: Optional[asyncio.Task] = None
        
        # Track running agents per session for interrupt support
        # Key: session_key, Value: AIAgent instance
@@ -759,6 +794,10 @@ class GatewayRunner:
    def exit_reason(self) -> Optional[str]:
        return self._exit_reason

+    @property
+    def exit_code(self) -> Optional[int]:
+        return self._exit_code
+
    def _session_key_for_source(self, source: SessionSource) -> str:
        """Resolve the current session key for a source, honoring gateway config when available."""
        if hasattr(self, "session_store") and self.session_store is not None:
@@ -868,6 +907,30 @@ class GatewayRunner:
        self._exit_cleanly = True
        self._exit_reason = reason
        self._shutdown_event.set()
+
+    def _running_agent_count(self) -> int:
+        return len(self._running_agents)
+
+    def _status_action_label(self) -> str:
+        return "restart" if self._restart_requested else "shutdown"
+
+    def _status_action_gerund(self) -> str:
+        return "restarting" if self._restart_requested else "shutting down"
+
+    def _queue_during_drain_enabled(self) -> bool:
+        return self._restart_requested and self._busy_input_mode == "queue"
+
+    def _update_runtime_status(self, gateway_state: Optional[str] = None, exit_reason: Optional[str] = None) -> None:
+        try:
+            from gateway.status import write_runtime_status
+            write_runtime_status(
+                gateway_state=gateway_state,
+                exit_reason=exit_reason,
+                restart_requested=self._restart_requested,
+                active_agents=self._running_agent_count(),
+            )
+        except Exception:
+            pass
    
    @staticmethod
    def _load_prefill_messages() -> List[Dict[str, Any]]:
@@ -994,6 +1057,48 @@ class GatewayRunner:
            pass
        return False

+    @staticmethod
+    def _load_busy_input_mode() -> str:
+        """Load gateway drain-time busy-input behavior from config/env."""
+        mode = os.getenv("HERMES_GATEWAY_BUSY_INPUT_MODE", "").strip().lower()
+        if not mode:
+            try:
+                import yaml as _y
+                cfg_path = _hermes_home / "config.yaml"
+                if cfg_path.exists():
+                    with open(cfg_path, encoding="utf-8") as _f:
+                        cfg = _y.safe_load(_f) or {}
+                    mode = str(cfg.get("display", {}).get("busy_input_mode", "") or "").strip().lower()
+            except Exception:
+                pass
+        return "queue" if mode == "queue" else "interrupt"
+
+    @staticmethod
+    def _load_restart_drain_timeout() -> float:
+        """Load graceful gateway restart/stop drain timeout in seconds."""
+        raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
+        if not raw:
+            try:
+                import yaml as _y
+                cfg_path = _hermes_home / "config.yaml"
+                if cfg_path.exists():
+                    with open(cfg_path, encoding="utf-8") as _f:
+                        cfg = _y.safe_load(_f) or {}
+                    raw = str(cfg.get("agent", {}).get("restart_drain_timeout", "") or "").strip()
+            except Exception:
+                pass
+        value = parse_restart_drain_timeout(raw)
+        if raw and value == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT:
+            try:
+                float(raw)
+            except (TypeError, ValueError):
+                logger.warning(
+                    "Invalid restart_drain_timeout '%s', using default %.0fs",
+                    raw,
+                    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
+                )
+        return value
+
    @staticmethod
    def _load_background_notifications_mode() -> str:
        """Load background process notification mode from config or env var.
@@ -1078,6 +1183,155 @@ class GatewayRunner:
            pass
        return {}

+    def _snapshot_running_agents(self) -> Dict[str, Any]:
+        return {
+            session_key: agent
+            for session_key, agent in self._running_agents.items()
+            if agent is not _AGENT_PENDING_SENTINEL
+        }
+
+    def _queue_or_replace_pending_event(self, session_key: str, event: MessageEvent) -> None:
+        adapter = self.adapters.get(event.source.platform)
+        if not adapter:
+            return
+        merge_pending_message_event(adapter._pending_messages, session_key, event)
+
+    async def _handle_active_session_busy_message(self, event: MessageEvent, session_key: str) -> bool:
+        if not self._draining:
+            return False
+
+        adapter = self.adapters.get(event.source.platform)
+        if not adapter:
+            return True
+
+        thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
+        if self._queue_during_drain_enabled():
+            self._queue_or_replace_pending_event(session_key, event)
+            message = f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
+        else:
+            message = f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
+
+        await adapter._send_with_retry(
+            chat_id=event.source.chat_id,
+            content=message,
+            reply_to=event.message_id,
+            metadata=thread_meta,
+        )
+        return True
+
+    async def _drain_active_agents(self, timeout: float) -> tuple[Dict[str, Any], bool]:
+        snapshot = self._snapshot_running_agents()
+        last_active_count = self._running_agent_count()
+        last_status_at = 0.0
+
+        def _maybe_update_status(force: bool = False) -> None:
+            nonlocal last_active_count, last_status_at
+            now = asyncio.get_running_loop().time()
+            active_count = self._running_agent_count()
+            if force or active_count != last_active_count or (now - last_status_at) >= 1.0:
+                self._update_runtime_status("draining")
+                last_active_count = active_count
+                last_status_at = now
+
+        if not self._running_agents:
+            _maybe_update_status(force=True)
+            return snapshot, False
+
+        _maybe_update_status(force=True)
+        if timeout <= 0:
+            return snapshot, True
+
+        deadline = asyncio.get_running_loop().time() + timeout
+        while self._running_agents and asyncio.get_running_loop().time() < deadline:
+            _maybe_update_status()
+            await asyncio.sleep(0.1)
+        timed_out = bool(self._running_agents)
+        _maybe_update_status(force=True)
+        return snapshot, timed_out
+
+    def _interrupt_running_agents(self, reason: str) -> None:
+        for session_key, agent in list(self._running_agents.items()):
+            if agent is _AGENT_PENDING_SENTINEL:
+                continue
+            try:
+                agent.interrupt(reason)
+                logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
+            except Exception as e:
+                logger.debug("Failed interrupting agent during shutdown: %s", e)
+
+    def _finalize_shutdown_agents(self, active_agents: Dict[str, Any]) -> None:
+        for agent in active_agents.values():
+            try:
+                from hermes_cli.plugins import invoke_hook as _invoke_hook
+                _invoke_hook(
+                    "on_session_finalize",
+                    session_id=getattr(agent, "session_id", None),
+                    platform="gateway",
+                )
+            except Exception:
+                pass
+            try:
+                if hasattr(agent, "shutdown_memory_provider"):
+                    agent.shutdown_memory_provider()
+            except Exception:
+                pass
+            # Close tool resources (terminal sandboxes, browser daemons,
+            # background processes, httpx clients) to prevent zombie
+            # process accumulation.
+            try:
+                if hasattr(agent, 'close'):
+                    agent.close()
+            except Exception:
+                pass
+
+    async def _launch_detached_restart_command(self) -> None:
+        import shutil
+        import subprocess
+
+        hermes_cmd = _resolve_hermes_bin()
+        if not hermes_cmd:
+            logger.error("Could not locate hermes binary for detached /restart")
+            return
+
+        current_pid = os.getpid()
+        cmd = " ".join(shlex.quote(part) for part in hermes_cmd)
+        shell_cmd = (
+            f"while kill -0 {current_pid} 2>/dev/null; do sleep 0.2; done; "
+            f"{cmd} gateway restart"
+        )
+        setsid_bin = shutil.which("setsid")
+        if setsid_bin:
+            subprocess.Popen(
+                [setsid_bin, "bash", "-lc", shell_cmd],
+                stdout=subprocess.DEVNULL,
+                stderr=subprocess.DEVNULL,
+                start_new_session=True,
+            )
+        else:
+            subprocess.Popen(
+                ["bash", "-lc", shell_cmd],
+                stdout=subprocess.DEVNULL,
+                stderr=subprocess.DEVNULL,
+                start_new_session=True,
+            )
+
+    def request_restart(self, *, detached: bool = False, via_service: bool = False) -> bool:
+        if self._restart_task_started:
+            return False
+        self._restart_requested = True
+        self._restart_detached = detached
+        self._restart_via_service = via_service
+        self._restart_task_started = True
+
+        async def _run_restart() -> None:
+            await asyncio.sleep(0.05)
+            await self.stop(restart=True, detached_restart=detached, service_restart=via_service)
+
+        task = asyncio.create_task(_run_restart())
+        self._background_tasks.add(task)
+        task.add_done_callback(self._background_tasks.discard)
+        return True
+
    async def start(self) -> bool:
        """
        Start the gateway and all configured platform adapters.
@@ -1165,6 +1419,7 @@ class GatewayRunner:
            adapter.set_message_handler(self._handle_message)
            adapter.set_fatal_error_handler(self._handle_adapter_fatal_error)
            adapter.set_session_store(self.session_store)
+            adapter.set_busy_session_handler(self._handle_active_session_busy_message)
            
            # Try to connect
            logger.info("Connecting to %s...", platform.value)
@@ -1240,11 +1495,7 @@ class GatewayRunner:
        self.delivery_router.adapters = self.adapters
        
        self._running = True
-        try:
-            from gateway.status import write_runtime_status
-            write_runtime_status(gateway_state="running", exit_reason=None)
-        except Exception:
-            pass
+        self._update_runtime_status("running")
        
        # Emit gateway:startup hook
        hook_count = len(self.hooks.loaded_hooks)
@@ -1348,12 +1599,28 @@ class GatewayRunner:
                for key, entry in _expired_entries:
                    try:
                        await self._async_flush_memories(entry.session_id)
-                        # Shut down memory provider on the cached agent
-                        cached_agent = self._running_agents.get(key)
-                        if cached_agent and cached_agent is not _AGENT_PENDING_SENTINEL:
+                        # Shut down memory provider and close tool resources
+                        # on the cached agent.  Idle agents live in
+                        # _agent_cache (not _running_agents), so look there.
+                        _cached_agent = None
+                        _cache_lock = getattr(self, "_agent_cache_lock", None)
+                        if _cache_lock is not None:
+                            with _cache_lock:
+                                _cached = self._agent_cache.get(key)
+                                _cached_agent = _cached[0] if isinstance(_cached, tuple) else _cached if _cached else None
+                        # Fall back to _running_agents in case the agent is
+                        # still mid-turn when the expiry fires.
+                        if _cached_agent is None:
+                            _cached_agent = self._running_agents.get(key)
+                        if _cached_agent and _cached_agent is not _AGENT_PENDING_SENTINEL:
                            try:
-                                if hasattr(cached_agent, 'shutdown_memory_provider'):
-                                    cached_agent.shutdown_memory_provider()
+                                if hasattr(_cached_agent, 'shutdown_memory_provider'):
+                                    _cached_agent.shutdown_memory_provider()
+                            except Exception:
+                                pass
+                            try:
+                                if hasattr(_cached_agent, 'close'):
+                                    _cached_agent.close()
                            except Exception:
                                pass
                        # Mark as flushed and persist to disk so the flag
@@ -1463,6 +1730,7 @@ class GatewayRunner:
                    adapter.set_message_handler(self._handle_message)
                    adapter.set_fatal_error_handler(self._handle_adapter_fatal_error)
                    adapter.set_session_store(self.session_store)
+                    adapter.set_busy_session_handler(self._handle_active_session_busy_message)

                    success = await adapter.connect()
                    if success:
@@ -1509,64 +1777,108 @@ class GatewayRunner:
                    return
                await asyncio.sleep(1)

-    async def stop(self) -> None:
+    async def stop(
+        self,
+        *,
+        restart: bool = False,
+        detached_restart: bool = False,
+        service_restart: bool = False,
+    ) -> None:
        """Stop the gateway and disconnect all adapters."""
-        logger.info("Stopping gateway...")
-        self._running = False
+        if restart:
+            self._restart_requested = True
+            self._restart_detached = detached_restart
+            self._restart_via_service = service_restart
+        if self._stop_task is not None:
+            await self._stop_task
+            return

-        for session_key, agent in list(self._running_agents.items()):
-            if agent is _AGENT_PENDING_SENTINEL:
-                continue
+        async def _stop_impl() -> None:
+            logger.info(
+                "Stopping gateway%s...",
+                " for restart" if self._restart_requested else "",
+            )
+            self._running = False
+            self._draining = True
+
+            timeout = self._restart_drain_timeout
+            active_agents, timed_out = await self._drain_active_agents(timeout)
+            if timed_out:
+                logger.warning(
+                    "Gateway drain timed out after %.1fs with %d active agent(s); interrupting remaining work.",
+                    timeout,
+                    self._running_agent_count(),
+                )
+                self._interrupt_running_agents(
+                    "Gateway restarting" if self._restart_requested else "Gateway shutting down"
+                )
+                interrupt_deadline = asyncio.get_running_loop().time() + 5.0
+                while self._running_agents and asyncio.get_running_loop().time() < interrupt_deadline:
+                    self._update_runtime_status("draining")
+                    await asyncio.sleep(0.1)
+
+            if self._restart_requested and self._restart_detached:
+                try:
+                    await self._launch_detached_restart_command()
+                except Exception as e:
+                    logger.error("Failed to launch detached gateway restart: %s", e)
+
+            self._finalize_shutdown_agents(active_agents)
+
+            for platform, adapter in list(self.adapters.items()):
+                try:
+                    await adapter.cancel_background_tasks()
+                except Exception as e:
+                    logger.debug("✗ %s background-task cancel error: %s", platform.value, e)
+                try:
+                    await adapter.disconnect()
+                    logger.info("✓ %s disconnected", platform.value)
+                except Exception as e:
+                    logger.error("✗ %s disconnect error: %s", platform.value, e)
+
+            for _task in list(self._background_tasks):
+                if _task is self._stop_task:
+                    continue
+                _task.cancel()
+            self._background_tasks.clear()
+
+            self.adapters.clear()
+            self._running_agents.clear()
+            self._pending_messages.clear()
+            self._pending_approvals.clear()
+            self._shutdown_event.set()
+
+            # Global cleanup: kill any remaining tool subprocesses not tied
+            # to a specific agent (catch-all for zombie prevention).
            try:
-                agent.interrupt("Gateway shutting down")
-                logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
-            except Exception as e:
-                logger.debug("Failed interrupting agent during shutdown: %s", e)
-            # Fire plugin on_session_finalize hook before memory shutdown
-            try:
-                from hermes_cli.plugins import invoke_hook as _invoke_hook
-                _invoke_hook("on_session_finalize",
-                             session_id=getattr(agent, 'session_id', None),
-                             platform="gateway")
+                from tools.process_registry import process_registry
+                process_registry.kill_all()
            except Exception:
                pass
-            # Shut down memory provider at actual session boundary
            try:
-                if hasattr(agent, 'shutdown_memory_provider'):
-                    agent.shutdown_memory_provider()
+                from tools.terminal_tool import cleanup_all_environments
+                cleanup_all_environments()
+            except Exception:
+                pass
+            try:
+                from tools.browser_tool import cleanup_all_browsers
+                cleanup_all_browsers()
            except Exception:
                pass

-        for platform, adapter in list(self.adapters.items()):
-            try:
-                await adapter.cancel_background_tasks()
-            except Exception as e:
-                logger.debug("✗ %s background-task cancel error: %s", platform.value, e)
-            try:
-                await adapter.disconnect()
-                logger.info("✓ %s disconnected", platform.value)
-            except Exception as e:
-                logger.error("✗ %s disconnect error: %s", platform.value, e)
+            from gateway.status import remove_pid_file
+            remove_pid_file()

-        # Cancel any pending background tasks
-        for _task in list(self._background_tasks):
-            _task.cancel()
-        self._background_tasks.clear()
+            if self._restart_requested and self._restart_via_service:
+                self._exit_code = GATEWAY_SERVICE_RESTART_EXIT_CODE
+                self._exit_reason = self._exit_reason or "Gateway restart requested"

-        self.adapters.clear()
-        self._running_agents.clear()
-        self._pending_messages.clear()
-        self._pending_approvals.clear()
-        self._shutdown_event.set()
-        
-        from gateway.status import remove_pid_file, write_runtime_status
-        remove_pid_file()
-        try:
-            write_runtime_status(gateway_state="stopped", exit_reason=self._exit_reason)
-        except Exception:
-            pass
-        
-        logger.info("Gateway stopped")
+            self._draining = False
+            self._update_runtime_status("stopped", self._exit_reason)
+            logger.info("Gateway stopped")
+
+        self._stop_task = asyncio.create_task(_stop_impl())
+        await self._stop_task
    
    async def wait_for_shutdown(self) -> None:
        """Wait for shutdown signal."""
@@ -1972,6 +2284,9 @@ class GatewayRunner:
            _evt_cmd = event.get_command()
            _cmd_def_inner = _resolve_cmd_inner(_evt_cmd) if _evt_cmd else None

+            if _cmd_def_inner and _cmd_def_inner.name == "restart":
+                return await self._handle_restart_command(event)
+
            # /stop must hard-kill the session when an agent is running.
            # A soft interrupt (agent.interrupt()) doesn't help when the agent
            # is truly hung — the executor thread is blocked and never checks
@@ -2052,18 +2367,7 @@ class GatewayRunner:
                logger.debug("PRIORITY photo follow-up for session %s — queueing without interrupt", _quick_key[:20])
                adapter = self.adapters.get(source.platform)
                if adapter:
-                    # Reuse adapter queue semantics so photo bursts merge cleanly.
-                    if _quick_key in adapter._pending_messages:
-                        existing = adapter._pending_messages[_quick_key]
-                        if getattr(existing, "message_type", None) == MessageType.PHOTO:
-                            existing.media_urls.extend(event.media_urls)
-                            existing.media_types.extend(event.media_types)
-                            if event.text:
-                                existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
-                        else:
-                            adapter._pending_messages[_quick_key] = event
-                    else:
-                        adapter._pending_messages[_quick_key] = event
+                    merge_pending_message_event(adapter._pending_messages, _quick_key, event)
                return None

            running_agent = self._running_agents.get(_quick_key)
@@ -2081,6 +2385,14 @@ class GatewayRunner:
                if adapter:
                    adapter._pending_messages[_quick_key] = event
                return None
+            if self._draining:
+                if self._queue_during_drain_enabled():
+                    self._queue_or_replace_pending_event(_quick_key, event)
+                return (
+                    f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
+                    if self._queue_during_drain_enabled()
+                    else f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
+                )
            logger.debug("PRIORITY interrupt for session %s", _quick_key[:20])
            running_agent.interrupt(event.text)
            if _quick_key in self._pending_messages:
@@ -2122,6 +2434,9 @@ class GatewayRunner:

        if canonical == "status":
            return await self._handle_status_command(event)
+
+        if canonical == "restart":
+            return await self._handle_restart_command(event)
        
        if canonical == "stop":
            return await self._handle_stop_command(event)
@@ -2220,6 +2535,9 @@ class GatewayRunner:
        if canonical == "voice":
            return await self._handle_voice_command(event)

+        if self._draining:
+            return f"⏳ Gateway is {self._status_action_gerund()} and is not accepting new work right now."
+
        # User-defined quick commands (bypass agent loop, no LLM call)
        if command:
            if isinstance(self.config, dict):
@@ -2400,8 +2718,8 @@ class GatewayRunner:
        # Build session context
        context = build_session_context(source, self.config, session_entry)
        
-        # Set environment variables for tools
-        self._set_session_env(context)
+        # Set session context variables for tools (task-local, concurrency-safe)
+        _session_env_tokens = self._set_session_env(context)
        
        # Read privacy.redact_pii from config (re-read per message)
        _redact_pii = False
@@ -3234,8 +3552,8 @@ class GatewayRunner:
                "Try again or use /reset to start a fresh session."
            )
        finally:
-            # Clear session env
-            self._clear_session_env()
+            # Restore session context variables to their pre-handler state
+            self._clear_session_env(_session_env_tokens)
    
    def _format_session_info(self) -> str:
        """Resolve current model config and return a formatted info block.
@@ -3335,8 +3653,22 @@ class GatewayRunner:
                _flush_task.add_done_callback(self._background_tasks.discard)
        except Exception as e:
            logger.debug("Gateway memory flush on reset failed: %s", e)
+        # Close tool resources on the old agent (terminal sandboxes, browser
+        # daemons, background processes) before evicting from cache.
+        # Guard with getattr because test fixtures may skip __init__.
+        _cache_lock = getattr(self, "_agent_cache_lock", None)
+        if _cache_lock is not None:
+            with _cache_lock:
+                _cached = self._agent_cache.get(session_key)
+                _old_agent = _cached[0] if isinstance(_cached, tuple) else _cached if _cached else None
+            if _old_agent is not None:
+                try:
+                    if hasattr(_old_agent, "close"):
+                        _old_agent.close()
+                except Exception:
+                    pass
        self._evict_cached_agent(session_key)
-        
+
        try:
            from tools.env_passthrough import clear_env_passthrough
            clear_env_passthrough()
@@ -3500,7 +3832,21 @@ class GatewayRunner:
            return "⚡ Force-stopped. The session is unlocked — you can send a new message."
        else:
            return "No active task to stop."
-    
+
+    async def _handle_restart_command(self, event: MessageEvent) -> str:
+        """Handle /restart command - drain active work, then restart the gateway."""
+        if self._restart_requested or self._draining:
+            count = self._running_agent_count()
+            if count:
+                return f"⏳ Draining {count} active agent(s) before restart..."
+            return "⏳ Gateway restart already in progress..."
+
+        active_agents = self._running_agent_count()
+        self.request_restart(detached=True, via_service=False)
+        if active_agents:
+            return f"⏳ Draining {active_agents} active agent(s) before restart..."
+        return "♻ Restarting gateway..."
+
    async def _handle_help_command(self, event: MessageEvent) -> str:
        """Handle /help command - list available commands."""
        from hermes_cli.commands import gateway_help_lines
@@ -3623,7 +3969,7 @@ class GatewayRunner:
        # Check for session override
        source = event.source
        session_key = self._session_key_for_source(source)
-        override = getattr(self, "_session_model_overrides", {}).get(session_key, {})
+        override = self._session_model_overrides.get(session_key, {})
        if override:
            current_model = override.get("model", current_model)
            current_provider = override.get("provider", current_provider)
@@ -3705,8 +4051,6 @@ class GatewayRunner:
                            f"via {result.provider_label or result.target_provider}. "
                            f"Adjust your self-identification accordingly.]"
                        )
-                        if not hasattr(_self, "_session_model_overrides"):
-                            _self._session_model_overrides = {}
                        _self._session_model_overrides[_session_key] = {
                            "model": result.new_model,
                            "provider": result.target_provider,
@@ -3820,8 +4164,6 @@ class GatewayRunner:
        )

        # Store session override so next agent creation uses the new model
-        if not hasattr(self, "_session_model_overrides"):
-            self._session_model_overrides = {}
        self._session_model_overrides[session_key] = {
            "model": result.new_model,
            "provider": result.target_provider,
@@ -6120,20 +6462,27 @@ class GatewayRunner:

        return True

-    def _set_session_env(self, context: SessionContext) -> None:
-        """Set environment variables for the current session."""
-        os.environ["HERMES_SESSION_PLATFORM"] = context.source.platform.value
-        os.environ["HERMES_SESSION_CHAT_ID"] = context.source.chat_id
-        if context.source.chat_name:
-            os.environ["HERMES_SESSION_CHAT_NAME"] = context.source.chat_name
-        if context.source.thread_id:
-            os.environ["HERMES_SESSION_THREAD_ID"] = str(context.source.thread_id)
-    
-    def _clear_session_env(self) -> None:
-        """Clear session environment variables."""
-        for var in ["HERMES_SESSION_PLATFORM", "HERMES_SESSION_CHAT_ID", "HERMES_SESSION_CHAT_NAME", "HERMES_SESSION_THREAD_ID"]:
-            if var in os.environ:
-                del os.environ[var]
+    def _set_session_env(self, context: SessionContext) -> list:
+        """Set session context variables for the current async task.
+
+        Uses ``contextvars`` instead of ``os.environ`` so that concurrent
+        gateway messages cannot overwrite each other's session state.
+
+        Returns a list of reset tokens; pass them to ``_clear_session_env``
+        in a ``finally`` block.
+        """
+        from gateway.session_context import set_session_vars
+        return set_session_vars(
+            platform=context.source.platform.value,
+            chat_id=context.source.chat_id,
+            chat_name=context.source.chat_name or "",
+            thread_id=str(context.source.thread_id) if context.source.thread_id else "",
+        )
+
+    def _clear_session_env(self, tokens: list) -> None:
+        """Restore session context variables to their pre-handler values."""
+        from gateway.session_context import clear_session_vars
+        clear_session_vars(tokens)
    
    async def _enrich_message_with_vision(
        self,
@@ -7300,6 +7649,8 @@ class GatewayRunner:
                await asyncio.sleep(0.05)
            if session_key:
                self._running_agents[session_key] = agent_holder[0]
+                if self._draining:
+                    self._update_runtime_status("draining")
        
        tracking_task = asyncio.create_task(track_agent())
        
@@ -7545,6 +7896,14 @@ class GatewayRunner:
                    except Exception:
                        pass

+            if self._draining and pending:
+                logger.info(
+                    "Discarding pending follow-up for session %s during gateway %s",
+                    session_key[:20] if session_key else "?",
+                    self._status_action_label(),
+                )
+                pending = None
+
            if pending:
                logger.debug("Processing pending message: '%s...'", pending[:40])
                
@@ -7621,6 +7980,8 @@ class GatewayRunner:
                del self._running_agents[session_key]
            if session_key:
                self._running_agents_ts.pop(session_key, None)
+            if self._draining:
+                self._update_runtime_status("draining")
            
            # Wait for cancelled tasks
            for task in [progress_task, interrupt_monitor, tracking_task, _notify_task]:
@@ -7818,13 +8179,21 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
    runner = GatewayRunner(config)
    
    # Set up signal handlers
-    def signal_handler():
+    def shutdown_signal_handler():
        asyncio.create_task(runner.stop())
+
+    def restart_signal_handler():
+        runner.request_restart(detached=False, via_service=True)
    
    loop = asyncio.get_event_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
-            loop.add_signal_handler(sig, signal_handler)
+            loop.add_signal_handler(sig, shutdown_signal_handler)
+        except NotImplementedError:
+            pass
+    if hasattr(signal, "SIGUSR1"):
+        try:
+            loop.add_signal_handler(signal.SIGUSR1, restart_signal_handler)
        except NotImplementedError:
            pass
    
@@ -7874,6 +8243,9 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
    except Exception:
        pass

+    if runner.exit_code is not None:
+        raise SystemExit(runner.exit_code)
+
    return True


@@ -0,0 +1,113 @@
+"""
+Session-scoped context variables for the Hermes gateway.
+
+Replaces the previous ``os.environ``-based session state
+(``HERMES_SESSION_PLATFORM``, ``HERMES_SESSION_CHAT_ID``, etc.) with
+Python's ``contextvars.ContextVar``.
+
+**Why this matters**
+
+The gateway processes messages concurrently via ``asyncio``.  When two
+messages arrive at the same time the old code did:
+
+    os.environ["HERMES_SESSION_THREAD_ID"] = str(context.source.thread_id)
+
+Because ``os.environ`` is *process-global*, Message A's value was
+silently overwritten by Message B before Message A's agent finished
+running.  Background-task notifications and tool calls therefore routed
+to the wrong thread.
+
+``contextvars.ContextVar`` values are *task-local*: each ``asyncio``
+task (and any ``run_in_executor`` thread it spawns) gets its own copy,
+so concurrent messages never interfere.
+
+**Backward compatibility**
+
+The public helper ``get_session_env(name, default="")`` mirrors the old
+``os.getenv("HERMES_SESSION_*", ...)`` calls.  Existing tool code only
+needs to replace the import + call site:
+
+    # before
+    import os
+    platform = os.getenv("HERMES_SESSION_PLATFORM", "")
+
+    # after
+    from gateway.session_context import get_session_env
+    platform = get_session_env("HERMES_SESSION_PLATFORM", "")
+"""
+
+from contextvars import ContextVar
+
+# ---------------------------------------------------------------------------
+# Per-task session variables
+# ---------------------------------------------------------------------------
+
+_SESSION_PLATFORM: ContextVar[str] = ContextVar("HERMES_SESSION_PLATFORM", default="")
+_SESSION_CHAT_ID: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_ID", default="")
+_SESSION_CHAT_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_NAME", default="")
+_SESSION_THREAD_ID: ContextVar[str] = ContextVar("HERMES_SESSION_THREAD_ID", default="")
+
+_VAR_MAP = {
+    "HERMES_SESSION_PLATFORM": _SESSION_PLATFORM,
+    "HERMES_SESSION_CHAT_ID": _SESSION_CHAT_ID,
+    "HERMES_SESSION_CHAT_NAME": _SESSION_CHAT_NAME,
+    "HERMES_SESSION_THREAD_ID": _SESSION_THREAD_ID,
+}
+
+
+def set_session_vars(
+    platform: str = "",
+    chat_id: str = "",
+    chat_name: str = "",
+    thread_id: str = "",
+) -> list:
+    """Set all session context variables and return reset tokens.
+
+    Call ``clear_session_vars(tokens)`` in a ``finally`` block to restore
+    the previous values when the handler exits.
+
+    Returns a list of ``Token`` objects (one per variable) that can be
+    passed to ``clear_session_vars``.
+    """
+    tokens = [
+        _SESSION_PLATFORM.set(platform),
+        _SESSION_CHAT_ID.set(chat_id),
+        _SESSION_CHAT_NAME.set(chat_name),
+        _SESSION_THREAD_ID.set(thread_id),
+    ]
+    return tokens
+
+
+def clear_session_vars(tokens: list) -> None:
+    """Restore session context variables to their pre-handler values."""
+    if not tokens:
+        return
+    vars_in_order = [
+        _SESSION_PLATFORM,
+        _SESSION_CHAT_ID,
+        _SESSION_CHAT_NAME,
+        _SESSION_THREAD_ID,
+    ]
+    for var, token in zip(vars_in_order, tokens):
+        var.reset(token)
+
+
+def get_session_env(name: str, default: str = "") -> str:
+    """Read a session context variable by its legacy ``HERMES_SESSION_*`` name.
+
+    Drop-in replacement for ``os.getenv("HERMES_SESSION_*", default)``.
+
+    Resolution order:
+    1. Context variable (set by the gateway for concurrency-safe access)
+    2. ``os.environ`` (used by CLI, cron scheduler, and tests)
+    3. *default*
+    """
+    import os
+
+    var = _VAR_MAP.get(name)
+    if var is not None:
+        value = var.get()
+        if value:
+            return value
+    # Fall back to os.environ for CLI, cron, and test compatibility
+    return os.getenv(name, default)
@@ -158,6 +158,8 @@ def _build_runtime_status_record() -> dict[str, Any]:
    payload.update({
        "gateway_state": "starting",
        "exit_reason": None,
+        "restart_requested": False,
+        "active_agents": 0,
        "platforms": {},
        "updated_at": _utc_now_iso(),
    })
@@ -218,6 +220,8 @@ def write_runtime_status(
    *,
    gateway_state: Optional[str] = None,
    exit_reason: Optional[str] = None,
+    restart_requested: Optional[bool] = None,
+    active_agents: Optional[int] = None,
    platform: Optional[str] = None,
    platform_state: Optional[str] = None,
    error_code: Optional[str] = None,
@@ -236,6 +240,10 @@ def write_runtime_status(
        payload["gateway_state"] = gateway_state
    if exit_reason is not None:
        payload["exit_reason"] = exit_reason
+    if restart_requested is not None:
+        payload["restart_requested"] = bool(restart_requested)
+    if active_agents is not None:
+        payload["active_agents"] = max(0, int(active_agents))

    if platform is not None:
        platform_payload = payload["platforms"].get(platform, {})
@@ -140,6 +140,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("commands", "Browse all commands and skills (paginated)", "Info",
               gateway_only=True, args_hint="[page]"),
    CommandDef("help", "Show available commands", "Info"),
+    CommandDef("restart", "Gracefully restart the gateway after draining active runs", "Session",
+               gateway_only=True),
    CommandDef("usage", "Show token usage and rate limits for the current session", "Info"),
    CommandDef("insights", "Show usage insights and analytics", "Info",
               args_hint="[days]"),
@@ -269,6 +269,11 @@ DEFAULT_CONFIG = {
        # tools or receiving API responses.  Only fires when the agent has
        # been completely idle for this duration.  0 = unlimited.
        "gateway_timeout": 1800,
+        # Graceful drain timeout for gateway stop/restart (seconds).
+        # The gateway stops accepting new work, waits for running agents
+        # to finish, then interrupts any remaining runs after the timeout.
+        # 0 = no drain, interrupt immediately.
+        "restart_drain_timeout": 60,
        "service_tier": "",
        # Tool-use enforcement: injects system prompt guidance that tells the
        # model to actually call tools instead of describing intended actions.
@@ -15,7 +15,19 @@ from pathlib import Path
 PROJECT_ROOT = Path(__file__).parent.parent.resolve()

 from gateway.status import terminate_pid
-from hermes_cli.config import get_env_value, get_hermes_home, save_env_value, is_managed, managed_error
+from gateway.restart import (
+    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
+    GATEWAY_SERVICE_RESTART_EXIT_CODE,
+    parse_restart_drain_timeout,
+)
+from hermes_cli.config import (
+    get_env_value,
+    get_hermes_home,
+    is_managed,
+    managed_error,
+    read_raw_config,
+    save_env_value,
+)
 # display_hermes_home is imported lazily at call sites to avoid ImportError
 # when hermes_constants is cached from a pre-update version during `hermes update`.
 from hermes_cli.setup import (
@@ -92,6 +104,59 @@ def _get_service_pids() -> set:
    return pids


+def _get_parent_pid(pid: int) -> int | None:
+    """Return the parent PID for ``pid``, or ``None`` when unavailable."""
+    if pid <= 1:
+        return None
+    try:
+        result = subprocess.run(
+            ["ps", "-o", "ppid=", "-p", str(pid)],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+    except (FileNotFoundError, subprocess.TimeoutExpired):
+        return None
+    if result.returncode != 0:
+        return None
+    raw = result.stdout.strip()
+    if not raw:
+        return None
+    try:
+        parent_pid = int(raw.splitlines()[-1].strip())
+    except ValueError:
+        return None
+    return parent_pid if parent_pid > 0 else None
+
+
+def _is_pid_ancestor_of_current_process(target_pid: int) -> bool:
+    """Return True when ``target_pid`` is this process or one of its ancestors."""
+    if target_pid <= 0:
+        return False
+
+    pid = os.getpid()
+    seen: set[int] = set()
+    while pid and pid not in seen:
+        if pid == target_pid:
+            return True
+        seen.add(pid)
+        pid = _get_parent_pid(pid) or 0
+    return False
+
+
+def _request_gateway_self_restart(pid: int) -> bool:
+    """Ask a running gateway ancestor to restart itself asynchronously."""
+    if not hasattr(signal, "SIGUSR1"):
+        return False
+    if not _is_pid_ancestor_of_current_process(pid):
+        return False
+    try:
+        os.kill(pid, signal.SIGUSR1)
+    except (ProcessLookupError, PermissionError, OSError):
+        return False
+    return True
+
+
 def find_gateway_pids(exclude_pids: set | None = None) -> list:
    """Find PIDs of running gateway processes.

@@ -665,6 +730,7 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)
            path_entries.append(resolved_node_dir)

    common_bin_paths = ["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]
+    restart_timeout = max(60, int(_get_restart_drain_timeout() or 0))

    if system:
        username, group_name, home_dir = _system_service_identity(run_as_user)
@@ -703,9 +769,11 @@ Environment="VIRTUAL_ENV={venv_dir}"
 Environment="HERMES_HOME={hermes_home}"
 Restart=on-failure
 RestartSec=30
+RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}
 KillMode=mixed
 KillSignal=SIGTERM
-TimeoutStopSec=60
+ExecReload=/bin/kill -USR1 $MAINPID
+TimeoutStopSec={restart_timeout}
 StandardOutput=journal
 StandardError=journal

@@ -733,9 +801,11 @@ Environment="VIRTUAL_ENV={venv_dir}"
 Environment="HERMES_HOME={hermes_home}"
 Restart=on-failure
 RestartSec=30
+RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}
 KillMode=mixed
 KillSignal=SIGTERM
-TimeoutStopSec=60
+ExecReload=/bin/kill -USR1 $MAINPID
+TimeoutStopSec={restart_timeout}
 StandardOutput=journal
 StandardError=journal

@@ -838,6 +908,20 @@ def _select_systemd_scope(system: bool = False) -> bool:
    return get_systemd_unit_path(system=True).exists() and not get_systemd_unit_path(system=False).exists()


+def _get_restart_drain_timeout() -> float:
+    """Return the configured gateway restart drain timeout in seconds."""
+    raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
+    if not raw:
+        cfg = read_raw_config()
+        agent_cfg = cfg.get("agent", {}) if isinstance(cfg, dict) else {}
+        raw = str(
+            agent_cfg.get(
+                "restart_drain_timeout", DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+            )
+        )
+    return parse_restart_drain_timeout(raw)
+
+
 def systemd_install(force: bool = False, system: bool = False, run_as_user: str | None = None):
    if system:
        _require_root_for_system_service("install")
@@ -923,7 +1007,13 @@ def systemd_restart(system: bool = False):
    if system:
        _require_root_for_system_service("restart")
    refresh_systemd_unit_if_needed(system=system)
-    subprocess.run(_systemctl_cmd(system) + ["restart", get_service_name()], check=True, timeout=90)
+    from gateway.status import get_running_pid
+
+    pid = get_running_pid()
+    if pid is not None and _request_gateway_self_restart(pid):
+        print(f"✓ {_service_scope_label(system).capitalize()} service restart requested")
+        return
+    subprocess.run(_systemctl_cmd(system) + ["reload-or-restart", get_service_name()], check=True, timeout=90)
    print(f"✓ {_service_scope_label(system).capitalize()} service restarted")


@@ -1211,7 +1301,7 @@ def launchd_stop():
    _wait_for_gateway_exit(timeout=10.0, force_after=5.0)
    print("✓ Service stopped")

-def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
+def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float | None = 5.0) -> bool:
    """Wait for the gateway process (by saved PID) to exit.

    Uses the PID from the gateway.pid file — not launchd labels — so this
@@ -1226,21 +1316,21 @@ def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
    from gateway.status import get_running_pid

    deadline = time.monotonic() + timeout
-    force_deadline = time.monotonic() + force_after
+    force_deadline = (time.monotonic() + force_after) if force_after is not None else None
    force_sent = False

    while time.monotonic() < deadline:
        pid = get_running_pid()
        if pid is None:
-            return  # Process exited cleanly.
+            return True  # Process exited cleanly.

-        if not force_sent and time.monotonic() >= force_deadline:
+        if force_after is not None and not force_sent and time.monotonic() >= force_deadline:
            # Grace period expired — force-kill the specific PID.
            try:
                terminate_pid(pid, force=True)
                print(f"⚠ Gateway PID {pid} did not exit gracefully; sent SIGKILL")
            except (ProcessLookupError, PermissionError, OSError):
-                return  # Already gone or we can't touch it.
+                return True  # Already gone or we can't touch it.
            force_sent = True

        time.sleep(0.3)
@@ -1249,15 +1339,30 @@ def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
    remaining_pid = get_running_pid()
    if remaining_pid is not None:
        print(f"⚠ Gateway PID {remaining_pid} still running after {timeout}s — restart may fail")
+        return False
+    return True


 def launchd_restart():
    label = get_launchd_label()
    target = f"{_launchd_domain()}/{label}"
-    # Use kickstart -k so launchd performs an atomic kill+restart.
-    # A two-step stop/start from inside the gateway's own process tree
-    # would kill the shell before the start command is reached.
+    drain_timeout = _get_restart_drain_timeout()
+    from gateway.status import get_running_pid
+
    try:
+        pid = get_running_pid()
+        if pid is not None and _request_gateway_self_restart(pid):
+            print("✓ Service restart requested")
+            return
+        if pid is not None:
+            try:
+                terminate_pid(pid, force=False)
+            except (ProcessLookupError, PermissionError, OSError):
+                pid = None
+            if pid is not None:
+                exited = _wait_for_gateway_exit(timeout=drain_timeout, force_after=None)
+                if not exited:
+                    print(f"⚠ Gateway drain timed out after {drain_timeout:.0f}s — forcing launchd restart")
        subprocess.run(["launchctl", "kickstart", "-k", target], check=True, timeout=90)
        print("✓ Service restarted")
    except subprocess.CalledProcessError as e:
@@ -1728,6 +1833,8 @@ def _runtime_health_lines() -> list[str]:
    lines: list[str] = []
    gateway_state = state.get("gateway_state")
    exit_reason = state.get("exit_reason")
+    active_agents = state.get("active_agents")
+    restart_requested = state.get("restart_requested")
    platforms = state.get("platforms", {}) or {}

    for platform, pdata in platforms.items():
@@ -1737,6 +1844,10 @@ def _runtime_health_lines() -> list[str]:

    if gateway_state == "startup_failed" and exit_reason:
        lines.append(f"⚠ Last startup issue: {exit_reason}")
+    elif gateway_state == "draining":
+        action = "restart" if restart_requested else "shutdown"
+        count = int(active_agents or 0)
+        lines.append(f"⏳ Gateway draining for {action} ({count} active agent(s))")
    elif gateway_state == "stopped" and exit_reason:
        lines.append(f"⚠ Last shutdown reason: {exit_reason}")

@@ -16,7 +16,7 @@ dependencies = [
  "anthropic>=0.39.0,<1",
  "python-dotenv>=1.2.1,<2",
  "fire>=0.7.1,<1",
-  "httpx>=0.28.1,<1",
+  "httpx[socks]>=0.28.1,<1",
  "rich>=14.3.3,<15",
  "tenacity>=9.1.4,<10",
  "pyyaml>=6.0.2,<7",
@@ -88,10 +88,10 @@ all = [
  "hermes-agent[modal]",
  "hermes-agent[daytona]",
  "hermes-agent[messaging]",
-  # matrix excluded: python-olm (required by matrix-nio[e2e]) is upstream-broken
-  # on modern macOS (archived libolm, C++ errors with Clang 21+). Including it
-  # here causes the entire [all] install to fail, dropping all other extras.
-  # Users who need Matrix can install manually: pip install 'hermes-agent[matrix]'
+  # matrix: python-olm (required by matrix-nio[e2e]) is upstream-broken on
+  # modern macOS (archived libolm, C++ errors with Clang 21+).  On Linux the
+  # [matrix] extra's own marker pulls in the [e2e] variant automatically.
+  "hermes-agent[matrix]; sys_platform == 'linux'",
  "hermes-agent[cron]",
  "hermes-agent[cli]",
  "hermes-agent[dev]",
@@ -1977,19 +1977,14 @@ class AIAgent:
            except Exception as e:
                logger.debug("Background memory/skill review failed: %s", e)
            finally:
-                # Explicitly close the OpenAI/httpx client so GC doesn't
-                # try to clean it up on a dead asyncio event loop (which
-                # produces "Event loop is closed" errors in the terminal).
+                # Close all resources (httpx client, subprocesses, etc.) so
+                # GC doesn't try to clean them up on a dead asyncio event
+                # loop (which produces "Event loop is closed" errors).
                if review_agent is not None:
-                    client = getattr(review_agent, "client", None)
-                    if client is not None:
-                        try:
-                            review_agent._close_openai_client(
-                                client, reason="bg_review_done", shared=True
-                            )
-                            review_agent.client = None
-                        except Exception:
-                            pass
+                    try:
+                        review_agent.close()
+                    except Exception:
+                        pass

        t = threading.Thread(target=_run_review, daemon=True, name="bg-review")
        t.start()
@@ -2729,6 +2724,64 @@ class AIAgent:
            except Exception:
                pass
    
+    def close(self) -> None:
+        """Release all resources held by this agent instance.
+
+        Cleans up subprocess resources that would otherwise become orphans:
+        - Background processes tracked in ProcessRegistry
+        - Terminal sandbox environments
+        - Browser daemon sessions
+        - Active child agents (subagent delegation)
+        - OpenAI/httpx client connections
+
+        Safe to call multiple times (idempotent).  Each cleanup step is
+        independently guarded so a failure in one does not prevent the rest.
+        """
+        task_id = getattr(self, "session_id", None) or ""
+
+        # 1. Kill background processes for this task
+        try:
+            from tools.process_registry import process_registry
+            process_registry.kill_all(task_id=task_id)
+        except Exception:
+            pass
+
+        # 2. Clean terminal sandbox environments
+        try:
+            from tools.terminal_tool import cleanup_vm
+            cleanup_vm(task_id)
+        except Exception:
+            pass
+
+        # 3. Clean browser daemon sessions
+        try:
+            from tools.browser_tool import cleanup_browser
+            cleanup_browser(task_id)
+        except Exception:
+            pass
+
+        # 4. Close active child agents
+        try:
+            with self._active_children_lock:
+                children = list(self._active_children)
+                self._active_children.clear()
+            for child in children:
+                try:
+                    child.close()
+                except Exception:
+                    pass
+        except Exception:
+            pass
+
+        # 5. Close the OpenAI/httpx client
+        try:
+            client = getattr(self, "client", None)
+            if client is not None:
+                self._close_openai_client(client, reason="agent_close", shared=True)
+                self.client = None
+        except Exception:
+            pass
+
    def _hydrate_todo_store(self, history: List[Dict[str, Any]]) -> None:
        """
        Recover todo state from conversation history.
@@ -7708,6 +7761,7 @@ class AIAgent:

            finish_reason = "stop"
            response = None  # Guard against UnboundLocalError if all retries fail
+            api_kwargs = None  # Guard against UnboundLocalError in except handler

            while retry_count < max_retries:
                try:
@@ -8742,9 +8796,10 @@ class AIAgent:
                        if self._try_activate_fallback():
                            retry_count = 0
                            continue
-                        self._dump_api_request_debug(
-                            api_kwargs, reason="non_retryable_client_error", error=api_error,
-                        )
+                        if api_kwargs is not None:
+                            self._dump_api_request_debug(
+                                api_kwargs, reason="non_retryable_client_error", error=api_error,
+                            )
                        self._emit_status(
                            f"❌ Non-retryable error (HTTP {status_code}): "
                            f"{self._summarize_api_error(api_error)}"
@@ -8847,9 +8902,10 @@ class AIAgent:
                            self.log_prefix, max_retries, _final_summary,
                            _provider, _model, len(api_messages), f"{approx_tokens:,}",
                        )
-                        self._dump_api_request_debug(
-                            api_kwargs, reason="max_retries_exhausted", error=api_error,
-                        )
+                        if api_kwargs is not None:
+                            self._dump_api_request_debug(
+                                api_kwargs, reason="max_retries_exhausted", error=api_error,
+                            )
                        self._persist_session(messages, conversation_history)
                        _final_response = f"API call failed after {max_retries} retries: {_final_summary}"
                        if _is_stream_drop:
@@ -1082,10 +1082,19 @@ install_node_deps() {
        log_success "Node.js dependencies installed"

        # Install Playwright browser + system dependencies.
-        # Playwright's install-deps only supports apt/dnf/zypper natively.
+        # Playwright's --with-deps only supports apt-based systems natively.
        # For Arch/Manjaro we install the system libs via pacman first.
+        # Other systems must install Chromium dependencies manually.
        log_info "Installing browser engine (Playwright Chromium)..."
        case "$DISTRO" in
+            ubuntu|debian|raspbian|pop|linuxmint|elementary|zorin|kali|parrot)
+                log_info "Playwright may request sudo to install browser system dependencies (shared libraries)."
+                log_info "This is standard Playwright setup — Hermes itself does not require root access."
+                cd "$INSTALL_DIR" && npx playwright install --with-deps chromium 2>/dev/null || {
+                    log_warn "Playwright browser installation failed — browser tools will not work."
+                    log_warn "Try running manually: cd $INSTALL_DIR && npx playwright install --with-deps chromium"
+                }
+                ;;
            arch|manjaro)
                if command -v pacman &> /dev/null; then
                    log_info "Arch/Manjaro detected — installing Chromium system dependencies via pacman..."
@@ -1100,15 +1109,35 @@ install_node_deps() {
                        log_warn "  sudo pacman -S nss atk at-spi2-core cups libdrm libxkbcommon mesa pango cairo alsa-lib"
                    fi
                fi
-                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || true
+                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || {
+                    log_warn "Playwright browser installation failed — browser tools will not work."
+                }
+                ;;
+            fedora|rhel|centos|rocky|alma)
+                log_warn "Playwright does not support automatic dependency installation on RPM-based systems."
+                log_info "Install Chromium system dependencies manually before using browser tools:"
+                log_info "  sudo dnf install nss atk at-spi2-core cups-libs libdrm libxkbcommon mesa-libgbm pango cairo alsa-lib"
+                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || {
+                    log_warn "Playwright browser installation failed — install dependencies above and retry."
+                }
+                ;;
+            opensuse*|sles)
+                log_warn "Playwright does not support automatic dependency installation on zypper-based systems."
+                log_info "Install Chromium system dependencies manually before using browser tools:"
+                log_info "  sudo zypper install mozilla-nss libatk-1_0-0 at-spi2-core cups-libs libdrm2 libxkbcommon0 Mesa-libgbm1 pango cairo libasound2"
+                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || {
+                    log_warn "Playwright browser installation failed — install dependencies above and retry."
+                }
                ;;
            *)
-                log_info "Playwright may request sudo to install browser system dependencies (shared libraries)."
-                log_info "This is standard Playwright setup — Hermes itself does not require root access."
-                cd "$INSTALL_DIR" && npx playwright install --with-deps chromium 2>/dev/null || true
+                log_warn "Playwright does not support automatic dependency installation on $DISTRO."
+                log_info "Install Chromium/browser system dependencies for your distribution, then run:"
+                log_info "  cd $INSTALL_DIR && npx playwright install chromium"
+                log_info "Browser tools will not work until dependencies are installed."
+                cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || true
                ;;
        esac
-        log_success "Browser engine installed"
+        log_success "Browser engine setup complete"
    fi

    # Install WhatsApp bridge dependencies
@@ -658,6 +658,19 @@ class TestGetTextAuxiliaryClient:
        assert client is None
        assert model is None

+    def test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api(self):
+        with patch("agent.auxiliary_client._resolve_custom_runtime",
+                   return_value=("https://api.openai.com/v1", "sk-test", "codex_responses")), \
+             patch("agent.auxiliary_client._read_main_model", return_value="gpt-5.3-codex"), \
+             patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_text_auxiliary_client()
+
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-5.3-codex"
+        assert mock_openai.call_args.kwargs["base_url"] == "https://api.openai.com/v1"
+        assert mock_openai.call_args.kwargs["api_key"] == "sk-test"
+

 class TestVisionClientFallback:
    """Vision client auto mode resolves known-good multimodal backends."""
@@ -1,4 +1,4 @@
-"""Shared fixtures for Telegram gateway e2e tests.
+"""Shared fixtures for gateway e2e tests (Telegram, Discord).

 These tests exercise the full async message flow:
    adapter.handle_message(event)
@@ -14,19 +14,22 @@ import sys
 import uuid
 from datetime import datetime
 from types import SimpleNamespace
-from unittest.mock import AsyncMock, MagicMock
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest

 from gateway.config import GatewayConfig, Platform, PlatformConfig
 from gateway.platforms.base import MessageEvent, SendResult
 from gateway.session import SessionEntry, SessionSource, build_session_key


-#Ensure telegram module is available (mock it if not installed)
+# Platform library mocks

+# Ensure telegram module is available (mock it if not installed)
 def _ensure_telegram_mock():
    """Install mock telegram modules so TelegramAdapter can be imported."""
    if "telegram" in sys.modules and hasattr(sys.modules["telegram"], "__file__"):
-        return  # Real library installed
+        return # Real library installed

    telegram_mod = MagicMock()
    telegram_mod.Update = MagicMock()
@@ -51,24 +54,118 @@ def _ensure_telegram_mock():
        sys.modules.setdefault(name, telegram_mod)


-_ensure_telegram_mock()
+# Ensure discord module is available (mock it if not installed)
+def _ensure_discord_mock():
+    """Install mock discord modules so DiscordAdapter can be imported."""
+    if "discord" in sys.modules and hasattr(sys.modules["discord"], "__file__"):
+        return # Real library installed

+    discord_mod = MagicMock()
+    discord_mod.Intents.default.return_value = MagicMock()
+    discord_mod.DMChannel = type("DMChannel", (), {})
+    discord_mod.Thread = type("Thread", (), {})
+    discord_mod.ForumChannel = type("ForumChannel", (), {})
+    discord_mod.Interaction = object
+    discord_mod.app_commands = SimpleNamespace(
+        describe=lambda **kwargs: (lambda fn: fn),
+        choices=lambda **kwargs: (lambda fn: fn),
+        Choice=lambda **kwargs: SimpleNamespace(**kwargs),
+    )
+    discord_mod.opus.is_loaded.return_value = True
+
+    ext_mod = MagicMock()
+    commands_mod = MagicMock()
+    commands_mod.Bot = MagicMock
+    ext_mod.commands = commands_mod
+
+    sys.modules.setdefault("discord", discord_mod)
+    sys.modules.setdefault("discord.ext", ext_mod)
+    sys.modules.setdefault("discord.ext.commands", commands_mod)
+    sys.modules.setdefault("discord.opus", discord_mod.opus)
+
+
+def _ensure_slack_mock():
+    """Install mock slack modules so SlackAdapter can be imported."""
+    if "slack_bolt" in sys.modules and hasattr(sys.modules["slack_bolt"], "__file__"):
+        return  # Real library installed
+
+    slack_bolt = MagicMock()
+    slack_bolt.async_app.AsyncApp = MagicMock
+    slack_bolt.adapter.socket_mode.async_handler.AsyncSocketModeHandler = MagicMock
+
+    slack_sdk = MagicMock()
+    slack_sdk.web.async_client.AsyncWebClient = MagicMock
+
+    for name, mod in [
+        ("slack_bolt", slack_bolt),
+        ("slack_bolt.async_app", slack_bolt.async_app),
+        ("slack_bolt.adapter", slack_bolt.adapter),
+        ("slack_bolt.adapter.socket_mode", slack_bolt.adapter.socket_mode),
+        ("slack_bolt.adapter.socket_mode.async_handler", slack_bolt.adapter.socket_mode.async_handler),
+        ("slack_sdk", slack_sdk),
+        ("slack_sdk.web", slack_sdk.web),
+        ("slack_sdk.web.async_client", slack_sdk.web.async_client),
+    ]:
+        sys.modules.setdefault(name, mod)
+
+
+_ensure_telegram_mock()
+_ensure_discord_mock()
+_ensure_slack_mock()
+
+from gateway.platforms.discord import DiscordAdapter   # noqa: E402
 from gateway.platforms.telegram import TelegramAdapter  # noqa: E402

+import gateway.platforms.slack as _slack_mod  # noqa: E402
+_slack_mod.SLACK_AVAILABLE = True
+from gateway.platforms.slack import SlackAdapter  # noqa: E402

-#GatewayRunner factory (based on tests/gateway/test_status_command.py)

-def make_runner(session_entry: SessionEntry) -> "GatewayRunner":
+# Platform-generic factories
+
+def make_source(platform: Platform, chat_id: str = "e2e-chat-1", user_id: str = "e2e-user-1") -> SessionSource:
+    return SessionSource(
+        platform=platform,
+        chat_id=chat_id,
+        user_id=user_id,
+        user_name="e2e_tester",
+        chat_type="dm",
+    )
+
+
+def make_session_entry(platform: Platform, source: SessionSource = None) -> SessionEntry:
+    source = source or make_source(platform)
+    return SessionEntry(
+        session_key=build_session_key(source),
+        session_id=f"sess-{uuid.uuid4().hex[:8]}",
+        created_at=datetime.now(),
+        updated_at=datetime.now(),
+        platform=platform,
+        chat_type="dm",
+    )
+
+
+def make_event(platform: Platform, text: str = "/help", chat_id: str = "e2e-chat-1", user_id: str = "e2e-user-1") -> MessageEvent:
+    return MessageEvent(
+        text=text,
+        source=make_source(platform, chat_id, user_id),
+        message_id=f"msg-{uuid.uuid4().hex[:8]}",
+    )
+
+
+def make_runner(platform: Platform, session_entry: SessionEntry = None) -> "GatewayRunner":
    """Create a GatewayRunner with mocked internals for e2e testing.

    Skips __init__ to avoid filesystem/network side effects.
-    All command-dispatch dependencies are wired manually.
    """
    from gateway.run import GatewayRunner

+    if session_entry is None:
+        session_entry = make_session_entry(platform)
+
    runner = object.__new__(GatewayRunner)
    runner.config = GatewayConfig(
-        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="e2e-test-token")}
+        platforms={platform: PlatformConfig(enabled=True, token="e2e-test-token")}
    )
    runner.adapters = {}
    runner._voice_mode = {}
@@ -99,7 +196,6 @@ def make_runner(session_entry: SessionEntry) -> "GatewayRunner":
    runner._capture_gateway_honcho_if_configured = lambda *a, **kw: None
    runner._emit_gateway_run_progress = AsyncMock()

-    # Pairing store (used by authorization rejection path)
    runner.pairing_store = MagicMock()
    runner.pairing_store._is_rate_limited = MagicMock(return_value=False)
    runner.pairing_store.generate_code = MagicMock(return_value="ABC123")
@@ -107,67 +203,63 @@ def make_runner(session_entry: SessionEntry) -> "GatewayRunner":
    return runner


-#TelegramAdapter factory
+def make_adapter(platform: Platform, runner=None):
+    """Create a platform adapter wired to *runner*, with send methods mocked."""
+    if runner is None:
+        runner = make_runner(platform)

-def make_adapter(runner) -> TelegramAdapter:
-    """Create a TelegramAdapter wired to *runner*, with send methods mocked.
-
-    connect() is NOT called — no polling, no token lock, no real HTTP.
-    """
    config = PlatformConfig(enabled=True, token="e2e-test-token")
-    adapter = TelegramAdapter(config)

-    # Mock outbound methods so tests can capture what was sent
+    if platform == Platform.DISCORD:
+        with patch.object(DiscordAdapter, "_load_participated_threads", return_value=set()):
+            adapter = DiscordAdapter(config)
+        platform_key = Platform.DISCORD
+    elif platform == Platform.SLACK:
+        adapter = SlackAdapter(config)
+        platform_key = Platform.SLACK
+    else:
+        adapter = TelegramAdapter(config)
+        platform_key = Platform.TELEGRAM
+
    adapter.send = AsyncMock(return_value=SendResult(success=True, message_id="e2e-resp-1"))
    adapter.send_typing = AsyncMock()

-    # Wire adapter ↔ runner
    adapter.set_message_handler(runner._handle_message)
-    runner.adapters[Platform.TELEGRAM] = adapter
+    runner.adapters[platform_key] = adapter

    return adapter


-#Helpers
-
-def make_source(chat_id: str = "e2e-chat-1", user_id: str = "e2e-user-1") -> SessionSource:
-    return SessionSource(
-        platform=Platform.TELEGRAM,
-        chat_id=chat_id,
-        user_id=user_id,
-        user_name="e2e_tester",
-        chat_type="dm",
-    )
-
-
-def make_event(text: str, chat_id: str = "e2e-chat-1", user_id: str = "e2e-user-1") -> MessageEvent:
-    return MessageEvent(
-        text=text,
-        source=make_source(chat_id, user_id),
-        message_id=f"msg-{uuid.uuid4().hex[:8]}",
-    )
-
-
-def make_session_entry(source: SessionSource = None) -> SessionEntry:
-    source = source or make_source()
-    return SessionEntry(
-        session_key=build_session_key(source),
-        session_id=f"sess-{uuid.uuid4().hex[:8]}",
-        created_at=datetime.now(),
-        updated_at=datetime.now(),
-        platform=Platform.TELEGRAM,
-        chat_type="dm",
-    )
-
-
-async def send_and_capture(adapter: TelegramAdapter, text: str, **event_kwargs) -> AsyncMock:
-    """Send a message through the full e2e flow and return the send mock.
-
-    Drives: adapter.handle_message → background task → runner dispatch → adapter.send.
-    """
-    event = make_event(text, **event_kwargs)
+async def send_and_capture(adapter, text: str, platform: Platform, **event_kwargs) -> AsyncMock:
+    """Send a message through the full e2e flow and return the send mock."""
+    event = make_event(platform, text, **event_kwargs)
    adapter.send.reset_mock()
    await adapter.handle_message(event)
-    # Let the background task complete
    await asyncio.sleep(0.3)
    return adapter.send
+
+
+# Parametrized fixtures for platform-generic tests
+@pytest.fixture(params=[Platform.TELEGRAM, Platform.DISCORD, Platform.SLACK], ids=["telegram", "discord", "slack"])
+def platform(request):
+    return request.param
+
+
+@pytest.fixture()
+def source(platform):
+    return make_source(platform)
+
+
+@pytest.fixture()
+def session_entry(platform, source):
+    return make_session_entry(platform, source)
+
+
+@pytest.fixture()
+def runner(platform, session_entry):
+    return make_runner(platform, session_entry)
+
+
+@pytest.fixture()
+def adapter(platform, runner):
+    return make_adapter(platform, runner)
@@ -1,4 +1,4 @@
-"""E2E tests for Telegram gateway slash commands.
+"""E2E tests for gateway slash commands (Telegram, Discord).

 Each test drives a message through the full async pipeline:
    adapter.handle_message(event)
@@ -7,6 +7,7 @@ Each test drives a message through the full async pipeline:
        → adapter.send() (captured for assertions)

 No LLM involved — only gateway-level commands are tested.
+Tests are parametrized over platforms via the ``platform`` fixture in conftest.
 """

 import asyncio
@@ -15,46 +16,15 @@ from unittest.mock import AsyncMock
 import pytest

 from gateway.platforms.base import SendResult
-from tests.e2e.conftest import (
-    make_adapter,
-    make_event,
-    make_runner,
-    make_session_entry,
-    make_source,
-    send_and_capture,
-)
+from tests.e2e.conftest import make_event, send_and_capture


-#Fixtures
-
-@pytest.fixture()
-def source():
-    return make_source()
-
-
-@pytest.fixture()
-def session_entry(source):
-    return make_session_entry(source)
-
-
-@pytest.fixture()
-def runner(session_entry):
-    return make_runner(session_entry)
-
-
-@pytest.fixture()
-def adapter(runner):
-    return make_adapter(runner)
-
-
-#Tests
-
-class TestTelegramSlashCommands:
+class TestSlashCommands:
    """Gateway slash commands dispatched through the full adapter pipeline."""

    @pytest.mark.asyncio
-    async def test_help_returns_command_list(self, adapter):
-        send = await send_and_capture(adapter, "/help")
+    async def test_help_returns_command_list(self, adapter, platform):
+        send = await send_and_capture(adapter, "/help", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
@@ -62,24 +32,23 @@ class TestTelegramSlashCommands:
        assert "/status" in response_text

    @pytest.mark.asyncio
-    async def test_status_shows_session_info(self, adapter):
-        send = await send_and_capture(adapter, "/status")
+    async def test_status_shows_session_info(self, adapter, platform):
+        send = await send_and_capture(adapter, "/status", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
-        # Status output includes session metadata
        assert "session" in response_text.lower() or "Session" in response_text

    @pytest.mark.asyncio
-    async def test_new_resets_session(self, adapter, runner):
-        send = await send_and_capture(adapter, "/new")
+    async def test_new_resets_session(self, adapter, runner, platform):
+        send = await send_and_capture(adapter, "/new", platform)

        send.assert_called_once()
        runner.session_store.reset_session.assert_called_once()

    @pytest.mark.asyncio
-    async def test_stop_when_no_agent_running(self, adapter):
-        send = await send_and_capture(adapter, "/stop")
+    async def test_stop_when_no_agent_running(self, adapter, platform):
+        send = await send_and_capture(adapter, "/stop", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
@@ -87,8 +56,8 @@ class TestTelegramSlashCommands:
        assert "no" in response_lower or "stop" in response_lower or "not running" in response_lower

    @pytest.mark.asyncio
-    async def test_commands_shows_listing(self, adapter):
-        send = await send_and_capture(adapter, "/commands")
+    async def test_commands_shows_listing(self, adapter, platform):
+        send = await send_and_capture(adapter, "/commands", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
@@ -96,25 +65,25 @@ class TestTelegramSlashCommands:
        assert "/" in response_text

    @pytest.mark.asyncio
-    async def test_sequential_commands_share_session(self, adapter):
+    async def test_sequential_commands_share_session(self, adapter, platform):
        """Two commands from the same chat_id should both succeed."""
-        send_help = await send_and_capture(adapter, "/help")
+        send_help = await send_and_capture(adapter, "/help", platform)
        send_help.assert_called_once()

-        send_status = await send_and_capture(adapter, "/status")
+        send_status = await send_and_capture(adapter, "/status", platform)
        send_status.assert_called_once()

    @pytest.mark.asyncio
-    async def test_provider_shows_current_provider(self, adapter):
-        send = await send_and_capture(adapter, "/provider")
+    async def test_provider_shows_current_provider(self, adapter, platform):
+        send = await send_and_capture(adapter, "/provider", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
        assert "provider" in response_text.lower()

    @pytest.mark.asyncio
-    async def test_verbose_responds(self, adapter):
-        send = await send_and_capture(adapter, "/verbose")
+    async def test_verbose_responds(self, adapter, platform):
+        send = await send_and_capture(adapter, "/verbose", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
@@ -122,42 +91,50 @@ class TestTelegramSlashCommands:
        assert "verbose" in response_text.lower() or "tool_progress" in response_text

    @pytest.mark.asyncio
-    async def test_personality_lists_options(self, adapter):
-        send = await send_and_capture(adapter, "/personality")
+    async def test_personality_lists_options(self, adapter, platform):
+        send = await send_and_capture(adapter, "/personality", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
        assert "personalit" in response_text.lower()  # matches "personality" or "personalities"

    @pytest.mark.asyncio
-    async def test_yolo_toggles_mode(self, adapter):
-        send = await send_and_capture(adapter, "/yolo")
+    async def test_yolo_toggles_mode(self, adapter, platform):
+        send = await send_and_capture(adapter, "/yolo", platform)

        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
        assert "yolo" in response_text.lower()

+    @pytest.mark.asyncio
+    async def test_compress_command(self, adapter, platform):
+        send = await send_and_capture(adapter, "/compress", platform)
+
+        send.assert_called_once()
+        response_text = send.call_args[1].get("content") or send.call_args[0][1]
+        assert "compress" in response_text.lower() or "context" in response_text.lower()
+

 class TestSessionLifecycle:
    """Verify session state changes across command sequences."""

    @pytest.mark.asyncio
-    async def test_new_then_status_reflects_reset(self, adapter, runner, session_entry):
+    async def test_new_then_status_reflects_reset(self, adapter, runner, session_entry, platform):
        """After /new, /status should report the fresh session."""
-        await send_and_capture(adapter, "/new")
+        await send_and_capture(adapter, "/new", platform)
        runner.session_store.reset_session.assert_called_once()

-        send = await send_and_capture(adapter, "/status")
+        send = await send_and_capture(adapter, "/status", platform)
        send.assert_called_once()
        response_text = send.call_args[1].get("content") or send.call_args[0][1]
        # Session ID from the entry should appear in the status output
        assert session_entry.session_id[:8] in response_text

    @pytest.mark.asyncio
-    async def test_new_is_idempotent(self, adapter, runner):
+    async def test_new_is_idempotent(self, adapter, runner, platform):
        """/new called twice should not crash."""
-        await send_and_capture(adapter, "/new")
-        await send_and_capture(adapter, "/new")
+        await send_and_capture(adapter, "/new", platform)
+        await send_and_capture(adapter, "/new", platform)
        assert runner.session_store.reset_session.call_count == 2


@@ -165,11 +142,11 @@ class TestAuthorization:
    """Verify the pipeline handles unauthorized users."""

    @pytest.mark.asyncio
-    async def test_unauthorized_user_gets_pairing_response(self, adapter, runner):
+    async def test_unauthorized_user_gets_pairing_response(self, adapter, runner, platform):
        """Unauthorized DM should trigger pairing code, not a command response."""
        runner._is_user_authorized = lambda _source: False

-        event = make_event("/help")
+        event = make_event(platform, "/help")
        adapter.send.reset_mock()
        await adapter.handle_message(event)
        await asyncio.sleep(0.3)
@@ -181,11 +158,11 @@ class TestAuthorization:
        assert "recognize" in response_text.lower() or "pair" in response_text.lower() or "ABC123" in response_text

    @pytest.mark.asyncio
-    async def test_unauthorized_user_does_not_get_help(self, adapter, runner):
+    async def test_unauthorized_user_does_not_get_help(self, adapter, runner, platform):
        """Unauthorized user should NOT see the help command output."""
        runner._is_user_authorized = lambda _source: False

-        event = make_event("/help")
+        event = make_event(platform, "/help")
        adapter.send.reset_mock()
        await adapter.handle_message(event)
        await asyncio.sleep(0.3)
@@ -200,12 +177,12 @@ class TestSendFailureResilience:
    """Verify the pipeline handles send failures gracefully."""

    @pytest.mark.asyncio
-    async def test_send_failure_does_not_crash_pipeline(self, adapter):
+    async def test_send_failure_does_not_crash_pipeline(self, adapter, platform):
        """If send() returns failure, the pipeline should not raise."""
        adapter.send = AsyncMock(return_value=SendResult(success=False, error="network timeout"))
-        adapter.set_message_handler(adapter._message_handler)  # re-wire with same handler
+        adapter.set_message_handler(adapter._message_handler) # re-wire with same handler

-        event = make_event("/help")
+        event = make_event(platform, "/help")
        # Should not raise — pipeline handles send failures internally
        await adapter.handle_message(event)
        await asyncio.sleep(0.3)
@@ -0,0 +1,110 @@
+import asyncio
+from unittest.mock import AsyncMock, MagicMock
+
+from gateway.config import GatewayConfig, Platform, PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
+from gateway.restart import DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+from gateway.run import GatewayRunner
+from gateway.session import SessionSource
+
+
+class RestartTestAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="***"), Platform.TELEGRAM)
+        self.sent: list[str] = []
+
+    async def connect(self):
+        return True
+
+    async def disconnect(self):
+        return None
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None):
+        self.sent.append(content)
+        return SendResult(success=True, message_id="1")
+
+    async def send_typing(self, chat_id, metadata=None):
+        return None
+
+    async def get_chat_info(self, chat_id):
+        return {"id": chat_id}
+
+
+def make_restart_source(chat_id: str = "123456", chat_type: str = "dm") -> SessionSource:
+    return SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id=chat_id,
+        chat_type=chat_type,
+    )
+
+
+def make_restart_runner(
+    adapter: BasePlatformAdapter | None = None,
+) -> tuple[GatewayRunner, BasePlatformAdapter]:
+    runner = object.__new__(GatewayRunner)
+    runner.config = GatewayConfig(
+        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}
+    )
+    runner._running = True
+    runner._shutdown_event = asyncio.Event()
+    runner._exit_reason = None
+    runner._exit_code = None
+    runner._running_agents = {}
+    runner._running_agents_ts = {}
+    runner._pending_messages = {}
+    runner._pending_approvals = {}
+    runner._pending_model_notes = {}
+    runner._background_tasks = set()
+    runner._draining = False
+    runner._restart_requested = False
+    runner._restart_task_started = False
+    runner._restart_detached = False
+    runner._restart_via_service = False
+    runner._restart_drain_timeout = DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    runner._stop_task = None
+    runner._busy_input_mode = "interrupt"
+    runner._update_prompt_pending = {}
+    runner._voice_mode = {}
+    runner._session_model_overrides = {}
+    runner._shutdown_all_gateway_honcho = lambda: None
+    runner._update_runtime_status = MagicMock()
+    runner._queue_or_replace_pending_event = GatewayRunner._queue_or_replace_pending_event.__get__(
+        runner, GatewayRunner
+    )
+    runner._session_key_for_source = GatewayRunner._session_key_for_source.__get__(
+        runner, GatewayRunner
+    )
+    runner._handle_active_session_busy_message = (
+        GatewayRunner._handle_active_session_busy_message.__get__(runner, GatewayRunner)
+    )
+    runner._handle_restart_command = GatewayRunner._handle_restart_command.__get__(
+        runner, GatewayRunner
+    )
+    runner._status_action_label = GatewayRunner._status_action_label.__get__(
+        runner, GatewayRunner
+    )
+    runner._status_action_gerund = GatewayRunner._status_action_gerund.__get__(
+        runner, GatewayRunner
+    )
+    runner._queue_during_drain_enabled = GatewayRunner._queue_during_drain_enabled.__get__(
+        runner, GatewayRunner
+    )
+    runner._running_agent_count = GatewayRunner._running_agent_count.__get__(
+        runner, GatewayRunner
+    )
+    runner._launch_detached_restart_command = GatewayRunner._launch_detached_restart_command.__get__(
+        runner, GatewayRunner
+    )
+    runner.request_restart = GatewayRunner.request_restart.__get__(runner, GatewayRunner)
+    runner._is_user_authorized = lambda _source: True
+    runner.hooks = MagicMock()
+    runner.hooks.emit = AsyncMock()
+    runner.pairing_store = MagicMock()
+    runner.session_store = MagicMock()
+    runner.delivery_router = MagicMock()
+
+    platform_adapter = adapter or RestartTestAdapter()
+    platform_adapter.set_message_handler(AsyncMock(return_value=None))
+    platform_adapter.set_busy_session_handler(runner._handle_active_session_busy_message)
+    runner.adapters = {Platform.TELEGRAM: platform_adapter}
+    return runner, platform_adapter
@@ -3,43 +3,15 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from gateway.config import GatewayConfig, Platform, PlatformConfig
-from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
-from gateway.run import GatewayRunner
-from gateway.session import SessionSource, build_session_key
-
-
-class StubAdapter(BasePlatformAdapter):
-    def __init__(self):
-        super().__init__(PlatformConfig(enabled=True, token="***"), Platform.TELEGRAM)
-
-    async def connect(self):
-        return True
-
-    async def disconnect(self):
-        return None
-
-    async def send(self, chat_id, content, reply_to=None, metadata=None):
-        return SendResult(success=True, message_id="1")
-
-    async def send_typing(self, chat_id, metadata=None):
-        return None
-
-    async def get_chat_info(self, chat_id):
-        return {"id": chat_id}
-
-
-def _source(chat_id="123456", chat_type="dm"):
-    return SessionSource(
-        platform=Platform.TELEGRAM,
-        chat_id=chat_id,
-        chat_type=chat_type,
-    )
+from gateway.platforms.base import MessageEvent
+from gateway.restart import GATEWAY_SERVICE_RESTART_EXIT_CODE
+from gateway.session import build_session_key
+from tests.gateway.restart_test_helpers import make_restart_runner, make_restart_source


@pytest.mark.asyncio
 async def test_cancel_background_tasks_cancels_inflight_message_processing():
-    adapter = StubAdapter()
+    _runner, adapter = make_restart_runner()
    release = asyncio.Event()

    async def block_forever(_event):
@@ -47,7 +19,7 @@ async def test_cancel_background_tasks_cancels_inflight_message_processing():
        return None

    adapter.set_message_handler(block_forever)
-    event = MessageEvent(text="work", source=_source(), message_id="1")
+    event = MessageEvent(text="work", source=make_restart_source(), message_id="1")

    await adapter.handle_message(event)
    await asyncio.sleep(0)
@@ -65,17 +37,11 @@ async def test_cancel_background_tasks_cancels_inflight_message_processing():

@pytest.mark.asyncio
 async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks():
-    runner = object.__new__(GatewayRunner)
-    runner.config = GatewayConfig(platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")})
-    runner._running = True
-    runner._shutdown_event = asyncio.Event()
-    runner._exit_reason = None
+    runner, adapter = make_restart_runner()
    runner._pending_messages = {"session": "pending text"}
    runner._pending_approvals = {"session": {"command": "rm -rf /tmp/x"}}
-    runner._background_tasks = set()
-    runner._shutdown_all_gateway_honcho = lambda: None
+    runner._restart_drain_timeout = 0.0

-    adapter = StubAdapter()
    release = asyncio.Event()

    async def block_forever(_event):
@@ -83,7 +49,7 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
        return None

    adapter.set_message_handler(block_forever)
-    event = MessageEvent(text="work", source=_source(), message_id="1")
+    event = MessageEvent(text="work", source=make_restart_source(), message_id="1")
    await adapter.handle_message(event)
    await asyncio.sleep(0)

@@ -93,7 +59,6 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
    session_key = build_session_key(event.source)
    running_agent = MagicMock()
    runner._running_agents = {session_key: running_agent}
-    runner.adapters = {Platform.TELEGRAM: adapter}

    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
        await runner.stop()
@@ -105,3 +70,78 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
    assert runner._pending_messages == {}
    assert runner._pending_approvals == {}
    assert runner._shutdown_event.is_set() is True
+
+
+@pytest.mark.asyncio
+async def test_gateway_stop_drains_running_agents_before_disconnect():
+    runner, adapter = make_restart_runner()
+    disconnect_mock = AsyncMock()
+    adapter.disconnect = disconnect_mock
+
+    running_agent = MagicMock()
+    runner._running_agents = {"session": running_agent}
+
+    async def finish_agent():
+        await asyncio.sleep(0.05)
+        runner._running_agents.clear()
+
+    asyncio.create_task(finish_agent())
+
+    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
+        await runner.stop()
+
+    running_agent.interrupt.assert_not_called()
+    disconnect_mock.assert_awaited_once()
+    assert runner._shutdown_event.is_set() is True
+
+
+@pytest.mark.asyncio
+async def test_gateway_stop_interrupts_after_drain_timeout():
+    runner, adapter = make_restart_runner()
+    runner._restart_drain_timeout = 0.05
+
+    disconnect_mock = AsyncMock()
+    adapter.disconnect = disconnect_mock
+
+    running_agent = MagicMock()
+    runner._running_agents = {"session": running_agent}
+
+    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
+        await runner.stop()
+
+    running_agent.interrupt.assert_called_once_with("Gateway shutting down")
+    disconnect_mock.assert_awaited_once()
+    assert runner._shutdown_event.is_set() is True
+
+
+@pytest.mark.asyncio
+async def test_gateway_stop_service_restart_sets_named_exit_code():
+    runner, adapter = make_restart_runner()
+    adapter.disconnect = AsyncMock()
+
+    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
+        await runner.stop(restart=True, service_restart=True)
+
+    assert runner._exit_code == GATEWAY_SERVICE_RESTART_EXIT_CODE
+
+
+@pytest.mark.asyncio
+async def test_drain_active_agents_throttles_status_updates():
+    runner, _adapter = make_restart_runner()
+    runner._update_runtime_status = MagicMock()
+
+    runner._running_agents = {"a": MagicMock(), "b": MagicMock()}
+
+    async def finish_agents():
+        await asyncio.sleep(0.12)
+        runner._running_agents.pop("a")
+        await asyncio.sleep(0.12)
+        runner._running_agents.clear()
+
+    task = asyncio.create_task(finish_agents())
+    await runner._drain_active_agents(1.0)
+    await task
+
+    # Start, one count-change update, and final update. Allow one extra update
+    # if the loop observes the zero-agent state before exiting.
+    assert 3 <= runner._update_runtime_status.call_count <= 4
@@ -436,6 +436,95 @@ class TestThreadPersistence:
        assert len(data) == 5


+# ---------------------------------------------------------------------------
+# DM mention-thread feature
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_disabled_by_default(monkeypatch):
+    """Default (dm_mention_threads=false): DM with mention should NOT create a thread."""
+    monkeypatch.delenv("MATRIX_DM_MENTION_THREADS", raising=False)
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    room = _make_room(member_count=2)
+    event = _make_event("@hermes:example.org help me", event_id="$dm1")
+
+    await adapter._on_room_message(room, event)
+    adapter.handle_message.assert_awaited_once()
+    msg = adapter.handle_message.await_args.args[0]
+    assert msg.source.thread_id is None
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_creates_thread(monkeypatch):
+    """MATRIX_DM_MENTION_THREADS=true: DM with @mention creates a thread."""
+    monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", "true")
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    room = _make_room(member_count=2)
+    event = _make_event("@hermes:example.org help me", event_id="$dm1")
+
+    with patch.object(adapter, "_save_participated_threads"):
+        await adapter._on_room_message(room, event)
+
+    adapter.handle_message.assert_awaited_once()
+    msg = adapter.handle_message.await_args.args[0]
+    assert msg.source.thread_id == "$dm1"
+    assert msg.text == "help me"
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_no_mention_no_thread(monkeypatch):
+    """MATRIX_DM_MENTION_THREADS=true: DM without mention does NOT create a thread."""
+    monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", "true")
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    room = _make_room(member_count=2)
+    event = _make_event("hello without mention", event_id="$dm1")
+
+    await adapter._on_room_message(room, event)
+    adapter.handle_message.assert_awaited_once()
+    msg = adapter.handle_message.await_args.args[0]
+    assert msg.source.thread_id is None
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_preserves_existing_thread(monkeypatch):
+    """MATRIX_DM_MENTION_THREADS=true: DM already in a thread keeps that thread_id."""
+    monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", "true")
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    adapter._bot_participated_threads.add("$existing_thread")
+    room = _make_room(member_count=2)
+    event = _make_event("@hermes:example.org help me", thread_id="$existing_thread")
+
+    await adapter._on_room_message(room, event)
+    adapter.handle_message.assert_awaited_once()
+    msg = adapter.handle_message.await_args.args[0]
+    assert msg.source.thread_id == "$existing_thread"
+
+
+@pytest.mark.asyncio
+async def test_dm_mention_thread_tracks_participation(monkeypatch):
+    """DM mention-thread tracks the thread in _bot_participated_threads."""
+    monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", "true")
+    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")
+
+    adapter = _make_adapter()
+    room = _make_room(member_count=2)
+    event = _make_event("@hermes:example.org help", event_id="$dm1")
+
+    with patch.object(adapter, "_save_participated_threads"):
+        await adapter._on_room_message(room, event)
+
+    assert "$dm1" in adapter._bot_participated_threads
+
+
 # ---------------------------------------------------------------------------
 # YAML config bridge
 # ---------------------------------------------------------------------------
@@ -480,6 +569,25 @@ class TestMatrixConfigBridge:
        assert os.getenv("MATRIX_FREE_RESPONSE_ROOMS") == "!room1:example.org,!room2:example.org"
        assert os.getenv("MATRIX_AUTO_THREAD") == "false"

+    def test_yaml_bridge_sets_dm_mention_threads(self, monkeypatch, tmp_path):
+        """Matrix YAML dm_mention_threads should bridge to env var."""
+        monkeypatch.delenv("MATRIX_DM_MENTION_THREADS", raising=False)
+
+        import os
+        import yaml
+
+        yaml_content = {"matrix": {"dm_mention_threads": True}}
+        config_file = tmp_path / "config.yaml"
+        config_file.write_text(yaml.dump(yaml_content))
+
+        yaml_cfg = yaml.safe_load(config_file.read_text())
+        matrix_cfg = yaml_cfg.get("matrix", {})
+        if isinstance(matrix_cfg, dict):
+            if "dm_mention_threads" in matrix_cfg and not os.getenv("MATRIX_DM_MENTION_THREADS"):
+                monkeypatch.setenv("MATRIX_DM_MENTION_THREADS", str(matrix_cfg["dm_mention_threads"]).lower())
+
+        assert os.getenv("MATRIX_DM_MENTION_THREADS") == "true"
+
    def test_env_vars_take_precedence_over_yaml(self, monkeypatch):
        """Env vars should not be overwritten by YAML values."""
        monkeypatch.setenv("MATRIX_REQUIRE_MENTION", "true")
@@ -0,0 +1,160 @@
+import asyncio
+import shutil
+import subprocess
+from unittest.mock import AsyncMock, MagicMock
+
+import pytest
+
+import gateway.run as gateway_run
+from gateway.platforms.base import MessageEvent, MessageType
+from gateway.restart import DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+from gateway.session import build_session_key
+from tests.gateway.restart_test_helpers import make_restart_runner, make_restart_source
+
+
+@pytest.mark.asyncio
+async def test_restart_command_while_busy_requests_drain_without_interrupt():
+    runner, _adapter = make_restart_runner()
+    runner.request_restart = MagicMock(return_value=True)
+    event = MessageEvent(
+        text="/restart",
+        message_type=MessageType.TEXT,
+        source=make_restart_source(),
+        message_id="m1",
+    )
+    session_key = build_session_key(event.source)
+    running_agent = MagicMock()
+    runner._running_agents[session_key] = running_agent
+
+    result = await runner._handle_message(event)
+
+    assert result == "⏳ Draining 1 active agent(s) before restart..."
+    running_agent.interrupt.assert_not_called()
+    runner.request_restart.assert_called_once_with(detached=True, via_service=False)
+
+
+@pytest.mark.asyncio
+async def test_drain_queue_mode_queues_follow_up_without_interrupt():
+    runner, adapter = make_restart_runner()
+    runner._draining = True
+    runner._restart_requested = True
+    runner._busy_input_mode = "queue"
+
+    event = MessageEvent(
+        text="follow up",
+        message_type=MessageType.TEXT,
+        source=make_restart_source(),
+        message_id="m2",
+    )
+    session_key = build_session_key(event.source)
+    adapter._active_sessions[session_key] = asyncio.Event()
+
+    await adapter.handle_message(event)
+
+    assert session_key in adapter._pending_messages
+    assert adapter._pending_messages[session_key].text == "follow up"
+    assert not adapter._active_sessions[session_key].is_set()
+    assert any("queued for the next turn" in message for message in adapter.sent)
+
+
+@pytest.mark.asyncio
+async def test_draining_rejects_new_session_messages():
+    runner, _adapter = make_restart_runner()
+    runner._draining = True
+    runner._restart_requested = True
+
+    event = MessageEvent(
+        text="hello",
+        message_type=MessageType.TEXT,
+        source=make_restart_source("fresh"),
+        message_id="m3",
+    )
+
+    result = await runner._handle_message(event)
+
+    assert result == "⏳ Gateway is restarting and is not accepting new work right now."
+
+
+def test_load_busy_input_mode_prefers_env_then_config_then_default(tmp_path, monkeypatch):
+    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
+    monkeypatch.delenv("HERMES_GATEWAY_BUSY_INPUT_MODE", raising=False)
+
+    assert gateway_run.GatewayRunner._load_busy_input_mode() == "interrupt"
+
+    (tmp_path / "config.yaml").write_text(
+        "display:\n  busy_input_mode: queue\n", encoding="utf-8"
+    )
+    assert gateway_run.GatewayRunner._load_busy_input_mode() == "queue"
+
+    monkeypatch.setenv("HERMES_GATEWAY_BUSY_INPUT_MODE", "interrupt")
+    assert gateway_run.GatewayRunner._load_busy_input_mode() == "interrupt"
+
+
+def test_load_restart_drain_timeout_prefers_env_then_config_then_default(
+    tmp_path, monkeypatch, caplog
+):
+    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
+    monkeypatch.delenv("HERMES_RESTART_DRAIN_TIMEOUT", raising=False)
+
+    assert (
+        gateway_run.GatewayRunner._load_restart_drain_timeout()
+        == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    )
+
+    (tmp_path / "config.yaml").write_text(
+        "agent:\n  restart_drain_timeout: 12\n", encoding="utf-8"
+    )
+    assert gateway_run.GatewayRunner._load_restart_drain_timeout() == 12.0
+
+    monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "7")
+    assert gateway_run.GatewayRunner._load_restart_drain_timeout() == 7.0
+
+    monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "invalid")
+    assert (
+        gateway_run.GatewayRunner._load_restart_drain_timeout()
+        == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+    )
+    assert "Invalid restart_drain_timeout" in caplog.text
+
+
+@pytest.mark.asyncio
+async def test_request_restart_is_idempotent():
+    runner, _adapter = make_restart_runner()
+    runner.stop = AsyncMock()
+
+    assert runner.request_restart(detached=True, via_service=False) is True
+    first_task = next(iter(runner._background_tasks))
+    assert runner.request_restart(detached=True, via_service=False) is False
+
+    await first_task
+
+    runner.stop.assert_awaited_once_with(
+        restart=True, detached_restart=True, service_restart=False
+    )
+
+
+@pytest.mark.asyncio
+async def test_launch_detached_restart_command_uses_setsid(monkeypatch):
+    runner, _adapter = make_restart_runner()
+    popen_calls = []
+
+    monkeypatch.setattr(gateway_run, "_resolve_hermes_bin", lambda: ["/usr/bin/hermes"])
+    monkeypatch.setattr(gateway_run.os, "getpid", lambda: 321)
+    monkeypatch.setattr(shutil, "which", lambda cmd: "/usr/bin/setsid" if cmd == "setsid" else None)
+
+    def fake_popen(cmd, **kwargs):
+        popen_calls.append((cmd, kwargs))
+        return MagicMock()
+
+    monkeypatch.setattr(subprocess, "Popen", fake_popen)
+
+    await runner._launch_detached_restart_command()
+
+    assert len(popen_calls) == 1
+    cmd, kwargs = popen_calls[0]
+    assert cmd[:2] == ["/usr/bin/setsid", "bash"]
+    assert "gateway restart" in cmd[-1]
+    assert "kill -0 321" in cmd[-1]
+    assert kwargs["start_new_session"] is True
+    assert kwargs["stdout"] is subprocess.DEVNULL
+    assert kwargs["stderr"] is subprocess.DEVNULL
@@ -127,6 +127,16 @@ async def test_shutdown_fires_finalize_for_active_agents(mock_invoke_hook):
    runner._shutdown_event = MagicMock()
    runner.adapters = {}
    runner._exit_reason = "test"
+    runner._exit_code = None
+    runner._draining = False
+    runner._restart_requested = False
+    runner._restart_task_started = False
+    runner._restart_detached = False
+    runner._restart_via_service = False
+    runner._restart_drain_timeout = 0.0
+    runner._stop_task = None
+    runner._running_agents_ts = {}
+    runner._update_runtime_status = MagicMock()

    agent1 = MagicMock()
    agent1.session_id = "sess-a"
@@ -3,9 +3,15 @@ import os
 from gateway.config import Platform
 from gateway.run import GatewayRunner
 from gateway.session import SessionContext, SessionSource
+from gateway.session_context import (
+    get_session_env,
+    set_session_vars,
+    clear_session_vars,
+)


-def test_set_session_env_includes_thread_id(monkeypatch):
+def test_set_session_env_sets_contextvars(monkeypatch):
+    """_set_session_env should populate contextvars, not os.environ."""
    runner = object.__new__(GatewayRunner)
    source = SessionSource(
        platform=Platform.TELEGRAM,
@@ -21,25 +27,93 @@ def test_set_session_env_includes_thread_id(monkeypatch):
    monkeypatch.delenv("HERMES_SESSION_CHAT_NAME", raising=False)
    monkeypatch.delenv("HERMES_SESSION_THREAD_ID", raising=False)

-    runner._set_session_env(context)
+    tokens = runner._set_session_env(context)

-    assert os.getenv("HERMES_SESSION_PLATFORM") == "telegram"
-    assert os.getenv("HERMES_SESSION_CHAT_ID") == "-1001"
-    assert os.getenv("HERMES_SESSION_CHAT_NAME") == "Group"
-    assert os.getenv("HERMES_SESSION_THREAD_ID") == "17585"
+    # Values should be readable via get_session_env (contextvar path)
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram"
+    assert get_session_env("HERMES_SESSION_CHAT_ID") == "-1001"
+    assert get_session_env("HERMES_SESSION_CHAT_NAME") == "Group"
+    assert get_session_env("HERMES_SESSION_THREAD_ID") == "17585"
+
+    # os.environ should NOT be touched
+    assert os.getenv("HERMES_SESSION_PLATFORM") is None
+    assert os.getenv("HERMES_SESSION_THREAD_ID") is None
+
+    # Clean up
+    runner._clear_session_env(tokens)


-def test_clear_session_env_removes_thread_id(monkeypatch):
+def test_clear_session_env_restores_previous_state(monkeypatch):
+    """_clear_session_env should restore contextvars to their pre-handler values."""
    runner = object.__new__(GatewayRunner)

-    monkeypatch.setenv("HERMES_SESSION_PLATFORM", "telegram")
-    monkeypatch.setenv("HERMES_SESSION_CHAT_ID", "-1001")
-    monkeypatch.setenv("HERMES_SESSION_CHAT_NAME", "Group")
-    monkeypatch.setenv("HERMES_SESSION_THREAD_ID", "17585")
+    monkeypatch.delenv("HERMES_SESSION_PLATFORM", raising=False)
+    monkeypatch.delenv("HERMES_SESSION_CHAT_ID", raising=False)
+    monkeypatch.delenv("HERMES_SESSION_CHAT_NAME", raising=False)
+    monkeypatch.delenv("HERMES_SESSION_THREAD_ID", raising=False)

-    runner._clear_session_env()
+    source = SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id="-1001",
+        chat_name="Group",
+        chat_type="group",
+        thread_id="17585",
+    )
+    context = SessionContext(source=source, connected_platforms=[], home_channels={})

-    assert os.getenv("HERMES_SESSION_PLATFORM") is None
-    assert os.getenv("HERMES_SESSION_CHAT_ID") is None
-    assert os.getenv("HERMES_SESSION_CHAT_NAME") is None
-    assert os.getenv("HERMES_SESSION_THREAD_ID") is None
+    tokens = runner._set_session_env(context)
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram"
+
+    runner._clear_session_env(tokens)
+
+    # After clear, contextvars should return to defaults (empty)
+    assert get_session_env("HERMES_SESSION_PLATFORM") == ""
+    assert get_session_env("HERMES_SESSION_CHAT_ID") == ""
+    assert get_session_env("HERMES_SESSION_CHAT_NAME") == ""
+    assert get_session_env("HERMES_SESSION_THREAD_ID") == ""
+
+
+def test_get_session_env_falls_back_to_os_environ(monkeypatch):
+    """get_session_env should fall back to os.environ when contextvar is unset."""
+    monkeypatch.setenv("HERMES_SESSION_PLATFORM", "discord")
+
+    # No contextvar set — should read from os.environ
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "discord"
+
+    # Now set a contextvar — should prefer it
+    tokens = set_session_vars(platform="telegram")
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram"
+
+    # Restore — should fall back to os.environ again
+    clear_session_vars(tokens)
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "discord"
+
+
+def test_get_session_env_default_when_nothing_set(monkeypatch):
+    """get_session_env returns default when neither contextvar nor env is set."""
+    monkeypatch.delenv("HERMES_SESSION_PLATFORM", raising=False)
+
+    assert get_session_env("HERMES_SESSION_PLATFORM") == ""
+    assert get_session_env("HERMES_SESSION_PLATFORM", "fallback") == "fallback"
+
+
+def test_set_session_env_handles_missing_optional_fields():
+    """_set_session_env should handle None chat_name and thread_id gracefully."""
+    runner = object.__new__(GatewayRunner)
+    source = SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id="-1001",
+        chat_name=None,
+        chat_type="private",
+        thread_id=None,
+    )
+    context = SessionContext(source=source, connected_platforms=[], home_channels={})
+
+    tokens = runner._set_session_env(context)
+
+    assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram"
+    assert get_session_env("HERMES_SESSION_CHAT_ID") == "-1001"
+    assert get_session_env("HERMES_SESSION_CHAT_NAME") == ""
+    assert get_session_env("HERMES_SESSION_THREAD_ID") == ""
+
+    runner._clear_session_env(tokens)
@@ -41,6 +41,15 @@ def _make_runner():
    runner._pending_approvals = {}
    runner._voice_mode = {}
    runner._background_tasks = set()
+    runner._draining = False
+    runner._restart_requested = False
+    runner._restart_task_started = False
+    runner._restart_detached = False
+    runner._restart_via_service = False
+    runner._restart_drain_timeout = 0.0
+    runner._stop_task = None
+    runner._exit_code = None
+    runner._update_runtime_status = MagicMock()
    runner._is_user_authorized = lambda _source: True
    runner.hooks = MagicMock()
    runner.hooks.emit = AsyncMock()
@@ -5,6 +5,10 @@ from pathlib import Path
 from types import SimpleNamespace

 import hermes_cli.gateway as gateway_cli
+from gateway.restart import (
+    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
+    GATEWAY_SERVICE_RESTART_EXIT_CODE,
+)


 class TestSystemdServiceRefresh:
@@ -74,7 +78,7 @@ class TestSystemdServiceRefresh:
        assert unit_path.read_text(encoding="utf-8") == "new unit\n"
        assert calls[:2] == [
            ["systemctl", "--user", "daemon-reload"],
-            ["systemctl", "--user", "restart", gateway_cli.get_service_name()],
+            ["systemctl", "--user", "reload-or-restart", gateway_cli.get_service_name()],
        ]


@@ -84,6 +88,8 @@ class TestGeneratedSystemdUnits:

        assert "ExecStart=" in unit
        assert "ExecStop=" not in unit
+        assert "ExecReload=/bin/kill -USR1 $MAINPID" in unit
+        assert f"RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}" in unit
        assert "TimeoutStopSec=60" in unit

    def test_user_unit_includes_resolved_node_directory_in_path(self, monkeypatch):
@@ -98,6 +104,8 @@ class TestGeneratedSystemdUnits:

        assert "ExecStart=" in unit
        assert "ExecStop=" not in unit
+        assert "ExecReload=/bin/kill -USR1 $MAINPID" in unit
+        assert f"RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}" in unit
        assert "TimeoutStopSec=60" in unit
        assert "WantedBy=multi-user.target" in unit

@@ -157,6 +165,31 @@ class TestGatewayStopCleanup:


 class TestLaunchdServiceRecovery:
+    def test_get_restart_drain_timeout_prefers_env_then_config_then_default(self, monkeypatch):
+        monkeypatch.delenv("HERMES_RESTART_DRAIN_TIMEOUT", raising=False)
+        monkeypatch.setattr(gateway_cli, "read_raw_config", lambda: {})
+
+        assert (
+            gateway_cli._get_restart_drain_timeout()
+            == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+        )
+
+        monkeypatch.setattr(
+            gateway_cli,
+            "read_raw_config",
+            lambda: {"agent": {"restart_drain_timeout": 14}},
+        )
+        assert gateway_cli._get_restart_drain_timeout() == 14.0
+
+        monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "9")
+        assert gateway_cli._get_restart_drain_timeout() == 9.0
+
+        monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "invalid")
+        assert (
+            gateway_cli._get_restart_drain_timeout()
+            == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
+        )
+
    def test_launchd_install_repairs_outdated_plist_without_force(self, tmp_path, monkeypatch):
        plist_path = tmp_path / "ai.hermes.gateway.plist"
        plist_path.write_text("<plist>old content</plist>", encoding="utf-8")
@@ -234,6 +267,55 @@ class TestLaunchdServiceRecovery:
            ["launchctl", "kickstart", target],
        ]

+    def test_launchd_restart_drains_running_gateway_before_kickstart(self, monkeypatch):
+        calls = []
+        target = f"{gateway_cli._launchd_domain()}/{gateway_cli.get_launchd_label()}"
+
+        monkeypatch.setattr(gateway_cli, "_get_restart_drain_timeout", lambda: 12.0)
+        monkeypatch.setattr(gateway_cli, "_request_gateway_self_restart", lambda pid: False)
+        monkeypatch.setattr(gateway_cli, "_wait_for_gateway_exit", lambda timeout, force_after=None: True)
+        monkeypatch.setattr(gateway_cli, "terminate_pid", lambda pid, force=False: calls.append(("term", pid, force)))
+        monkeypatch.setattr(
+            "gateway.status.get_running_pid",
+            lambda: 321,
+        )
+
+        def fake_run(cmd, check=False, **kwargs):
+            calls.append(cmd)
+            return SimpleNamespace(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
+
+        gateway_cli.launchd_restart()
+
+        assert calls == [
+            ("term", 321, False),
+            ["launchctl", "kickstart", "-k", target],
+        ]
+
+    def test_launchd_restart_self_requests_graceful_restart_without_kickstart(self, monkeypatch, capsys):
+        calls = []
+
+        monkeypatch.setattr(
+            "gateway.status.get_running_pid",
+            lambda: 321,
+        )
+        monkeypatch.setattr(
+            gateway_cli,
+            "_request_gateway_self_restart",
+            lambda pid: calls.append(("self", pid)) or True,
+        )
+        monkeypatch.setattr(
+            gateway_cli.subprocess,
+            "run",
+            lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("launchctl should not run")),
+        )
+
+        gateway_cli.launchd_restart()
+
+        assert calls == [("self", 321)]
+        assert "restart requested" in capsys.readouterr().out.lower()
+
    def test_launchd_stop_uses_bootout_not_kill(self, monkeypatch):
        """launchd_stop must bootout the service so KeepAlive doesn't respawn it."""
        label = gateway_cli.get_launchd_label()
@@ -337,6 +419,31 @@ class TestGatewayServiceDetection:


 class TestGatewaySystemServiceRouting:
+    def test_systemd_restart_self_requests_graceful_restart_without_reload_or_restart(self, monkeypatch, capsys):
+        calls = []
+
+        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
+        monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: calls.append(("refresh", system)))
+        monkeypatch.setattr(
+            "gateway.status.get_running_pid",
+            lambda: 654,
+        )
+        monkeypatch.setattr(
+            gateway_cli,
+            "_request_gateway_self_restart",
+            lambda pid: calls.append(("self", pid)) or True,
+        )
+        monkeypatch.setattr(
+            gateway_cli.subprocess,
+            "run",
+            lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("systemctl should not run")),
+        )
+
+        gateway_cli.systemd_restart()
+
+        assert calls == [("refresh", False), ("self", 654)]
+        assert "restart requested" in capsys.readouterr().out.lower()
+
    def test_gateway_install_passes_system_flags(self, monkeypatch):
        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
@@ -2125,6 +2125,28 @@ class TestRetryExhaustion:
        assert "error" in result
        assert "rate limited" in result["error"]

+    def test_build_api_kwargs_error_no_unbound_local(self, agent):
+        """When _build_api_kwargs raises, except handler must not crash with UnboundLocalError.
+
+        Regression: _dump_api_request_debug(api_kwargs, ...) in the except block
+        referenced api_kwargs before it was assigned when _build_api_kwargs threw.
+        """
+        self._setup_agent(agent)
+        with (
+            patch.object(agent, "_build_api_kwargs", side_effect=ValueError("bad messages")),
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+            patch("run_agent.time", self._make_fast_time_mock()),
+        ):
+            result = agent.run_conversation("hello")
+        # Must surface the real error, not UnboundLocalError
+        assert result.get("completed") is False
+        assert result.get("failed") is True
+        assert "error" in result
+        assert "UnboundLocalError" not in result.get("error", "")
+        assert "bad messages" in result["error"]
+

 # ---------------------------------------------------------------------------
 # Flush sentinel leak
@@ -11,12 +11,19 @@ def _load_optional_dependencies():
    return project["optional-dependencies"]


-def test_matrix_extra_exists_but_excluded_from_all():
+def test_matrix_extra_linux_only_in_all():
    """matrix-nio[e2e] depends on python-olm which is upstream-broken on modern
    macOS (archived libolm, C++ errors with Clang 21+).  The [matrix] extra is
-    kept for opt-in install but deliberately excluded from [all] so one broken
-    upstream dep doesn't nuke every other extra during ``hermes update``."""
+    included in [all] but gated to Linux via a platform marker so that
+    ``hermes update`` doesn't fail on macOS."""
    optional_dependencies = _load_optional_dependencies()

    assert "matrix" in optional_dependencies
+    # Must NOT be unconditional — python-olm has no macOS wheels.
    assert "hermes-agent[matrix]" not in optional_dependencies["all"]
+    # Must be present with a Linux platform marker.
+    linux_gated = [
+        dep for dep in optional_dependencies["all"]
+        if "matrix" in dep and "linux" in dep
+    ]
+    assert linux_gated, "expected hermes-agent[matrix] with sys_platform=='linux' marker in [all]"
@@ -333,3 +333,25 @@ class TestShellFileOpsWriteDenied:
        result = file_ops.patch_replace("~/.ssh/authorized_keys", "old", "new")
        assert result.error is not None
        assert "denied" in result.error.lower()
+
+    def test_delete_file_denied_path(self, file_ops):
+        result = file_ops.delete_file("~/.ssh/authorized_keys")
+        assert result.error is not None
+        assert "denied" in result.error.lower()
+
+    def test_move_file_src_denied(self, file_ops):
+        result = file_ops.move_file("~/.ssh/id_rsa", "/tmp/dest.txt")
+        assert result.error is not None
+        assert "denied" in result.error.lower()
+
+    def test_move_file_dst_denied(self, file_ops):
+        result = file_ops.move_file("/tmp/src.txt", "~/.aws/credentials")
+        assert result.error is not None
+        assert "denied" in result.error.lower()
+
+    def test_move_file_failure_path(self, mock_env):
+        mock_env.execute.return_value = {"output": "No such file or directory", "returncode": 1}
+        ops = ShellFileOperations(mock_env)
+        result = ops.move_file("/tmp/nonexistent.txt", "/tmp/dest.txt")
+        assert result.error is not None
+        assert "Failed to move" in result.error
@@ -6,31 +6,31 @@ from tools.fuzzy_match import fuzzy_find_and_replace
 class TestExactMatch:
    def test_single_replacement(self):
        content = "hello world"
-        new, count, err = fuzzy_find_and_replace(content, "hello", "hi")
+        new, count, _, err = fuzzy_find_and_replace(content, "hello", "hi")
        assert err is None
        assert count == 1
        assert new == "hi world"

    def test_no_match(self):
        content = "hello world"
-        new, count, err = fuzzy_find_and_replace(content, "xyz", "abc")
+        new, count, _, err = fuzzy_find_and_replace(content, "xyz", "abc")
        assert count == 0
        assert err is not None
        assert new == content

    def test_empty_old_string(self):
-        new, count, err = fuzzy_find_and_replace("abc", "", "x")
+        new, count, _, err = fuzzy_find_and_replace("abc", "", "x")
        assert count == 0
        assert err is not None

    def test_identical_strings(self):
-        new, count, err = fuzzy_find_and_replace("abc", "abc", "abc")
+        new, count, _, err = fuzzy_find_and_replace("abc", "abc", "abc")
        assert count == 0
        assert "identical" in err

    def test_multiline_exact(self):
        content = "line1\nline2\nline3"
-        new, count, err = fuzzy_find_and_replace(content, "line1\nline2", "replaced")
+        new, count, _, err = fuzzy_find_and_replace(content, "line1\nline2", "replaced")
        assert err is None
        assert count == 1
        assert new == "replaced\nline3"
@@ -39,7 +39,7 @@ class TestExactMatch:
 class TestWhitespaceDifference:
    def test_extra_spaces_match(self):
        content = "def  foo(  x,  y  ):"
-        new, count, err = fuzzy_find_and_replace(content, "def foo( x, y ):", "def bar(x, y):")
+        new, count, _, err = fuzzy_find_and_replace(content, "def foo( x, y ):", "def bar(x, y):")
        assert count == 1
        assert "bar" in new

@@ -47,7 +47,7 @@ class TestWhitespaceDifference:
 class TestIndentDifference:
    def test_different_indentation(self):
        content = "    def foo():\n        pass"
-        new, count, err = fuzzy_find_and_replace(content, "def foo():\n    pass", "def bar():\n    return 1")
+        new, count, _, err = fuzzy_find_and_replace(content, "def foo():\n    pass", "def bar():\n    return 1")
        assert count == 1
        assert "bar" in new

@@ -55,13 +55,96 @@ class TestIndentDifference:
 class TestReplaceAll:
    def test_multiple_matches_without_flag_errors(self):
        content = "aaa bbb aaa"
-        new, count, err = fuzzy_find_and_replace(content, "aaa", "ccc", replace_all=False)
+        new, count, _, err = fuzzy_find_and_replace(content, "aaa", "ccc", replace_all=False)
        assert count == 0
        assert "Found 2 matches" in err

    def test_multiple_matches_with_flag(self):
        content = "aaa bbb aaa"
-        new, count, err = fuzzy_find_and_replace(content, "aaa", "ccc", replace_all=True)
+        new, count, _, err = fuzzy_find_and_replace(content, "aaa", "ccc", replace_all=True)
        assert err is None
        assert count == 2
        assert new == "ccc bbb ccc"
+
+
+class TestUnicodeNormalized:
+    """Tests for the unicode_normalized strategy (Bug 5)."""
+
+    def test_em_dash_matched(self):
+        """Em-dash in content should match ASCII '--' in pattern."""
+        content = "return value\u2014fallback"
+        new, count, strategy, err = fuzzy_find_and_replace(
+            content, "return value--fallback", "return value or fallback"
+        )
+        assert count == 1, f"Expected match via unicode_normalized, got err={err}"
+        assert strategy == "unicode_normalized"
+        assert "return value or fallback" in new
+
+    def test_smart_quotes_matched(self):
+        """Smart double quotes in content should match straight quotes in pattern."""
+        content = 'print(\u201chello\u201d)'
+        new, count, strategy, err = fuzzy_find_and_replace(
+            content, 'print("hello")', 'print("world")'
+        )
+        assert count == 1, f"Expected match via unicode_normalized, got err={err}"
+        assert "world" in new
+
+    def test_no_unicode_skips_strategy(self):
+        """When content and pattern have no Unicode variants, strategy is skipped."""
+        content = "hello world"
+        # Should match via exact, not unicode_normalized
+        new, count, strategy, err = fuzzy_find_and_replace(content, "hello", "hi")
+        assert count == 1
+        assert strategy == "exact"
+
+
+class TestBlockAnchorThreshold:
+    """Tests for the raised block_anchor threshold (Bug 4)."""
+
+    def test_high_similarity_matches(self):
+        """A block with >50% middle similarity should match."""
+        content = "def foo():\n    x = 1\n    y = 2\n    return x + y\n"
+        pattern = "def foo():\n    x = 1\n    y = 9\n    return x + y"
+        new, count, strategy, err = fuzzy_find_and_replace(content, pattern, "def foo():\n    return 0\n")
+        # Should match via block_anchor or earlier strategy
+        assert count == 1
+
+    def test_completely_different_middle_does_not_match(self):
+        """A block where only first+last lines match but middle is completely different
+        should NOT match under the raised 0.50 threshold."""
+        content = (
+            "class Foo:\n"
+            "    completely = 'unrelated'\n"
+            "    content = 'here'\n"
+            "    nothing = 'in common'\n"
+            "    pass\n"
+        )
+        # Pattern has same first/last lines but completely different middle
+        pattern = (
+            "class Foo:\n"
+            "    x = 1\n"
+            "    y = 2\n"
+            "    z = 3\n"
+            "    pass"
+        )
+        new, count, strategy, err = fuzzy_find_and_replace(content, pattern, "replaced")
+        # With threshold=0.50, this near-zero-similarity middle should not match
+        assert count == 0, (
+            f"Block with unrelated middle should not match under threshold=0.50, "
+            f"but matched via strategy={strategy}"
+        )
+
+
+class TestStrategyNameSurfaced:
+    """Tests for the strategy name in the 4-tuple return (Bug 6)."""
+
+    def test_exact_strategy_name(self):
+        new, count, strategy, err = fuzzy_find_and_replace("hello", "hello", "world")
+        assert strategy == "exact"
+        assert count == 1
+
+    def test_failed_match_returns_none_strategy(self):
+        new, count, strategy, err = fuzzy_find_and_replace("hello", "xyz", "world")
+        assert count == 0
+        assert strategy is None
+        assert err is not None
@@ -104,6 +104,45 @@ class TestStdioPidTracking:
        with _lock:
            assert fake_pid not in _stdio_pids

+    def test_kill_orphaned_uses_sigkill_when_available(self, monkeypatch):
+        """Unix-like platforms should keep using SIGKILL for orphan cleanup."""
+        from tools.mcp_tool import _kill_orphaned_mcp_children, _stdio_pids, _lock
+
+        fake_pid = 424242
+        with _lock:
+            _stdio_pids.clear()
+            _stdio_pids.add(fake_pid)
+
+        fake_sigkill = 9
+        monkeypatch.setattr(signal, "SIGKILL", fake_sigkill, raising=False)
+
+        with patch("tools.mcp_tool.os.kill") as mock_kill:
+            _kill_orphaned_mcp_children()
+
+        mock_kill.assert_called_once_with(fake_pid, fake_sigkill)
+
+        with _lock:
+            assert fake_pid not in _stdio_pids
+
+    def test_kill_orphaned_falls_back_without_sigkill(self, monkeypatch):
+        """Windows-like signal modules without SIGKILL should fall back to SIGTERM."""
+        from tools.mcp_tool import _kill_orphaned_mcp_children, _stdio_pids, _lock
+
+        fake_pid = 434343
+        with _lock:
+            _stdio_pids.clear()
+            _stdio_pids.add(fake_pid)
+
+        monkeypatch.delattr(signal, "SIGKILL", raising=False)
+
+        with patch("tools.mcp_tool.os.kill") as mock_kill:
+            _kill_orphaned_mcp_children()
+
+        mock_kill.assert_called_once_with(fake_pid, signal.SIGTERM)
+
+        with _lock:
+            assert fake_pid not in _stdio_pids
+

 # ---------------------------------------------------------------------------
 # Fix 3: MCP reload timeout (cli.py)
@@ -159,7 +159,7 @@ class TestApplyUpdate:
            def __init__(self):
                self.written = None

-            def read_file(self, path, offset=1, limit=500):
+            def read_file_raw(self, path):
                return SimpleNamespace(
                    content=(
                        'def run():\n'
@@ -211,7 +211,7 @@ class TestAdditionOnlyHunks:
        # Apply to a file that contains the context hint
        class FakeFileOps:
            written = None
-            def read_file(self, path, **kw):
+            def read_file_raw(self, path):
                return SimpleNamespace(
                    content="def main():\n    pass\n",
                    error=None,
@@ -239,7 +239,7 @@ class TestAdditionOnlyHunks:

        class FakeFileOps:
            written = None
-            def read_file(self, path, **kw):
+            def read_file_raw(self, path):
                return SimpleNamespace(
                    content="existing = True\n",
                    error=None,
@@ -253,3 +253,259 @@ class TestAdditionOnlyHunks:
        assert result.success is True
        assert file_ops.written.endswith("def new_func():\n    return True\n")
        assert "existing = True" in file_ops.written
+
+
+class TestReadFileRaw:
+    """Bug 1 regression tests — files > 2000 lines and lines > 2000 chars."""
+
+    def test_apply_update_file_over_2000_lines(self):
+        """A hunk targeting line 2200 must not truncate the file to 2000 lines."""
+        patch = """\
+*** Begin Patch
+*** Update File: big.py
+@@ marker_at_2200 @@
+ line_2200
+-old_value
+new_value
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        # Build a 2500-line file; the hunk targets a region at line 2200
+        lines = [f"line_{i}" for i in range(1, 2501)]
+        lines[2199] = "line_2200"   # index 2199 = line 2200
+        lines[2200] = "old_value"
+        file_content = "\n".join(lines)
+
+        class FakeFileOps:
+            written = None
+            def read_file_raw(self, path):
+                return SimpleNamespace(content=file_content, error=None)
+            def write_file(self, path, content):
+                self.written = content
+                return SimpleNamespace(error=None)
+
+        file_ops = FakeFileOps()
+        result = apply_v4a_operations(ops, file_ops)
+        assert result.success is True
+        written_lines = file_ops.written.split("\n")
+        assert len(written_lines) == 2500, (
+            f"Expected 2500 lines, got {len(written_lines)}"
+        )
+        assert "new_value" in file_ops.written
+        assert "old_value" not in file_ops.written
+
+    def test_apply_update_preserves_long_lines(self):
+        """A line > 2000 chars must be preserved verbatim after an unrelated hunk."""
+        long_line = "x" * 3000
+        patch = """\
+*** Begin Patch
+*** Update File: wide.py
+@@ short_func @@
+ def short_func():
+-    return 1
+    return 2
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        file_content = f"def short_func():\n    return 1\n{long_line}\n"
+
+        class FakeFileOps:
+            written = None
+            def read_file_raw(self, path):
+                return SimpleNamespace(content=file_content, error=None)
+            def write_file(self, path, content):
+                self.written = content
+                return SimpleNamespace(error=None)
+
+        file_ops = FakeFileOps()
+        result = apply_v4a_operations(ops, file_ops)
+        assert result.success is True
+        assert long_line in file_ops.written, "Long line was truncated"
+        assert "... [truncated]" not in file_ops.written
+
+
+class TestValidationPhase:
+    """Bug 2 regression tests — validation prevents partial apply."""
+
+    def test_validation_failure_writes_nothing(self):
+        """If one hunk is invalid, no files should be written."""
+        patch = """\
+*** Begin Patch
+*** Update File: a.py
+ def good():
+-    return 1
+    return 2
+*** Update File: b.py
+ THIS LINE DOES NOT EXIST
+-    old
+    new
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        written = {}
+
+        class FakeFileOps:
+            def read_file_raw(self, path):
+                files = {
+                    "a.py": "def good():\n    return 1\n",
+                    "b.py": "completely different content\n",
+                }
+                content = files.get(path)
+                if content is None:
+                    return SimpleNamespace(content=None, error=f"File not found: {path}")
+                return SimpleNamespace(content=content, error=None)
+
+            def write_file(self, path, content):
+                written[path] = content
+                return SimpleNamespace(error=None)
+
+        result = apply_v4a_operations(ops, FakeFileOps())
+        assert result.success is False
+        assert written == {}, f"No files should have been written, got: {list(written.keys())}"
+        assert "validation failed" in result.error.lower()
+
+    def test_all_valid_operations_applied(self):
+        """When all operations are valid, all files are written."""
+        patch = """\
+*** Begin Patch
+*** Update File: a.py
+ def foo():
+-    return 1
+    return 2
+*** Update File: b.py
+ def bar():
+-    pass
+    return True
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        written = {}
+
+        class FakeFileOps:
+            def read_file_raw(self, path):
+                files = {
+                    "a.py": "def foo():\n    return 1\n",
+                    "b.py": "def bar():\n    pass\n",
+                }
+                return SimpleNamespace(content=files[path], error=None)
+
+            def write_file(self, path, content):
+                written[path] = content
+                return SimpleNamespace(error=None)
+
+        result = apply_v4a_operations(ops, FakeFileOps())
+        assert result.success is True
+        assert set(written.keys()) == {"a.py", "b.py"}
+
+
+class TestApplyDelete:
+    """Tests for _apply_delete producing a real unified diff."""
+
+    def test_delete_diff_contains_removed_lines(self):
+        """_apply_delete must embed the actual file content in the diff, not a placeholder."""
+        patch = """\
+*** Begin Patch
+*** Delete File: old/stuff.py
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        class FakeFileOps:
+            deleted = False
+
+            def read_file_raw(self, path):
+                return SimpleNamespace(
+                    content="def old_func():\n    return 42\n",
+                    error=None,
+                )
+
+            def delete_file(self, path):
+                self.deleted = True
+                return SimpleNamespace(error=None)
+
+        file_ops = FakeFileOps()
+        result = apply_v4a_operations(ops, file_ops)
+
+        assert result.success is True
+        assert file_ops.deleted is True
+        # Diff must contain the actual removed lines, not a bare comment
+        assert "-def old_func():" in result.diff
+        assert "-    return 42" in result.diff
+        assert "/dev/null" in result.diff
+
+    def test_delete_diff_fallback_on_empty_file(self):
+        """An empty file should produce the fallback comment diff."""
+        patch = """\
+*** Begin Patch
+*** Delete File: empty.py
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+
+        class FakeFileOps:
+            def read_file_raw(self, path):
+                return SimpleNamespace(content="", error=None)
+
+            def delete_file(self, path):
+                return SimpleNamespace(error=None)
+
+        result = apply_v4a_operations(ops, FakeFileOps())
+        assert result.success is True
+        # unified_diff produces nothing for two empty inputs — fallback comment expected
+        assert "Deleted" in result.diff or result.diff.strip() == ""
+
+
+class TestCountOccurrences:
+    def test_basic(self):
+        from tools.patch_parser import _count_occurrences
+        assert _count_occurrences("aaa", "a") == 3
+        assert _count_occurrences("aaa", "aa") == 2
+        assert _count_occurrences("hello world", "xyz") == 0
+        assert _count_occurrences("", "x") == 0
+
+
+class TestParseErrorSignalling:
+    """Bug 3 regression tests — parse_v4a_patch must signal errors, not swallow them."""
+
+    def test_update_with_no_hunks_returns_error(self):
+        """An UPDATE with no hunk lines is a malformed patch and should error."""
+        patch = """\
+*** Begin Patch
+*** Update File: foo.py
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is not None, "Expected a parse error for hunk-less UPDATE"
+        assert ops == []
+
+    def test_move_without_destination_returns_error(self):
+        """A MOVE without '->' syntax should not silently produce a broken operation."""
+        # The move regex requires '->' so this will be treated as an unrecognised
+        # line and the op is never created.  Confirm nothing crashes and ops is empty.
+        patch = """\
+*** Begin Patch
+*** Move File: src/foo.py
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        # Either parse sees zero ops (fine) or returns an error (also fine).
+        # What is NOT acceptable is ops=[MOVE op with empty new_path] + err=None.
+        if ops:
+            assert err is not None, (
+                "MOVE with missing destination must either produce empty ops or an error"
+            )
+
+    def test_valid_patch_returns_no_error(self):
+        """A well-formed patch must still return err=None."""
+        patch = """\
+*** Begin Patch
+*** Update File: f.py
+ ctx
+-old
+new
+*** End Patch"""
+        ops, err = parse_v4a_patch(patch)
+        assert err is None
+        assert len(ops) == 1
@@ -0,0 +1,274 @@
+"""Tests for zombie process cleanup — verifies processes spawned by tools
+are properly reaped when agent sessions end.
+
+Reproduction for issue #7131: zombie process accumulation on long-running
+gateway deployments.
+"""
+
+import os
+import signal
+import subprocess
+import sys
+import time
+import threading
+
+import pytest
+
+
+def _spawn_sleep(seconds: float = 60) -> subprocess.Popen:
+    """Spawn a portable long-lived Python sleep process (no shell wrapper)."""
+    return subprocess.Popen(
+        [sys.executable, "-c", f"import time; time.sleep({seconds})"],
+    )
+
+
+def _pid_alive(pid: int) -> bool:
+    """Return True if a process with the given PID is still running."""
+    try:
+        os.kill(pid, 0)
+        return True
+    except (ProcessLookupError, PermissionError):
+        return False
+
+
+class TestZombieReproduction:
+    """Demonstrate that subprocesses survive when cleanup is not called."""
+
+    def test_orphaned_processes_survive_without_cleanup(self):
+        """REPRODUCTION: processes spawned directly survive if no one kills
+        them — this models the gap that causes zombie accumulation when
+        the gateway drops agent references without calling close()."""
+        pids = []
+
+        try:
+            for _ in range(3):
+                proc = _spawn_sleep(60)
+                pids.append(proc.pid)
+
+            for pid in pids:
+                assert _pid_alive(pid), f"PID {pid} should be alive after spawn"
+
+            # Simulate "session end" by just dropping the reference
+            del proc  # noqa: F821
+
+            # BUG: processes are still alive after reference is dropped
+            for pid in pids:
+                assert _pid_alive(pid), (
+                    f"PID {pid} died after ref drop — "
+                    f"expected it to survive (demonstrating the bug)"
+                )
+        finally:
+            for pid in pids:
+                try:
+                    os.kill(pid, signal.SIGKILL)
+                except (ProcessLookupError, PermissionError):
+                    pass
+
+    def test_explicit_terminate_reaps_processes(self):
+        """Explicitly terminating+waiting on Popen handles works.
+        This models what ProcessRegistry.kill_process does internally."""
+        procs = []
+
+        try:
+            for _ in range(3):
+                proc = _spawn_sleep(60)
+                procs.append(proc)
+
+            for proc in procs:
+                assert _pid_alive(proc.pid)
+
+            for proc in procs:
+                proc.terminate()
+                proc.wait(timeout=5)
+
+            for proc in procs:
+                assert proc.returncode is not None, (
+                    f"PID {proc.pid} should have exited after terminate+wait"
+                )
+        finally:
+            for proc in procs:
+                try:
+                    proc.kill()
+                    proc.wait(timeout=1)
+                except Exception:
+                    pass
+
+
+class TestAgentCloseMethod:
+    """Verify AIAgent.close() exists, is idempotent, and calls cleanup."""
+
+    def test_close_calls_cleanup_functions(self):
+        """close() should call kill_all, cleanup_vm, cleanup_browser."""
+        from unittest.mock import patch
+
+        with patch("run_agent.AIAgent.__init__", return_value=None):
+            from run_agent import AIAgent
+            agent = AIAgent.__new__(AIAgent)
+            agent.session_id = "test-close-cleanup"
+            agent._active_children = []
+            agent._active_children_lock = threading.Lock()
+            agent.client = None
+
+            with patch("tools.process_registry.process_registry") as mock_registry, \
+                 patch("tools.terminal_tool.cleanup_vm") as mock_cleanup_vm, \
+                 patch("tools.browser_tool.cleanup_browser") as mock_cleanup_browser:
+                agent.close()
+
+                mock_registry.kill_all.assert_called_once_with(
+                    task_id="test-close-cleanup"
+                )
+                mock_cleanup_vm.assert_called_once_with("test-close-cleanup")
+                mock_cleanup_browser.assert_called_once_with("test-close-cleanup")
+
+    def test_close_is_idempotent(self):
+        """close() can be called multiple times without error."""
+        from unittest.mock import patch
+
+        with patch("run_agent.AIAgent.__init__", return_value=None):
+            from run_agent import AIAgent
+            agent = AIAgent.__new__(AIAgent)
+            agent.session_id = "test-close-idempotent"
+            agent._active_children = []
+            agent._active_children_lock = threading.Lock()
+            agent.client = None
+
+            agent.close()
+            agent.close()
+            agent.close()
+
+    def test_close_propagates_to_children(self):
+        """close() should call close() on all active child agents."""
+        from unittest.mock import MagicMock, patch
+
+        with patch("run_agent.AIAgent.__init__", return_value=None):
+            from run_agent import AIAgent
+            agent = AIAgent.__new__(AIAgent)
+            agent.session_id = "test-close-children"
+            agent._active_children_lock = threading.Lock()
+            agent.client = None
+
+            child_1 = MagicMock()
+            child_2 = MagicMock()
+            agent._active_children = [child_1, child_2]
+
+            agent.close()
+
+            child_1.close.assert_called_once()
+            child_2.close.assert_called_once()
+            assert agent._active_children == []
+
+    def test_close_survives_partial_failures(self):
+        """close() continues cleanup even if one step fails."""
+        from unittest.mock import patch
+
+        with patch("run_agent.AIAgent.__init__", return_value=None):
+            from run_agent import AIAgent
+            agent = AIAgent.__new__(AIAgent)
+            agent.session_id = "test-close-partial"
+            agent._active_children = []
+            agent._active_children_lock = threading.Lock()
+            agent.client = None
+
+            with patch(
+                "tools.process_registry.process_registry"
+            ) as mock_reg, patch(
+                "tools.terminal_tool.cleanup_vm"
+            ) as mock_vm, patch(
+                "tools.browser_tool.cleanup_browser"
+            ) as mock_browser:
+                mock_reg.kill_all.side_effect = RuntimeError("boom")
+
+                agent.close()
+
+                mock_vm.assert_called_once()
+                mock_browser.assert_called_once()
+
+
+class TestGatewayCleanupWiring:
+    """Verify gateway lifecycle calls close() on agents."""
+
+    def test_gateway_stop_calls_close(self):
+        """gateway stop() should call close() on all running agents."""
+        import asyncio
+        from unittest.mock import MagicMock, patch
+
+        runner = MagicMock()
+        runner._running = True
+        runner._running_agents = {}
+        runner.adapters = {}
+        runner._background_tasks = set()
+        runner._pending_messages = {}
+        runner._pending_approvals = {}
+        runner._shutdown_event = asyncio.Event()
+        runner._exit_reason = None
+
+        mock_agent_1 = MagicMock()
+        mock_agent_2 = MagicMock()
+        runner._running_agents = {
+            "session-1": mock_agent_1,
+            "session-2": mock_agent_2,
+        }
+
+        from gateway.run import GatewayRunner
+
+        loop = asyncio.new_event_loop()
+        try:
+            with patch("gateway.status.remove_pid_file"), \
+                 patch("gateway.status.write_runtime_status"), \
+                 patch("tools.terminal_tool.cleanup_all_environments"), \
+                 patch("tools.browser_tool.cleanup_all_browsers"):
+                loop.run_until_complete(GatewayRunner.stop(runner))
+        finally:
+            loop.close()
+
+        mock_agent_1.close.assert_called()
+        mock_agent_2.close.assert_called()
+
+    def test_evict_does_not_call_close(self):
+        """_evict_cached_agent() should NOT call close() — it's also used
+        for non-destructive refreshes (model switch, branch, fallback)."""
+        import threading
+        from unittest.mock import MagicMock
+
+        from gateway.run import GatewayRunner
+
+        runner = object.__new__(GatewayRunner)
+        runner._agent_cache_lock = threading.Lock()
+
+        mock_agent = MagicMock()
+        runner._agent_cache = {"session-key": (mock_agent, 12345)}
+
+        GatewayRunner._evict_cached_agent(runner, "session-key")
+
+        mock_agent.close.assert_not_called()
+        assert "session-key" not in runner._agent_cache
+
+
+class TestDelegationCleanup:
+    """Verify subagent delegation cleans up child agents."""
+
+    def test_run_single_child_calls_close(self):
+        """_run_single_child finally block should call close() on child."""
+        from unittest.mock import MagicMock
+        from tools.delegate_tool import _run_single_child
+
+        parent = MagicMock()
+        parent._active_children = []
+        parent._active_children_lock = threading.Lock()
+
+        child = MagicMock()
+        child._delegate_saved_tool_names = ["tool1"]
+        child.run_conversation.side_effect = RuntimeError("test abort")
+
+        parent._active_children.append(child)
+
+        result = _run_single_child(
+            task_index=0,
+            goal="test goal",
+            child=child,
+            parent_agent=parent,
+        )
+
+        child.close.assert_called_once()
+        assert child not in parent._active_children
+        assert result["status"] == "error"
@@ -64,14 +64,15 @@ def _scan_cron_prompt(prompt: str) -> str:


 def _origin_from_env() -> Optional[Dict[str, str]]:
-    origin_platform = os.getenv("HERMES_SESSION_PLATFORM")
-    origin_chat_id = os.getenv("HERMES_SESSION_CHAT_ID")
+    from gateway.session_context import get_session_env
+    origin_platform = get_session_env("HERMES_SESSION_PLATFORM")
+    origin_chat_id = get_session_env("HERMES_SESSION_CHAT_ID")
    if origin_platform and origin_chat_id:
        return {
            "platform": origin_platform,
            "chat_id": origin_chat_id,
-            "chat_name": os.getenv("HERMES_SESSION_CHAT_NAME"),
-            "thread_id": os.getenv("HERMES_SESSION_THREAD_ID"),
+            "chat_name": get_session_env("HERMES_SESSION_CHAT_NAME") or None,
+            "thread_id": get_session_env("HERMES_SESSION_THREAD_ID") or None,
        }
    return None

@@ -578,6 +578,15 @@ def _run_single_child(
            except (ValueError, UnboundLocalError) as e:
                logger.debug("Could not remove child from active_children: %s", e)

+        # Close tool resources (terminal sandboxes, browser daemons,
+        # background processes, httpx clients) so subagent subprocesses
+        # don't outlive the delegation.
+        try:
+            if hasattr(child, 'close'):
+                child.close()
+        except Exception:
+            logger.debug("Failed to close child agent after delegation")
+
 def delegate_task(
    goal: Optional[str] = None,
    context: Optional[str] = None,
@@ -409,11 +409,12 @@ class DockerEnvironment(BaseEnvironment):
        container_name = f"hermes-{uuid.uuid4().hex[:8]}"
        run_cmd = [
            self._docker_exe, "run", "-d",
+            "--init",           # tini/catatonit as PID 1 — reaps zombie children
            "--name", container_name,
            "-w", cwd,
            *all_run_args,
            image,
-            "sleep", "2h",
+            "sleep", "infinity",  # no fixed lifetime — idle reaper handles cleanup
        ]
        logger.debug(f"Starting container: {' '.join(run_cmd)}")
        result = subprocess.run(
@@ -252,23 +252,43 @@ class FileOperations(ABC):
    def read_file(self, path: str, offset: int = 1, limit: int = 500) -> ReadResult:
        """Read a file with pagination support."""
        ...
-    
+
+    @abstractmethod
+    def read_file_raw(self, path: str) -> ReadResult:
+        """Read the complete file content as a plain string.
+
+        No pagination, no line-number prefixes, no per-line truncation.
+        Returns ReadResult with .content = full file text, .error set on
+        failure. Always reads to EOF regardless of file size.
+        """
+        ...
+
    @abstractmethod
    def write_file(self, path: str, content: str) -> WriteResult:
        """Write content to a file, creating directories as needed."""
        ...
-    
+
    @abstractmethod
-    def patch_replace(self, path: str, old_string: str, new_string: str, 
+    def patch_replace(self, path: str, old_string: str, new_string: str,
                      replace_all: bool = False) -> PatchResult:
        """Replace text in a file using fuzzy matching."""
        ...
-    
+
    @abstractmethod
    def patch_v4a(self, patch_content: str) -> PatchResult:
        """Apply a V4A format patch."""
        ...
-    
+
+    @abstractmethod
+    def delete_file(self, path: str) -> WriteResult:
+        """Delete a file. Returns WriteResult with .error set on failure."""
+        ...
+
+    @abstractmethod
+    def move_file(self, src: str, dst: str) -> WriteResult:
+        """Move/rename a file from src to dst. Returns WriteResult with .error set on failure."""
+        ...
+
    @abstractmethod
    def search(self, pattern: str, path: str = ".", target: str = "content",
               file_glob: Optional[str] = None, limit: int = 50, offset: int = 0,
@@ -561,10 +581,62 @@ class ShellFileOperations(FileOperations):
            similar_files=similar[:5]  # Limit to 5 suggestions
        )
    
+    def read_file_raw(self, path: str) -> ReadResult:
+        """Read the complete file content as a plain string.
+
+        No pagination, no line-number prefixes, no per-line truncation.
+        Uses cat so the full file is returned regardless of size.
+        """
+        path = self._expand_path(path)
+        stat_cmd = f"wc -c < {self._escape_shell_arg(path)} 2>/dev/null"
+        stat_result = self._exec(stat_cmd)
+        if stat_result.exit_code != 0:
+            return self._suggest_similar_files(path)
+        try:
+            file_size = int(stat_result.stdout.strip())
+        except ValueError:
+            file_size = 0
+        if self._is_image(path):
+            return ReadResult(is_image=True, is_binary=True, file_size=file_size)
+        sample_result = self._exec(f"head -c 1000 {self._escape_shell_arg(path)} 2>/dev/null")
+        if self._is_likely_binary(path, sample_result.stdout):
+            return ReadResult(
+                is_binary=True, file_size=file_size,
+                error="Binary file — cannot display as text."
+            )
+        cat_result = self._exec(f"cat {self._escape_shell_arg(path)}")
+        if cat_result.exit_code != 0:
+            return ReadResult(error=f"Failed to read file: {cat_result.stdout}")
+        return ReadResult(content=cat_result.stdout, file_size=file_size)
+
+    def delete_file(self, path: str) -> WriteResult:
+        """Delete a file via rm."""
+        path = self._expand_path(path)
+        if _is_write_denied(path):
+            return WriteResult(error=f"Delete denied: {path} is a protected path")
+        result = self._exec(f"rm -f {self._escape_shell_arg(path)}")
+        if result.exit_code != 0:
+            return WriteResult(error=f"Failed to delete {path}: {result.stdout}")
+        return WriteResult()
+
+    def move_file(self, src: str, dst: str) -> WriteResult:
+        """Move a file via mv."""
+        src = self._expand_path(src)
+        dst = self._expand_path(dst)
+        for p in (src, dst):
+            if _is_write_denied(p):
+                return WriteResult(error=f"Move denied: {p} is a protected path")
+        result = self._exec(
+            f"mv {self._escape_shell_arg(src)} {self._escape_shell_arg(dst)}"
+        )
+        if result.exit_code != 0:
+            return WriteResult(error=f"Failed to move {src} -> {dst}: {result.stdout}")
+        return WriteResult()
+
    # =========================================================================
    # WRITE Implementation
    # =========================================================================
-    
+
    def write_file(self, path: str, content: str) -> WriteResult:
        """
        Write content to a file, creating parent directories as needed.
@@ -656,7 +728,7 @@ class ShellFileOperations(FileOperations):
        # Import and use fuzzy matching
        from tools.fuzzy_match import fuzzy_find_and_replace
        
-        new_content, match_count, error = fuzzy_find_and_replace(
+        new_content, match_count, _strategy, error = fuzzy_find_and_replace(
            content, old_string, new_string, replace_all
        )
        
@@ -21,7 +21,7 @@ Multi-occurrence matching is handled via the replace_all flag.
 Usage:
    from tools.fuzzy_match import fuzzy_find_and_replace
    
-    new_content, match_count, error = fuzzy_find_and_replace(
+    new_content, match_count, strategy, error = fuzzy_find_and_replace(
        content="def foo():\\n    pass",
        old_string="def foo():",
        new_string="def bar():",
@@ -48,27 +48,27 @@ def _unicode_normalize(text: str) -> str:


 def fuzzy_find_and_replace(content: str, old_string: str, new_string: str,
-                            replace_all: bool = False) -> Tuple[str, int, Optional[str]]:
+                            replace_all: bool = False) -> Tuple[str, int, Optional[str], Optional[str]]:
    """
    Find and replace text using a chain of increasingly fuzzy matching strategies.
-    
+
    Args:
        content: The file content to search in
        old_string: The text to find
        new_string: The replacement text
        replace_all: If True, replace all occurrences; if False, require uniqueness
-    
+
    Returns:
-        Tuple of (new_content, match_count, error_message)
-        - If successful: (modified_content, number_of_replacements, None)
-        - If failed: (original_content, 0, error_description)
+        Tuple of (new_content, match_count, strategy_name, error_message)
+        - If successful: (modified_content, number_of_replacements, strategy_used, None)
+        - If failed: (original_content, 0, None, error_description)
    """
    if not old_string:
-        return content, 0, "old_string cannot be empty"
-    
+        return content, 0, None, "old_string cannot be empty"
+
    if old_string == new_string:
-        return content, 0, "old_string and new_string are identical"
-    
+        return content, 0, None, "old_string and new_string are identical"
+
    # Try each matching strategy in order
    strategies: List[Tuple[str, Callable]] = [
        ("exact", _strategy_exact),
@@ -77,27 +77,28 @@ def fuzzy_find_and_replace(content: str, old_string: str, new_string: str,
        ("indentation_flexible", _strategy_indentation_flexible),
        ("escape_normalized", _strategy_escape_normalized),
        ("trimmed_boundary", _strategy_trimmed_boundary),
+        ("unicode_normalized", _strategy_unicode_normalized),
        ("block_anchor", _strategy_block_anchor),
        ("context_aware", _strategy_context_aware),
    ]
-    
-    for _strategy_name, strategy_fn in strategies:
+
+    for strategy_name, strategy_fn in strategies:
        matches = strategy_fn(content, old_string)
-        
+
        if matches:
            # Found matches with this strategy
            if len(matches) > 1 and not replace_all:
-                return content, 0, (
+                return content, 0, None, (
                    f"Found {len(matches)} matches for old_string. "
                    f"Provide more context to make it unique, or use replace_all=True."
                )
-            
+
            # Perform replacement
            new_content = _apply_replacements(content, matches, new_string)
-            return new_content, len(matches), None
-    
+            return new_content, len(matches), strategy_name, None
+
    # No strategy found a match
-    return content, 0, "Could not find a match for old_string in the file"
+    return content, 0, None, "Could not find a match for old_string in the file"


 def _apply_replacements(content: str, matches: List[Tuple[int, int]], new_string: str) -> str:
@@ -258,9 +259,90 @@ def _strategy_trimmed_boundary(content: str, pattern: str) -> List[Tuple[int, in
    return matches


+def _build_orig_to_norm_map(original: str) -> List[int]:
+    """Build a list mapping each original character index to its normalized index.
+
+    Because UNICODE_MAP replacements may expand characters (e.g. em-dash → '--',
+    ellipsis → '...'), the normalised string can be longer than the original.
+    This map lets us convert positions in the normalised string back to the
+    corresponding positions in the original string.
+
+    Returns a list of length ``len(original) + 1``; entry ``i`` is the
+    normalised index that character ``i`` maps to.
+    """
+    result: List[int] = []
+    norm_pos = 0
+    for char in original:
+        result.append(norm_pos)
+        repl = UNICODE_MAP.get(char)
+        norm_pos += len(repl) if repl is not None else 1
+    result.append(norm_pos)  # sentinel: one past the last character
+    return result
+
+
+def _map_positions_norm_to_orig(
+    orig_to_norm: List[int],
+    norm_matches: List[Tuple[int, int]],
+) -> List[Tuple[int, int]]:
+    """Convert (start, end) positions in the normalised string to original positions."""
+    # Invert the map: norm_pos -> first original position with that norm_pos
+    norm_to_orig_start: dict[int, int] = {}
+    for orig_pos, norm_pos in enumerate(orig_to_norm[:-1]):
+        if norm_pos not in norm_to_orig_start:
+            norm_to_orig_start[norm_pos] = orig_pos
+
+    results: List[Tuple[int, int]] = []
+    orig_len = len(orig_to_norm) - 1  # number of original characters
+
+    for norm_start, norm_end in norm_matches:
+        if norm_start not in norm_to_orig_start:
+            continue
+        orig_start = norm_to_orig_start[norm_start]
+
+        # Walk forward until orig_to_norm[orig_end] >= norm_end
+        orig_end = orig_start
+        while orig_end < orig_len and orig_to_norm[orig_end] < norm_end:
+            orig_end += 1
+
+        results.append((orig_start, orig_end))
+
+    return results
+
+
+def _strategy_unicode_normalized(content: str, pattern: str) -> List[Tuple[int, int]]:
+    """Strategy 7: Unicode normalisation.
+
+    Normalises smart quotes, em/en-dashes, ellipsis, and non-breaking spaces
+    to their ASCII equivalents in both *content* and *pattern*, then runs
+    exact and line_trimmed matching on the normalised copies.
+
+    Positions are mapped back to the *original* string via
+    ``_build_orig_to_norm_map`` — necessary because some UNICODE_MAP
+    replacements expand a single character into multiple ASCII characters,
+    making a naïve position copy incorrect.
+    """
+    # Normalize both sides. Either the content or the pattern (or both) may
+    # carry unicode variants — e.g. content has an em-dash that should match
+    # the LLM's ASCII '--', or vice-versa.  Skip only when neither changes.
+    norm_pattern = _unicode_normalize(pattern)
+    norm_content = _unicode_normalize(content)
+    if norm_content == content and norm_pattern == pattern:
+        return []
+
+    norm_matches = _strategy_exact(norm_content, norm_pattern)
+    if not norm_matches:
+        norm_matches = _strategy_line_trimmed(norm_content, norm_pattern)
+
+    if not norm_matches:
+        return []
+
+    orig_to_norm = _build_orig_to_norm_map(content)
+    return _map_positions_norm_to_orig(orig_to_norm, norm_matches)
+
+
 def _strategy_block_anchor(content: str, pattern: str) -> List[Tuple[int, int]]:
    """
-    Strategy 7: Match by anchoring on first and last lines.
+    Strategy 8: Match by anchoring on first and last lines.
    Adjusted with permissive thresholds and unicode normalization.
    """
    # Normalize both strings for comparison while keeping original content for offset calculation
@@ -290,8 +372,10 @@ def _strategy_block_anchor(content: str, pattern: str) -> List[Tuple[int, int]]:
    matches = []
    candidate_count = len(potential_matches)
    
-    # Thresholding logic: 0.10 for unique matches (max flexibility), 0.30 for multiple candidates
-    threshold = 0.10 if candidate_count == 1 else 0.30
+    # Thresholding logic: 0.50 for unique matches, 0.70 for multiple candidates.
+    # Previous values (0.10 / 0.30) were dangerously loose — a 10% middle-section
+    # similarity could match completely unrelated blocks.
+    threshold = 0.50 if candidate_count == 1 else 0.70

    for i in potential_matches:
        if pattern_line_count <= 2:
@@ -314,7 +398,7 @@ def _strategy_block_anchor(content: str, pattern: str) -> List[Tuple[int, int]]:

 def _strategy_context_aware(content: str, pattern: str) -> List[Tuple[int, int]]:
    """
-    Strategy 8: Line-by-line similarity with 50% threshold.
+    Strategy 9: Line-by-line similarity with 50% threshold.
    
    Finds blocks where at least 50% of lines have high similarity.
    """
@@ -2160,6 +2160,7 @@ def _kill_orphaned_mcp_children() -> None:
    Only kills PIDs tracked in ``_stdio_pids`` — never arbitrary children.
    """
    import signal as _signal
+    kill_signal = getattr(_signal, "SIGKILL", _signal.SIGTERM)

    with _lock:
        pids = list(_stdio_pids)
@@ -2167,7 +2168,7 @@ def _kill_orphaned_mcp_children() -> None:

    for pid in pids:
        try:
-            os.kill(pid, _signal.SIGKILL)
+            os.kill(pid, kill_signal)
            logger.debug("Force-killed orphaned MCP stdio process %d", pid)
        except (ProcessLookupError, PermissionError, OSError):
            pass  # Already exited or inaccessible
@@ -28,6 +28,7 @@ Usage:
        result = apply_v4a_operations(operations, file_ops)
 """

+import difflib
 import re
 from dataclasses import dataclass, field
 from typing import List, Optional, Tuple, Any
@@ -202,31 +203,162 @@ def parse_v4a_patch(patch_content: str) -> Tuple[List[PatchOperation], Optional[
        if current_hunk and current_hunk.lines:
            current_op.hunks.append(current_hunk)
        operations.append(current_op)
-    
+
+    # Validate the parsed result
+    if not operations:
+        # Empty patch is not an error — callers get [] and can decide
+        return operations, None
+
+    parse_errors: List[str] = []
+    for op in operations:
+        if not op.file_path:
+            parse_errors.append("Operation with empty file path")
+        if op.operation == OperationType.UPDATE and not op.hunks:
+            parse_errors.append(f"UPDATE {op.file_path!r}: no hunks found")
+        if op.operation == OperationType.MOVE and not op.new_path:
+            parse_errors.append(f"MOVE {op.file_path!r}: missing destination path (expected 'src -> dst')")
+
+    if parse_errors:
+        return [], "Parse error: " + "; ".join(parse_errors)
+
    return operations, None


-def apply_v4a_operations(operations: List[PatchOperation], 
-                          file_ops: Any) -> 'PatchResult':
+def _count_occurrences(text: str, pattern: str) -> int:
+    """Count non-overlapping occurrences of *pattern* in *text*."""
+    count = 0
+    start = 0
+    while True:
+        pos = text.find(pattern, start)
+        if pos == -1:
+            break
+        count += 1
+        start = pos + 1
+    return count
+
+
+def _validate_operations(
+    operations: List[PatchOperation],
+    file_ops: Any,
+) -> List[str]:
+    """Validate all operations without writing any files.
+
+    Returns a list of error strings; an empty list means all operations
+    are valid and the apply phase can proceed safely.
+
+    For UPDATE operations, hunks are simulated in order so that later
+    hunks validate against post-earlier-hunk content (matching apply order).
    """
-    Apply V4A patch operations using a file operations interface.
-    
+    # Deferred import: breaks the patch_parser ↔ fuzzy_match circular dependency
+    from tools.fuzzy_match import fuzzy_find_and_replace
+
+    errors: List[str] = []
+
+    for op in operations:
+        if op.operation == OperationType.UPDATE:
+            read_result = file_ops.read_file_raw(op.file_path)
+            if read_result.error:
+                errors.append(f"{op.file_path}: {read_result.error}")
+                continue
+
+            simulated = read_result.content
+            for hunk in op.hunks:
+                search_lines = [l.content for l in hunk.lines if l.prefix in (' ', '-')]
+                if not search_lines:
+                    # Addition-only hunk: validate context hint uniqueness
+                    if hunk.context_hint:
+                        occurrences = _count_occurrences(simulated, hunk.context_hint)
+                        if occurrences == 0:
+                            errors.append(
+                                f"{op.file_path}: addition-only hunk context hint "
+                                f"'{hunk.context_hint}' not found"
+                            )
+                        elif occurrences > 1:
+                            errors.append(
+                                f"{op.file_path}: addition-only hunk context hint "
+                                f"'{hunk.context_hint}' is ambiguous "
+                                f"({occurrences} occurrences)"
+                            )
+                    continue
+
+                search_pattern = '\n'.join(search_lines)
+                replace_lines = [l.content for l in hunk.lines if l.prefix in (' ', '+')]
+                replacement = '\n'.join(replace_lines)
+
+                new_simulated, count, _strategy, match_error = fuzzy_find_and_replace(
+                    simulated, search_pattern, replacement, replace_all=False
+                )
+                if count == 0:
+                    label = f"'{hunk.context_hint}'" if hunk.context_hint else "(no hint)"
+                    errors.append(
+                        f"{op.file_path}: hunk {label} not found"
+                        + (f" — {match_error}" if match_error else "")
+                    )
+                else:
+                    # Advance simulation so subsequent hunks validate correctly.
+                    # Reuse the result from the call above — no second fuzzy run.
+                    simulated = new_simulated
+
+        elif op.operation == OperationType.DELETE:
+            read_result = file_ops.read_file_raw(op.file_path)
+            if read_result.error:
+                errors.append(f"{op.file_path}: file not found for deletion")
+
+        elif op.operation == OperationType.MOVE:
+            if not op.new_path:
+                errors.append(f"{op.file_path}: MOVE operation missing destination path")
+                continue
+            src_result = file_ops.read_file_raw(op.file_path)
+            if src_result.error:
+                errors.append(f"{op.file_path}: source file not found for move")
+            dst_result = file_ops.read_file_raw(op.new_path)
+            if not dst_result.error:
+                errors.append(
+                    f"{op.new_path}: destination already exists — move would overwrite"
+                )
+
+        # ADD: parent directory creation handled by write_file; no pre-check needed.
+
+    return errors
+
+
+def apply_v4a_operations(operations: List[PatchOperation],
+                          file_ops: Any) -> 'PatchResult':
+    """Apply V4A patch operations using a file operations interface.
+
+    Uses a two-phase validate-then-apply approach:
+    - Phase 1: validate all operations against current file contents without
+      writing anything. If any validation error is found, return immediately
+      with no filesystem changes.
+    - Phase 2: apply all operations. A failure here (e.g. a race between
+      validation and apply) is reported with a note to run ``git diff``.
+
    Args:
        operations: List of PatchOperation from parse_v4a_patch
-        file_ops: Object with read_file, write_file methods
-    
+        file_ops: Object with read_file_raw, write_file methods
+
    Returns:
        PatchResult with results of all operations
    """
    # Import here to avoid circular imports
    from tools.file_operations import PatchResult
-    
+
+    # ---- Phase 1: validate ----
+    validation_errors = _validate_operations(operations, file_ops)
+    if validation_errors:
+        return PatchResult(
+            success=False,
+            error="Patch validation failed (no files were modified):\n"
+                  + "\n".join(f"  • {e}" for e in validation_errors),
+        )
+
+    # ---- Phase 2: apply ----
    files_modified = []
    files_created = []
    files_deleted = []
    all_diffs = []
    errors = []
-    
+
    for op in operations:
        try:
            if op.operation == OperationType.ADD:
@@ -236,7 +368,7 @@ def apply_v4a_operations(operations: List[PatchOperation],
                    all_diffs.append(result[1])
                else:
                    errors.append(f"Failed to add {op.file_path}: {result[1]}")
-                    
+
            elif op.operation == OperationType.DELETE:
                result = _apply_delete(op, file_ops)
                if result[0]:
@@ -244,7 +376,7 @@ def apply_v4a_operations(operations: List[PatchOperation],
                    all_diffs.append(result[1])
                else:
                    errors.append(f"Failed to delete {op.file_path}: {result[1]}")
-                    
+
            elif op.operation == OperationType.MOVE:
                result = _apply_move(op, file_ops)
                if result[0]:
@@ -252,7 +384,7 @@ def apply_v4a_operations(operations: List[PatchOperation],
                    all_diffs.append(result[1])
                else:
                    errors.append(f"Failed to move {op.file_path}: {result[1]}")
-                    
+
            elif op.operation == OperationType.UPDATE:
                result = _apply_update(op, file_ops)
                if result[0]:
@@ -260,19 +392,19 @@ def apply_v4a_operations(operations: List[PatchOperation],
                    all_diffs.append(result[1])
                else:
                    errors.append(f"Failed to update {op.file_path}: {result[1]}")
-                    
+
        except Exception as e:
            errors.append(f"Error processing {op.file_path}: {str(e)}")
-    
+
    # Run lint on all modified/created files
    lint_results = {}
    for f in files_modified + files_created:
        if hasattr(file_ops, '_check_lint'):
            lint_result = file_ops._check_lint(f)
            lint_results[f] = lint_result.to_dict()
-    
+
    combined_diff = '\n'.join(all_diffs)
-    
+
    if errors:
        return PatchResult(
            success=False,
@@ -281,16 +413,17 @@ def apply_v4a_operations(operations: List[PatchOperation],
            files_created=files_created,
            files_deleted=files_deleted,
            lint=lint_results if lint_results else None,
-            error='; '.join(errors)
+            error="Apply phase failed (state may be inconsistent — run `git diff` to assess):\n"
+                  + "\n".join(f"  • {e}" for e in errors),
        )
-    
+
    return PatchResult(
        success=True,
        diff=combined_diff,
        files_modified=files_modified,
        files_created=files_created,
        files_deleted=files_deleted,
-        lint=lint_results if lint_results else None
+        lint=lint_results if lint_results else None,
    )


@@ -317,68 +450,56 @@ def _apply_add(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:

 def _apply_delete(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
    """Apply a delete file operation."""
-    # Read file first for diff
-    read_result = file_ops.read_file(op.file_path)
-    
-    if read_result.error and "not found" in read_result.error.lower():
-        # File doesn't exist, nothing to delete
-        return True, f"# {op.file_path} already deleted or doesn't exist"
-    
-    # Delete directly via shell command using the underlying environment
-    rm_result = file_ops._exec(f"rm -f {file_ops._escape_shell_arg(op.file_path)}")
-    
-    if rm_result.exit_code != 0:
-        return False, rm_result.stdout
-    
-    diff = f"--- a/{op.file_path}\n+++ /dev/null\n# File deleted"
-    return True, diff
+    # Read before deleting so we can produce a real unified diff.
+    # Validation already confirmed existence; this guards against races.
+    read_result = file_ops.read_file_raw(op.file_path)
+    if read_result.error:
+        return False, f"Cannot delete {op.file_path}: file not found"
+
+    result = file_ops.delete_file(op.file_path)
+    if result.error:
+        return False, result.error
+
+    removed_lines = read_result.content.splitlines(keepends=True)
+    diff = ''.join(difflib.unified_diff(
+        removed_lines, [],
+        fromfile=f"a/{op.file_path}",
+        tofile="/dev/null",
+    ))
+    return True, diff or f"# Deleted: {op.file_path}"


 def _apply_move(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
    """Apply a move file operation."""
-    # Use shell mv command
-    mv_result = file_ops._exec(
-        f"mv {file_ops._escape_shell_arg(op.file_path)} {file_ops._escape_shell_arg(op.new_path)}"
-    )
-    
-    if mv_result.exit_code != 0:
-        return False, mv_result.stdout
-    
+    result = file_ops.move_file(op.file_path, op.new_path)
+    if result.error:
+        return False, result.error
+
    diff = f"# Moved: {op.file_path} -> {op.new_path}"
    return True, diff


 def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
    """Apply an update file operation."""
-    # Read current content
-    read_result = file_ops.read_file(op.file_path, limit=10000)
-    
+    # Deferred import: breaks the patch_parser ↔ fuzzy_match circular dependency
+    from tools.fuzzy_match import fuzzy_find_and_replace
+
+    # Read current content — raw so no line-number prefixes or per-line truncation
+    read_result = file_ops.read_file_raw(op.file_path)
+
    if read_result.error:
        return False, f"Cannot read file: {read_result.error}"
-    
-    # Parse content (remove line numbers)
-    current_lines = []
-    for line in read_result.content.split('\n'):
-        if re.match(r'^\s*\d+\|', line):
-            # Line format: "    123|content"
-            parts = line.split('|', 1)
-            if len(parts) == 2:
-                current_lines.append(parts[1])
-            else:
-                current_lines.append(line)
-        else:
-            current_lines.append(line)
-    
-    current_content = '\n'.join(current_lines)
-    
+
+    current_content = read_result.content
+
    # Apply each hunk
    new_content = current_content
-    
+
    for hunk in op.hunks:
        # Build search pattern from context and removed lines
        search_lines = []
        replace_lines = []
-        
+
        for line in hunk.lines:
            if line.prefix == ' ':
                search_lines.append(line.content)
@@ -387,17 +508,15 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
                search_lines.append(line.content)
            elif line.prefix == '+':
                replace_lines.append(line.content)
-        
+
        if search_lines:
            search_pattern = '\n'.join(search_lines)
            replacement = '\n'.join(replace_lines)
-            
-            # Use fuzzy matching
-            from tools.fuzzy_match import fuzzy_find_and_replace
-            new_content, count, error = fuzzy_find_and_replace(
+
+            new_content, count, _strategy, error = fuzzy_find_and_replace(
                new_content, search_pattern, replacement, replace_all=False
            )
-            
+
            if error and count == 0:
                # Try with context hint if available
                if hunk.context_hint:
@@ -408,8 +527,8 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
                        window_start = max(0, hint_pos - 500)
                        window_end = min(len(new_content), hint_pos + 2000)
                        window = new_content[window_start:window_end]
-                        
-                        window_new, count, error = fuzzy_find_and_replace(
+
+                        window_new, count, _strategy, error = fuzzy_find_and_replace(
                            window, search_pattern, replacement, replace_all=False
                        )
                        
@@ -424,16 +543,23 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
            # Insert at the location indicated by the context hint, or at end of file.
            insert_text = '\n'.join(replace_lines)
            if hunk.context_hint:
-                hint_pos = new_content.find(hunk.context_hint)
-                if hint_pos != -1:
+                occurrences = _count_occurrences(new_content, hunk.context_hint)
+                if occurrences == 0:
+                    # Hint not found — append at end as a safe fallback
+                    new_content = new_content.rstrip('\n') + '\n' + insert_text + '\n'
+                elif occurrences > 1:
+                    return False, (
+                        f"Addition-only hunk: context hint '{hunk.context_hint}' is ambiguous "
+                        f"({occurrences} occurrences) — provide a more unique hint"
+                    )
+                else:
+                    hint_pos = new_content.find(hunk.context_hint)
                    # Insert after the line containing the context hint
                    eol = new_content.find('\n', hint_pos)
                    if eol != -1:
                        new_content = new_content[:eol + 1] + insert_text + '\n' + new_content[eol + 1:]
                    else:
                        new_content = new_content + '\n' + insert_text
-                else:
-                    new_content = new_content.rstrip('\n') + '\n' + insert_text + '\n'
            else:
                new_content = new_content.rstrip('\n') + '\n' + insert_text + '\n'
    
@@ -443,7 +569,6 @@ def _apply_update(op: PatchOperation, file_ops: Any) -> Tuple[bool, str]:
        return False, write_result.error
    
    # Generate diff
-    import difflib
    diff_lines = difflib.unified_diff(
        current_content.splitlines(keepends=True),
        new_content.splitlines(keepends=True),
@@ -585,7 +585,10 @@ class ProcessRegistry:
        from tools.ansi_strip import strip_ansi
        from tools.terminal_tool import _interrupt_event

-        default_timeout = int(os.getenv("TERMINAL_TIMEOUT", "180"))
+        try:
+            default_timeout = int(os.getenv("TERMINAL_TIMEOUT", "180"))
+        except (ValueError, TypeError):
+            default_timeout = 180
        max_timeout = default_timeout
        requested_timeout = timeout
        timeout_note = None
@@ -212,7 +212,8 @@ def _handle_send(args):
        if isinstance(result, dict) and result.get("success") and mirror_text:
            try:
                from gateway.mirror import mirror_to_session
-                source_label = os.getenv("HERMES_SESSION_PLATFORM", "cli")
+                from gateway.session_context import get_session_env
+                source_label = get_session_env("HERMES_SESSION_PLATFORM", "cli")
                if mirror_to_session(platform_name, chat_id, mirror_text, source_label=source_label, thread_id=thread_id):
                    result["mirrored"] = True
            except Exception:
@@ -689,7 +690,10 @@ async def _send_email(extra, chat_id, message):
    address = extra.get("address") or os.getenv("EMAIL_ADDRESS", "")
    password = os.getenv("EMAIL_PASSWORD", "")
    smtp_host = extra.get("smtp_host") or os.getenv("EMAIL_SMTP_HOST", "")
-    smtp_port = int(os.getenv("EMAIL_SMTP_PORT", "587"))
+    try:
+        smtp_port = int(os.getenv("EMAIL_SMTP_PORT", "587"))
+    except (ValueError, TypeError):
+        smtp_port = 587

    if not all([address, password, smtp_host]):
        return {"error": "Email not configured (EMAIL_ADDRESS, EMAIL_PASSWORD, EMAIL_SMTP_HOST required)"}
@@ -1020,7 +1024,8 @@ async def _send_feishu(pconfig, chat_id, message, media_files=None, thread_id=No

 def _check_send_message():
    """Gate send_message on gateway running (always available on messaging platforms)."""
-    platform = os.getenv("HERMES_SESSION_PLATFORM", "")
+    from gateway.session_context import get_session_env
+    platform = get_session_env("HERMES_SESSION_PLATFORM", "")
    if platform and platform != "local":
        return True
    try:
@@ -426,7 +426,7 @@ def _patch_skill(
    # from exact-match failures on minor formatting mismatches.
    from tools.fuzzy_match import fuzzy_find_and_replace

-    new_content, match_count, match_error = fuzzy_find_and_replace(
+    new_content, match_count, _strategy, match_error = fuzzy_find_and_replace(
        content, old_string, new_string, replace_all
    )
    if match_error:
@@ -1788,7 +1788,10 @@ class ClawHubSource(SkillSource):
                    follow_redirects=True,
                )
                if resp.status_code == 429:
-                    retry_after = int(resp.headers.get("retry-after", "5"))
+                    try:
+                        retry_after = int(resp.headers.get("retry-after", "5"))
+                    except (ValueError, TypeError):
+                        retry_after = 5
                    retry_after = min(retry_after, 15)  # Cap wait time
                    logger.debug(
                        "ClawHub download rate-limited for %s, retrying in %ds (attempt %d/%d)",
@@ -347,7 +347,8 @@ def _capture_required_environment_variables(
 def _is_gateway_surface() -> bool:
    if os.getenv("HERMES_GATEWAY_SESSION"):
        return True
-    return bool(os.getenv("HERMES_SESSION_PLATFORM"))
+    from gateway.session_context import get_session_env
+    return bool(get_session_env("HERMES_SESSION_PLATFORM"))


 def _get_terminal_backend_name() -> str:
@@ -1420,10 +1420,11 @@ def terminal_tool(
                    # In gateway mode, auto-register a fast watcher so the
                    # gateway can detect completion and trigger a new agent
                    # turn.  CLI mode uses the completion_queue directly.
-                    _gw_platform = os.getenv("HERMES_SESSION_PLATFORM", "")
+                    from gateway.session_context import get_session_env as _gse
+                    _gw_platform = _gse("HERMES_SESSION_PLATFORM", "")
                    if _gw_platform and not check_interval:
-                        _gw_chat_id = os.getenv("HERMES_SESSION_CHAT_ID", "")
-                        _gw_thread_id = os.getenv("HERMES_SESSION_THREAD_ID", "")
+                        _gw_chat_id = _gse("HERMES_SESSION_CHAT_ID", "")
+                        _gw_thread_id = _gse("HERMES_SESSION_THREAD_ID", "")
                        proc_session.watcher_platform = _gw_platform
                        proc_session.watcher_chat_id = _gw_chat_id
                        proc_session.watcher_thread_id = _gw_thread_id
@@ -1445,9 +1446,10 @@ def terminal_tool(
                        result_data["check_interval_note"] = (
                            f"Requested {check_interval}s raised to minimum 30s"
                        )
-                    watcher_platform = os.getenv("HERMES_SESSION_PLATFORM", "")
-                    watcher_chat_id = os.getenv("HERMES_SESSION_CHAT_ID", "")
-                    watcher_thread_id = os.getenv("HERMES_SESSION_THREAD_ID", "")
+                    from gateway.session_context import get_session_env as _gse2
+                    watcher_platform = _gse2("HERMES_SESSION_PLATFORM", "")
+                    watcher_chat_id = _gse2("HERMES_SESSION_CHAT_ID", "")
+                    watcher_thread_id = _gse2("HERMES_SESSION_THREAD_ID", "")

                    # Store on session for checkpoint persistence
                    proc_session.watcher_platform = watcher_platform
@@ -480,7 +480,8 @@ def text_to_speech_tool(
    # Telegram voice bubbles require Opus (.ogg); OpenAI and ElevenLabs can
    # produce Opus natively (no ffmpeg needed).  Edge TTS always outputs MP3
    # and needs ffmpeg for conversion.
-    platform = os.getenv("HERMES_SESSION_PLATFORM", "").lower()
+    from gateway.session_context import get_session_env
+    platform = get_session_env("HERMES_SESSION_PLATFORM", "").lower()
    want_opus = (platform == "telegram")

    # Determine output path
@@ -1661,7 +1661,7 @@ dependencies = [
    { name = "fal-client" },
    { name = "fire" },
    { name = "firecrawl-py" },
-    { name = "httpx" },
+    { name = "httpx", extra = ["socks"] },
    { name = "jinja2" },
    { name = "openai" },
    { name = "parallel-web" },
@@ -1691,6 +1691,8 @@ all = [
    { name = "faster-whisper" },
    { name = "honcho-ai" },
    { name = "lark-oapi" },
+    { name = "markdown", marker = "sys_platform == 'linux'" },
+    { name = "matrix-nio", extra = ["e2e"], marker = "sys_platform == 'linux'" },
    { name = "mcp" },
    { name = "mistralai" },
    { name = "modal" },
@@ -1827,6 +1829,7 @@ requires-dist = [
    { name = "hermes-agent", extras = ["homeassistant"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["honcho"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["honcho"], marker = "extra == 'termux'" },
+    { name = "hermes-agent", extras = ["matrix"], marker = "sys_platform == 'linux' and extra == 'all'" },
    { name = "hermes-agent", extras = ["mcp"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["mcp"], marker = "extra == 'termux'" },
    { name = "hermes-agent", extras = ["messaging"], marker = "extra == 'all'" },
@@ -1839,7 +1842,7 @@ requires-dist = [
    { name = "hermes-agent", extras = ["tts-premium"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["voice"], marker = "extra == 'all'" },
    { name = "honcho-ai", marker = "extra == 'honcho'", specifier = ">=2.0.1,<3" },
-    { name = "httpx", specifier = ">=0.28.1,<1" },
+    { name = "httpx", extras = ["socks"], specifier = ">=0.28.1,<1" },
    { name = "jinja2", specifier = ">=3.1.5,<4" },
    { name = "lark-oapi", marker = "extra == 'feishu'", specifier = ">=1.5.3,<2" },
    { name = "markdown", marker = "extra == 'matrix'", specifier = ">=3.6,<4" },
@@ -2033,6 +2036,9 @@ wheels = [
 http2 = [
    { name = "h2" },
 ]
+socks = [
+    { name = "socksio" },
+]

 [[package]]
 name = "httpx-sse"
@@ -4500,6 +4506,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
 ]

+[[package]]
+name = "socksio"
+version = "1.0.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f8/5c/48a7d9495be3d1c651198fd99dbb6ce190e2274d0f28b9051307bdec6b85/socksio-1.0.0.tar.gz", hash = "sha256:f88beb3da5b5c38b9890469de67d0cb0f9d494b78b106ca1845f96c10b91c4ac", size = 19055, upload-time = "2020-04-17T15:50:34.664Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/37/c3/6eeb6034408dac0fa653d126c9204ade96b819c936e136c5e8a6897eee9c/socksio-1.0.0-py3-none-any.whl", hash = "sha256:95dc1f15f9b34e8d7b16f06d74b8ccf48f609af32ab33c608d08761c5dcbb1f3", size = 12763, upload-time = "2020-04-17T15:50:31.878Z" },
+]
+
 [[package]]
 name = "sounddevice"
 version = "0.5.5"
@@ -262,6 +262,7 @@ For cloud sandbox backends, persistence is filesystem-oriented. `TERMINAL_LIFETI
 | `MATRIX_REQUIRE_MENTION` | Require `@mention` in rooms (default: `true`). Set to `false` to respond to all messages. |
 | `MATRIX_FREE_RESPONSE_ROOMS` | Comma-separated room IDs where bot responds without `@mention` |
 | `MATRIX_AUTO_THREAD` | Auto-create threads for room messages (default: `true`) |
+| `MATRIX_DM_MENTION_THREADS` | Create a thread when bot is `@mentioned` in a DM (default: `false`) |
 | `HASS_TOKEN` | Home Assistant Long-Lived Access Token (enables HA platform + tools) |
 | `HASS_URL` | Home Assistant URL (default: `http://homeassistant.local:8123`) |
 | `WEBHOOK_ENABLED` | Enable the webhook platform adapter (`true`/`false`) |
@@ -16,7 +16,7 @@ Before setup, here's the part most people want to know: how Hermes behaves once

 | Context | Behavior |
 |---------|----------|
-| **DMs** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. |
+| **DMs** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. Set `MATRIX_DM_MENTION_THREADS=true` to start a thread when the bot is `@mentioned` in a DM. |
 | **Rooms** | By default, Hermes requires an `@mention` to respond. Set `MATRIX_REQUIRE_MENTION=false` or add room IDs to `MATRIX_FREE_RESPONSE_ROOMS` for free-response rooms. Room invites are auto-accepted. |
 | **Threads** | Hermes supports Matrix threads (MSC3440). If you reply in a thread, Hermes keeps the thread context isolated from the main room timeline. Threads where the bot has already participated do not require a mention. |
 | **Auto-threading** | By default, Hermes auto-creates a thread for each message it responds to in a room. This keeps conversations isolated. Set `MATRIX_AUTO_THREAD=false` to disable. |
@@ -62,6 +62,7 @@ matrix:
  free_response_rooms:            # Rooms exempt from mention requirement
    - "!abc123:matrix.org"
  auto_thread: true               # Auto-create threads for responses (default: true)
+  dm_mention_threads: false       # Create thread when @mentioned in DM (default: false)
 ```

 Or via environment variables:
@@ -70,6 +71,7 @@ Or via environment variables:
 MATRIX_REQUIRE_MENTION=true
 MATRIX_FREE_RESPONSE_ROOMS=!abc123:matrix.org,!def456:matrix.org
 MATRIX_AUTO_THREAD=true
+MATRIX_DM_MENTION_THREADS=false
 ```

 :::note
Author	SHA1	Message	Date
Teknium	c4c3a57a3a	fix: restore agent.close() cleanup and correct /restart category - Add agent.close() call to _finalize_shutdown_agents() to prevent zombie processes (terminal sandboxes, browser daemons, httpx clients) - Global cleanup (process_registry, environments, browsers) preserved in _stop_impl() during conflict resolution - Move /restart CommandDef from 'Info' to 'Session' category to match /stop and /status	2026-04-10 18:55:28 -07:00
Kenny Xie	eb3f021e2a	fix(gateway): address restart review feedback	2026-04-10 18:54:48 -07:00
aquaright1	b4cb803954	fix(gateway): self-request service restarts when invoked in-process	2026-04-10 18:54:48 -07:00
Kenny Xie	b5928530d3	fix(gateway): tolerate partial runner construction	2026-04-10 18:54:48 -07:00
Kenny Xie	ef120b5422	fix(gateway): drain in-flight work before restart	2026-04-10 18:54:48 -07:00
Siddharth Balyan	9a0c44f908	fix(nix): gate matrix extra to Linux in [all] profile (#7461 ) * fix(nix): gate matrix extra to Linux in [all] profile matrix-nio[e2e] depends on python-olm which is upstream-broken on modern macOS (Clang 21+, archived libolm). Previously the [matrix] extra was completely excluded from [all], meaning NixOS users (who install via [all]) had no Matrix support at all. Add a sys_platform == 'linux' marker so [all] pulls in [matrix] on Linux (where python-olm builds fine) while still skipping it on macOS. This fixes the NixOS setup path without breaking macOS installs. Update the regression test to verify the Linux-gated marker is present rather than just checking matrix is absent from [all]. Fixes #4594 * chore: regenerate uv.lock with matrix-on-linux in [all]	2026-04-11 05:59:56 +05:30
Teknium	baddb6f717	fix(gateway): derive channel directory platforms from enum instead of hardcoded list (#7450 ) Six platforms (matrix, mattermost, dingtalk, feishu, wecom, homeassistant) were missing from the session-based discovery loop, causing /channels and send_message to return empty results on those platforms. Instead of adding them to the hardcoded tuple (which would break again when new platforms are added), derive the list dynamically from the Platform enum. Only infrastructure entries (local, api_server, webhook) are excluded; Discord and Slack are skipped automatically because their direct builders already populate the platforms dict. Reported by sprmn24 in PR #7416.	2026-04-10 17:27:32 -07:00
0xFrank-eth	e8034e2f6a	fix(gateway): replace os.environ session state with contextvars for concurrency safety When two gateway messages arrived concurrently, _set_session_env wrote HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global os.environ. Because asyncio tasks share the same process, Message B would overwrite Message A's values mid-flight, causing background-task notifications and tool calls to route to the wrong thread/chat. Replace os.environ with Python's contextvars.ContextVar. Each asyncio task (and any run_in_executor thread it spawns) gets its own copy, so concurrent messages never interfere. Changes: - New gateway/session_context.py with ContextVar definitions, set/clear/get helpers, and os.environ fallback for CLI/cron/test backward compatibility - gateway/run.py: _set_session_env returns reset tokens, _clear_session_env accepts them for proper cleanup in finally blocks - All tool consumers updated: cronjob_tools, send_message_tool, skills_tool, terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool, agent/skill_utils, agent/prompt_builder - Tests updated for new contextvar-based API Fixes #7358 Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-04-10 17:04:38 -07:00
Dylan Socolobsky	dab5ec8245	test(e2e): add Slack to parametrized e2e platform tests	2026-04-10 16:51:44 -07:00
Dylan Socolobsky	79565630b0	refactor(e2e): unify Telegram and Discord e2e tests into parametrized platform fixtures	2026-04-10 16:51:44 -07:00
Dylan Socolobsky	7033dbf5d6	test(e2e): add Discord e2e integration tests	2026-04-10 16:51:44 -07:00
pefontana	9555a0cf31	fix(gateway): look up expired agents in _agent_cache, add global kill_all Two fixes from PR review: 1. Session expiry was looking in _running_agents for the cached agent, but idle expired sessions live in _agent_cache. Now checks _agent_cache first, falls back to _running_agents. 2. Global cleanup in stop() was missing process_registry.kill_all(), so background processes from agents evicted without close() (branch, fallback) survived shutdown.	2026-04-10 16:51:44 -07:00
pefontana	f00dd3169f	fix(gateway): guard _agent_cache_lock access in reset handler Use getattr guard for _agent_cache_lock in _handle_reset_command because test fixtures may create GatewayRunner without calling __init__, leaving the attribute unset. Fixes e2e test failure: test_new_resets_session, test_new_then_status_reflects_reset, test_new_is_idempotent.	2026-04-10 16:51:44 -07:00
pefontana	8414f41856	test: add zombie process cleanup tests Add 9 tests covering the full zombie process prevention chain: - TestZombieReproduction: demonstrates that processes survive when references are dropped without explicit cleanup (the original bug) - TestAgentCloseMethod: verifies close() calls all cleanup functions, is idempotent, propagates to children, and continues cleanup even when individual steps fail - TestGatewayCleanupWiring: verifies stop() calls close() and that _evict_cached_agent() does NOT call close() (since it's also used for non-destructive cache refreshes) - TestDelegationCleanup: calls the real _run_single_child function and verifies close() is called on the child agent Ref: #7131	2026-04-10 16:51:44 -07:00
pefontana	672cc80915	fix(delegate): close child agent after delegation completes Call child.close() in the _run_single_child finally block after unregistering the child from the parent's active children list. Previously child AIAgent instances were only removed from the tracking list but never had their resources released — the OpenAI/httpx client and any tool subprocesses relied entirely on garbage collection. Ref: #7131	2026-04-10 16:51:44 -07:00
pefontana	fbe28352e4	fix(gateway): call agent.close() on session end to prevent zombies Wire AIAgent.close() into every gateway code path where an agent's session is actually ending: - stop(): close all running agents after interrupt + memory shutdown, then call cleanup_all_environments() and cleanup_all_browsers() as a global catch-all - _session_expiry_watcher(): close agents when sessions expire after the 5-minute idle timeout - _handle_reset_command(): close the old agent before evicting it from cache on /new or /reset Note: _evict_cached_agent() intentionally does NOT call close() because it is also used for non-destructive cache refreshes (model switch, branch, fallback) where tool resources should persist. Ref: #7131	2026-04-10 16:51:44 -07:00
pefontana	5b42aecfa7	feat(agent): add AIAgent.close() for subprocess cleanup Add a close() method to AIAgent that acts as a single entry point for releasing all resources held by an agent instance. This prevents zombie process accumulation on long-running gateway deployments by explicitly cleaning up: - Background processes tracked in ProcessRegistry - Terminal sandbox environments - Browser daemon sessions - Active child agents (subagent delegation) - OpenAI/httpx client connections Each cleanup step is independently guarded so a failure in one does not prevent the rest. The method is idempotent and safe to call multiple times. Also simplifies the background review cleanup to use close() instead of manually closing the OpenAI client. Ref: #7131	2026-04-10 16:51:44 -07:00
entropidelic	989b950fbc	fix(security): enforce API_SERVER_KEY for non-loopback binding Add is_network_accessible() helper using Python's ipaddress module to robustly classify bind addresses (IPv4/IPv6 loopback, wildcards, mapped addresses, hostname resolution with DNS-failure-fails-closed). The API server connect() now refuses to start when the bind address is network-accessible and no API_SERVER_KEY is set, preventing RCE from other machines on the network. Co-authored-by: entropidelic <entropidelic@users.noreply.github.com>	2026-04-10 16:51:44 -07:00
Devorun	2a6cbf52d0	fix(cron): prevent silent data loss by raising exceptions on unrecoverable jobs.json read failures (#6797 )	2026-04-10 16:51:35 -07:00
coffee	c5ab760528	fix(cron): missing field init, unnecessary save, and shutdown cleanup 1. Add missing `last_delivery_error` field initialization in `create_job()`. `mark_job_run()` sets this field on line 596 but it was never initialized, causing inconsistent job schemas between new and executed jobs. 2. Replace unnecessary `save_jobs()` call with a warning log when `mark_job_run()` is called with a non-existent job_id. Previously the function would silently write unchanged data to disk. 3. Add `cancel_futures=True` to the `finally` block in cron scheduler's thread pool shutdown. The `except` path already passes this flag but the normal exit path did not, leaving futures running after inactivity timeout detection.	2026-04-10 16:51:35 -07:00
Teknium	a4fc38c5b1	test: remove dead TestResolveForcedProvider tests (function doesn't exist on main)	2026-04-10 16:47:44 -07:00
KUSH42	0e939af7c2	fix(patch): harden V4A patch parser and fuzzy match — 9 correctness bugs - Bug 1: replace read_file(limit=10000) with read_file_raw in _apply_update, preventing silent truncation of files >2000 lines and corruption of lines >2000 chars; add read_file_raw to FileOperations abstract interface and ShellFileOperations - Bug 2: split apply_v4a_operations into validate-then-apply phases; if any hunk fails validation, zero writes occur (was: continue after failure, leaving filesystem partially modified) - Bug 3: parse_v4a_patch now returns an error for begin-marker-with-no-ops, empty file paths, and moves missing a destination (was: always returned error=None) - Bug 4: raise strategy 7 (block anchor) single-candidate similarity threshold from 0.10 to 0.50, eliminating false-positive matches in repetitive code - Bug 5: add _strategy_unicode_normalized (new strategy 7) with position mapping via _build_orig_to_norm_map; smart quotes and em-dashes in LLM-generated patches now match via strategies 1-6 before falling through to fuzzy strategies - Bug 6: extend fuzzy_find_and_replace to return 4-tuple (content, count, error, strategy); update all 5 call sites across patch_parser.py, file_operations.py, and skill_manager_tool.py - Bug 7: guard in _apply_update returns error when addition-only context hint is ambiguous (>1 occurrences); validation phase errors on both 0 and >1 - Bug 8: _apply_delete returns error (not silent success) on missing file - Bug 9: _validate_operations checks source existence and destination absence for MOVE operations before any write occurs	2026-04-10 16:47:44 -07:00
Billard	475cbce775	fix(aux): honor api_mode for custom auxiliary endpoints	2026-04-10 16:47:44 -07:00
coffee	c1f832a610	fix(tools): guard against ValueError on int() env var and header parsing Three locations perform `int()` conversion on environment variables or HTTP headers without error handling, causing unhandled `ValueError` crashes when the values are non-numeric: 1. `send_message_tool.py` — `EMAIL_SMTP_PORT` env var parsed outside the try/except block; a non-numeric value crashes `_send_email()` instead of returning a user-friendly error. 2. `process_registry.py` — `TERMINAL_TIMEOUT` env var parsed without protection; a non-numeric value crashes the `wait()` method. 3. `skills_hub.py` — HTTP `Retry-After` header can contain date strings per RFC 7231; `int()` conversion crashes on non-numeric values. All three now fall back to their default values on `ValueError`/`TypeError`.	2026-04-10 16:47:44 -07:00
Awsh1	6f63ba9c8f	fix(mcp): fall back when SIGKILL is unavailable	2026-04-10 16:47:44 -07:00
Fran Fitzpatrick	3e24ba1656	feat(matrix): add MATRIX_DM_MENTION_THREADS env var When enabled, @mentioning the bot in a DM creates a thread (default: false). Supports both env var and YAML config (matrix.dm_mention_threads). 6 new tests, docs updated. From #6957	2026-04-10 15:46:20 -07:00
buray	d8cd7974d8	fix(feishu): register group chat member event handlers Bot-added and bot-removed events were silently dropped because _on_bot_added_to_chat and _on_bot_removed_from_chat were not registered in _build_event_handler(). From #6975	2026-04-10 15:46:20 -07:00
Teknium	e8f16f7432	fix(docker): add missing skins/plans/workspace dirs to entrypoint The profile system expects these directories but they weren't being created on container startup. Adds them to the mkdir list alongside the existing dirs. Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>	2026-04-10 15:42:30 -07:00
duerzy	e1167c5c07	fix(deps): add socks extra to httpx for SOCKS proxy support Add the [socks] extra to the httpx dependency to include the required 'socksio' package. This fixes the error: "Using SOCKS proxy, but the 'socksio' package is not installed" when users configure SOCKS proxy settings.	2026-04-10 15:42:30 -07:00
angelos	8254b820ec	fix(docker): --init for zombie reaping + sleep infinity for idle-based lifetime Two issues with sandbox container spawning: 1. PID 1 was `sleep 2h` which doesn't call wait() — every background process that exited became a zombie (<defunct>), and the process tool reported them as "running" because zombie PIDs still exist in the process table. Fix: add --init to docker run, which uses tini (Docker) or catatonit (Podman) as PID 1 to reap children automatically. Both runtimes support --init natively. 2. The fixed 2-hour lifetime was arbitrary and sometimes too short for long agent sessions. Fix: replace 'sleep 2h' with 'sleep infinity'. The idle reaper (_cleanup_inactive_envs, gated by terminal.lifetime_seconds, default 300s) already handles cleanup based on last activity timestamp — there's no need for the container itself to have a fixed death timer. Fixes #6908.	2026-04-10 15:42:30 -07:00
Tranquil-Flow	2b0912ab18	fix(install): handle Playwright deps correctly on non-apt systems Playwright's --with-deps flag only supports apt-based dependency installation. The install script previously ran it on all non-Arch systems, failing silently on Gentoo, Fedora, openSUSE, and others. - Restrict --with-deps to known apt-based distributions - Add explicit guidance for RPM-based (dnf) and zypper-based systems - Show visible warnings instead of suppressing failures with \|\| true - Correct misleading comment that claimed dnf/zypper support Fixes #6865	2026-04-10 15:42:30 -07:00
Teknium	ea81aa2eec	fix: guard api_kwargs in except handler to prevent UnboundLocalError (#7376 ) When _build_api_kwargs() throws an exception, the except handler in the retry loop referenced api_kwargs before it was assigned. This caused an UnboundLocalError that masked the real error, making debugging impossible for the user. Two _dump_api_request_debug() calls in the except block (non-retryable client error path and max-retries-exhausted path) both accessed api_kwargs without checking if it was assigned. Fix: initialize api_kwargs = None before the retry loop and guard both dump calls. Now the real error surfaces instead of the masking UnboundLocalError. Reported by Discord user gruman0.	2026-04-10 15:12:00 -07:00