Stasis Warden - Testing Strategy

Initial evaluation performed by Quinn (QA Architect) based on prd.md v1.4 and architecture.md.

This document outlines the initial testing strategy for the Stasis Warden project, focusing on high-risk areas identified during the review of core product and architecture documents. The goal is to establish a robust testing architecture early to ensure we can build and iterate with confidence.


1. High Risk: Save/Load System Integrity

The data persistence logic in GameStateManager is the most critical system from a quality perspective. Data corruption in a save file can permanently halt a player's progress and ruin their experience.

Testing Strategy:

  • Unit Tests: Each manager (ResourceManager, CrewManager, etc.) must have unit tests for its get_data() and load_data() methods. We need to verify that serialization and deserialization are symmetrical: loading the output of get_data() into a fresh manager must reproduce the original state exactly.
  • Full-Cycle Integration Tests: We must create a dedicated test scene that orchestrates a full save/load cycle.
    1. Programmatically set up a complex game state (e.g., multiple crew with specific stats, some assigned to tasks, specific resources, unlocked rooms).
    2. Trigger GameStateManager.save_game().
    3. Reset the entire game state.
    4. Trigger GameStateManager.load_game().
    5. Assert with precision that the restored state is identical to the state before saving.
  • Corruption/Fuzz Testing: We need tests that attempt to load invalid, malformed, or empty savegame.dat files. The game must handle these errors gracefully (e.g., by showing a "corrupted save" message and returning to the main menu) rather than crashing.
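
The round-trip and corruption checks above can be sketched in a language-neutral way. The Python below is only an illustration, not the project's GDScript: the JSON format, the CorruptSaveError type, the single power field, and the save_game/load_game signatures are all assumptions made for the example. Only the get_data()/load_data() pair comes from this strategy.

```python
import json


class CorruptSaveError(Exception):
    """Raised when save data cannot be parsed or lacks required fields."""


class ResourceManager:
    # Hypothetical stand-in for the real manager; it holds one field only.
    def __init__(self):
        self.power = 0

    def get_data(self):
        return {"power": self.power}

    def load_data(self, data):
        self.power = int(data["power"])


def save_game(manager):
    # Assumed format: UTF-8 encoded JSON with one entry per manager.
    return json.dumps({"resources": manager.get_data()}).encode("utf-8")


def load_game(raw, manager):
    # Any parsing or shape problem is reported as one well-defined error,
    # so the caller can show a "corrupted save" message instead of crashing.
    try:
        payload = json.loads(raw.decode("utf-8"))
        manager.load_data(payload["resources"])
    except (ValueError, KeyError, TypeError) as exc:
        raise CorruptSaveError("corrupted save") from exc


def round_trip_is_symmetrical(manager):
    # Save, load into a fresh manager, and compare the serialized state.
    restored = ResourceManager()
    load_game(save_game(manager), restored)
    return restored.get_data() == manager.get_data()
```

A corruption test would then feed load_game() empty, truncated, and wrongly shaped inputs and assert that each one raises CorruptSaveError rather than some unrelated exception.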

2. High Risk: State Machine Logic (GameStateManager)

The game's flow is controlled by a state machine (IN_GAME, ROOM_SELECTION, etc.). A bug in state transitions can easily lead to a soft-lock where the player is stuck and cannot provide input.

Testing Strategy:

  • State Transition Tests: Each possible state transition must be explicitly tested. For example, a test should confirm that when the game enters the ROOM_SELECTION state, player inputs related to the IN_GAME state (like trying to assign a crew member) are ignored. We must verify not only that the state changes correctly, but that the game's behavior changes with it.
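
As a minimal sketch of such a test target, the Python below models two of the states and one state-gated action. It is illustrative only: the transition table, the change_state() and assign_crew() names, and the decision to raise on an illegal transition are assumptions, not the actual GameStateManager API.

```python
from enum import Enum, auto


class GameState(Enum):
    IN_GAME = auto()
    ROOM_SELECTION = auto()


# Hypothetical transition table; the real set of states is larger.
ALLOWED_TRANSITIONS = {
    GameState.IN_GAME: {GameState.ROOM_SELECTION},
    GameState.ROOM_SELECTION: {GameState.IN_GAME},
}


class GameStateMachine:
    def __init__(self):
        self.state = GameState.IN_GAME
        self.assignments = []

    def change_state(self, new_state):
        # Reject transitions that are not listed in the table.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def assign_crew(self, crew_name, task):
        # IN_GAME-only action: ignored (returns False) in any other state.
        if self.state is not GameState.IN_GAME:
            return False
        self.assignments.append((crew_name, task))
        return True
```

A transition test can then check both halves of the requirement: that the state value changed, and that assign_crew() returns False and leaves assignments untouched while the machine is in ROOM_SELECTION.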

3. High Risk: Resource & Economic Balance (ResourceManager)

The core gameplay loop depends on the Power economy. Bugs in resource generation, spending, or the signal-driven UI updates could make the game unplayable or trivial.

Testing Strategy:

  • Transactional Unit Tests: The ResourceManager's methods (add_resource, spend_resource) must be tested like database transactions: a call either applies in full or leaves the balance untouched. We need to cover edge cases such as spending exactly all available power and attempting to spend more than is available, and we must confirm that the power_updated signal fires with the correct payload every time.
  • Signal Listener Integration Tests: We should have tests that simulate a UI action and attach a test double that listens for the resulting signal from the ResourceManager. This verifies that our core signal-driven architecture works end to end.
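
The two bullets above can be illustrated with a small Python sketch. It is not the real ResourceManager: the method signatures, the boolean return of spend_resource(), the choice of the new power total as the signal payload, and the rule that a rejected spend emits nothing are all assumptions for the example. SignalSpy is a hypothetical test double.

```python
class ResourceManager:
    def __init__(self, power=0):
        self.power = power
        self._power_updated_listeners = []

    def connect_power_updated(self, callback):
        # Stand-in for connecting to the power_updated signal.
        self._power_updated_listeners.append(callback)

    def _emit_power_updated(self):
        # Assumed payload: the new power total.
        for callback in self._power_updated_listeners:
            callback(self.power)

    def add_resource(self, amount):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.power += amount
        self._emit_power_updated()

    def spend_resource(self, amount):
        # All or nothing: a rejected spend changes nothing and, by
        # assumption, emits no signal.
        if amount < 0 or amount > self.power:
            return False
        self.power -= amount
        self._emit_power_updated()
        return True


class SignalSpy:
    """Test double that records every payload it receives."""

    def __init__(self):
        self.payloads = []

    def __call__(self, value):
        self.payloads.append(value)
```

With the spy connected, a test can spend exactly the full balance, attempt an overdraw, and then assert on both the final balance and the exact sequence of payloads the spy recorded.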

4. Medium Risk: Procedural Generation (Crew & Rooms)

The random generation of crew stats/traits and room cards makes testing difficult. We cannot rely on chance for quality assurance.

Testing Strategy:

  • Isolate and Seed the RNG: The logic for generating crew and selecting room cards must be refactored to accept an optional seed for the Random Number Generator.
  • Deterministic Unit Tests: By providing a known seed, our tests can assert that the "random" outcomes are perfectly predictable. For a given seed, we should always get the exact same crew stats and the exact same set of three room cards. This makes testing repeatable and reliable.
  • Property-Based Testing: For a more advanced approach, we can verify properties of the output. For example, a test can assert that a generated crew member's "Engineering" stat is always within the valid range (e.g., 1-10), regardless of the seed.
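
A rough Python sketch of the seeding idea follows. The room pool, the second stat, and the function names are hypothetical; only the "Engineering" stat, the example 1-10 range, and the set of three room cards come from the text above. The real implementation would use Godot's own random number generator rather than Python's.

```python
import random

# Hypothetical pool of room cards, for illustration only.
ROOM_POOL = ["Reactor", "Hydroponics", "Med Bay", "Workshop", "Barracks", "Lab"]


def generate_crew(seed=None):
    # A private RNG instance keeps generation isolated from global state;
    # seed=None falls back to non-deterministic behavior for normal play.
    rng = random.Random(seed)
    return {
        "engineering": rng.randint(1, 10),  # inclusive on both ends
        "medicine": rng.randint(1, 10),     # hypothetical second stat
    }


def draw_room_cards(seed=None, count=3):
    # sample() draws without replacement, so the cards are distinct.
    rng = random.Random(seed)
    return rng.sample(ROOM_POOL, count)


def engineering_in_range_for_all(seeds):
    # Property check: the stat stays in range whatever the seed is.
    return all(1 <= generate_crew(seed=s)["engineering"] <= 10 for s in seeds)
```

A deterministic test calls each function twice with the same seed and asserts equal results, while the property check loops over many seeds and asserts only the invariant, not specific values.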

5. Medium Risk: Performance of the CRT Shader

The PRD has specific performance targets (60/120 FPS), and the architecture document correctly identifies the full-screen CRT shader as a potential bottleneck.

Testing Strategy:

  • Automated Benchmarking: This cannot be covered by traditional unit tests. We should create a dedicated benchmark scene in Godot that represents an average late-game state. A script will run this scene for a fixed duration (e.g., 10,000 frames) and log the FPS. This test should be run automatically under different configurations (shader on/off, different quality settings) to ensure we do not have performance regressions as we add features.
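
Once the benchmark scene has logged its results, a regression check could run outside the engine. The Python sketch below assumes the script records per-frame times in milliseconds, which this document does not specify. The 1% low metric and the 5% tolerance are also illustrative choices, not project requirements.

```python
def summarize_frame_times(frame_times_ms):
    """Return average FPS and the FPS of the slowest 1% of frames."""
    if not frame_times_ms:
        raise ValueError("no frames recorded")
    if any(t <= 0 for t in frame_times_ms):
        raise ValueError("frame times must be positive")

    total_ms = sum(frame_times_ms)
    avg_fps = 1000.0 * len(frame_times_ms) / total_ms

    # Average the slowest 1% of frames (at least one) to expose stutter
    # that an overall average would hide.
    slowest_first = sorted(frame_times_ms, reverse=True)
    slow_count = max(1, len(slowest_first) // 100)
    slow_avg_ms = sum(slowest_first[:slow_count]) / slow_count

    return {"avg_fps": avg_fps, "low_1pct_fps": 1000.0 / slow_avg_ms}


def meets_target(summary, target_fps, tolerance=0.05):
    # Pass if average FPS is within the tolerance of the target.
    return summary["avg_fps"] >= target_fps * (1.0 - tolerance)
```

Running the same check for each configuration (shader on/off, each quality setting) against the PRD targets turns the logged numbers into a pass/fail signal that an automated run can act on.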