Stasis Warden - Testing Strategy
Initial evaluation performed by Quinn (QA Architect) based on prd.md v1.4 and architecture.md.
This document outlines the initial testing strategy for the Stasis Warden project, focusing on high-risk areas identified during the review of core product and architecture documents. The goal is to establish a robust testing architecture early to ensure we can build and iterate with confidence.
1. High Risk: Save/Load System Integrity
The data persistence logic in GameStateManager is the most critical system from a quality perspective. Data corruption in a save file can permanently halt a player's progress and ruin their experience.
Testing Strategy:
- Unit Tests: Each manager (ResourceManager, CrewManager, etc.) must have unit tests for its get_data() and load_data() methods. We need to verify that the data serialization and deserialization are perfectly symmetrical.
- Full-Cycle Integration Tests: We must create a dedicated test scene that orchestrates a full save/load cycle.
  - Programmatically set up a complex game state (e.g., multiple crew with specific stats, some assigned to tasks, specific resources, unlocked rooms).
  - Trigger GameStateManager.save_game().
  - Reset the entire game state.
  - Trigger GameStateManager.load_game().
  - Assert with precision that the restored state is identical to the state before saving.
- Corruption/Fuzz Testing: We need tests that attempt to load invalid, malformed, or empty savegame.dat files. The game must handle these errors gracefully (e.g., by showing a "corrupted save" message and returning to the main menu) rather than crashing.
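The round-trip and corruption checks above can be sketched as follows. This is an illustrative Python sketch, not the project's Godot code: the JSON file layout, the SaveCorruptedError type, the free functions save_game/load_game, and the single power field are all assumptions made for the example. Only the get_data()/load_data() method names and the savegame.dat filename come from this document.

```python
import json
import os
import tempfile


class SaveCorruptedError(Exception):
    """Hypothetical error type: the save file is unreadable or invalid."""


class ResourceManager:
    """Stand-in for the real manager; it holds a single assumed field."""

    def __init__(self):
        self.power = 0

    def get_data(self):
        return {"power": self.power}

    def load_data(self, data):
        power = data["power"]
        # Validate before mutating, so a bad file never leaves partial state.
        if isinstance(power, bool) or not isinstance(power, int) or power < 0:
            raise ValueError("power must be a non-negative integer")
        self.power = power


def save_game(path, manager):
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"resources": manager.get_data()}, f)


def load_game(path, manager):
    """Load a save, turning every parse or validation failure into one error."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            payload = json.load(f)
        manager.load_data(payload["resources"])
    except (OSError, ValueError, KeyError, TypeError) as exc:
        raise SaveCorruptedError(str(exc)) from exc


def round_trip_ok(power):
    """Save a state, load it into a fresh manager, and compare the two."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "savegame.dat")
        before = ResourceManager()
        before.power = power
        save_game(path, before)
        after = ResourceManager()  # plays the role of "reset the game state"
        load_game(path, after)
        return before.get_data() == after.get_data()


def load_rejects(raw_bytes):
    """Return True if loading these raw bytes fails with SaveCorruptedError."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "savegame.dat")
        with open(path, "wb") as f:
            f.write(raw_bytes)
        try:
            load_game(path, ResourceManager())
        except SaveCorruptedError:
            return True
        return False
```

Funnelling every failure into one error type gives the caller a single place to show the "corrupted save" message and return to the main menu. A real fuzz suite would feed load_rejects many more inputs than the handful a hand-written test covers.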
2. High Risk: State Machine Logic (GameStateManager)
The game's flow is controlled by a state machine (IN_GAME, ROOM_SELECTION, etc.). A bug in state transitions can easily lead to a soft-lock where the player is stuck and cannot provide input.
Testing Strategy:
- State Transition Tests: Each possible state transition must be explicitly tested. For example, a test should confirm that when the game enters the ROOM_SELECTION state, player inputs related to the IN_GAME state (like trying to assign a crew member) are ignored. We must verify not only that the state changes correctly, but that the game's behavior changes with it.
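A minimal sketch of such a transition test target, written in Python for illustration. The state names IN_GAME and ROOM_SELECTION come from this document. The transition table, the transition_to() and assign_crew() methods, and their boolean return values are hypothetical and only show the shape of the check.

```python
from enum import Enum, auto


class GameState(Enum):
    IN_GAME = auto()
    ROOM_SELECTION = auto()


class GameStateManager:
    # Assumed transition table; the real set of states and edges is larger.
    ALLOWED = {
        (GameState.IN_GAME, GameState.ROOM_SELECTION),
        (GameState.ROOM_SELECTION, GameState.IN_GAME),
    }

    def __init__(self):
        self.state = GameState.IN_GAME
        self.assignments = []

    def transition_to(self, new_state):
        """Change state only along an allowed edge; report whether it happened."""
        if (self.state, new_state) not in self.ALLOWED:
            return False
        self.state = new_state
        return True

    def assign_crew(self, crew_name, task):
        """IN_GAME-only input: ignored (returns False) in any other state."""
        if self.state is not GameState.IN_GAME:
            return False
        self.assignments.append((crew_name, task))
        return True
```

A test against this shape asserts two things per transition: that the state value changed, and that an input belonging to the previous state leaves the game data untouched. To guard against soft-locks, a test can also confirm that every state has at least one outgoing edge in the table.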
3. High Risk: Resource & Economic Balance (ResourceManager)
The core gameplay loop depends on the Power economy. Bugs in resource generation, spending, or the signal-driven UI updates could make the game unplayable or trivial.
Testing Strategy:
- Transactional Unit Tests: The ResourceManager's methods (add_resource, spend_resource) must be tested like database transactions. We need to validate edge cases like spending exactly all available power, attempting to spend more than available, and ensuring the power_updated signal fires with the correct payload every time.
- Signal Listener Integration Tests: We should have tests that simulate a UI action and include a test double that listens for the resulting signal from the ResourceManager. This verifies that our core signal-driven architecture is working as intended end to end.
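The transactional edge cases and the listening test double could look like the sketch below, written in Python for illustration. The method names add_resource and spend_resource and the power_updated signal come from this document. The callback-list signal mechanism, the connect_power_updated() helper, the payload (assumed to be the new power total), and the choice not to emit on a rejected spend are all assumptions.

```python
class ResourceManager:
    """Stand-in manager with a callback list in place of an engine signal."""

    def __init__(self, power=0):
        self.power = power
        self._power_updated_listeners = []

    def connect_power_updated(self, callback):
        # Hypothetical helper; a real engine signal would be connected instead.
        self._power_updated_listeners.append(callback)

    def _emit_power_updated(self):
        # Assumed payload: the new power total.
        for callback in self._power_updated_listeners:
            callback(self.power)

    def add_resource(self, amount):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.power += amount
        self._emit_power_updated()

    def spend_resource(self, amount):
        """All-or-nothing spend: on failure nothing changes and nothing fires."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if amount > self.power:
            return False
        self.power -= amount
        self._emit_power_updated()
        return True
```

In a test, a plain list serves as the test double: connect its append method as the listener, perform the action, then assert on the recorded payloads. For example, spending exactly 10 from a balance of 10 should record [0], and a following overspend should leave both the balance and the recorded list unchanged.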
4. Medium Risk: Procedural Generation (Crew & Rooms)
The random generation of crew stats/traits and room cards makes testing difficult. We cannot rely on chance for quality assurance.
Testing Strategy:
- Isolate and Seed the RNG: The logic for generating crew and selecting room cards must be refactored to accept an optional seed for the Random Number Generator.
- Deterministic Unit Tests: By providing a known seed, our tests can assert that the "random" outcomes are perfectly predictable. For a given seed, we should always get the exact same crew stats and the exact same set of three room cards. This makes testing repeatable and reliable.
- Property-Based Testing: For a more advanced approach, we can verify properties of the output. For example, a test can assert that a generated crew member's "Engineering" stat is always within the valid range (e.g., 1-10), regardless of the seed.
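The seeding approach above can be sketched as follows, in Python for illustration. The "Engineering" stat and its 1-10 range are the examples given in this document. The function names, the second stat, and the room deck contents are hypothetical. The range check here is a plain loop over many seeds, a simple stand-in for a property-based testing library rather than an instance of one.

```python
import random

STAT_MIN, STAT_MAX = 1, 10

# Hypothetical room deck, for illustration only.
ROOM_DECK = ["Reactor", "Hydroponics", "Med Bay", "Workshop", "Cryo Bay"]


def generate_crew_stats(seed=None):
    """Generate crew stats; the same seed always yields the same stats."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    return {
        "engineering": rng.randint(STAT_MIN, STAT_MAX),
        "medicine": rng.randint(STAT_MIN, STAT_MAX),  # hypothetical second stat
    }


def draw_room_cards(seed=None, count=3):
    """Draw distinct room cards; the same seed always yields the same draw."""
    rng = random.Random(seed)
    return rng.sample(ROOM_DECK, count)


def stats_in_range_for_seeds(seeds):
    """Range property: every stat stays in bounds for every seed tried."""
    return all(
        STAT_MIN <= value <= STAT_MAX
        for seed in seeds
        for value in generate_crew_stats(seed).values()
    )
```

Passing seed=None keeps normal gameplay unpredictable, while tests pass a fixed integer. Giving each generator its own RNG instance instead of sharing global state means one system's draws cannot shift another's results between test runs.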
5. Medium Risk: Performance of the CRT Shader
The PRD has specific performance targets (60/120 FPS), and the architecture document correctly identifies the full-screen CRT shader as a potential bottleneck.
Testing Strategy:
- Automated Benchmarking: This cannot be covered by traditional unit tests. We should create a dedicated benchmark scene in Godot that represents an average late-game state. A script will run this scene for a fixed duration (e.g., 10,000 frames) and log the FPS. This test should be run automatically under different configurations (shader on/off, different quality settings) to ensure we do not have performance regressions as we add features.
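Once the benchmark scene has logged its per-frame times, the pass/fail step could be as small as the sketch below, written in Python for illustration. The function names, the millisecond input format, and the "1% low" metric are assumptions added for the example. This document only specifies logging FPS against the 60/120 FPS targets.

```python
def summarize_frame_times(frame_times_ms):
    """Reduce logged per-frame times (milliseconds) to two FPS figures."""
    if not frame_times_ms:
        raise ValueError("no frames were logged")
    frame_count = len(frame_times_ms)
    average_fps = 1000.0 * frame_count / sum(frame_times_ms)
    # "1% low": average FPS over the slowest 1% of frames (at least one frame).
    worst_count = max(1, frame_count // 100)
    worst_frames = sorted(frame_times_ms, reverse=True)[:worst_count]
    one_percent_low_fps = 1000.0 * worst_count / sum(worst_frames)
    return {
        "average_fps": average_fps,
        "one_percent_low_fps": one_percent_low_fps,
    }


def meets_target(summary, target_fps):
    """Pass only if the average frame rate reaches the target."""
    return summary["average_fps"] >= target_fps
```

Averaging over total elapsed time rather than averaging per-frame FPS values keeps a few very fast frames from inflating the result. Tracking a worst-case figure next to the average helps surface stutter that an average alone would hide.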