AlarmAnalysis/ProjectPlan.md
andy f08a1a9bf5 Initial commit: alarm analysis project
Python project for analyzing alarm data from building monitoring systems.
Includes alarm analyzer, plotting, tests, and source data files.
2026-02-26 09:03:54 -05:00

Alarm Data Analysis Project Plan

Overview

This project will develop a Python script that analyzes alarm data from CSV files and cross-references it with sensor descriptions from an XLSX file. The script will report alarm statistics for each monitoring point and for the system as a whole.

Data Structure Analysis

The alarm CSV file contains the following columns:

  • Alarm_Id: Unique identifier for each alarm event
  • Sensor_Id: Identifies the monitoring point
  • Date: Timestamp when the alarm/warning/error occurred
  • Description: Details about the alarm event (e.g., "Hi Alarm: 51.3>=46.0F", "Normal 42.5F", "Error: Comm Loss Error")
  • LogTime: Timestamp when the event was logged

Implementation Plan

Phase 1: Data Loading and Preprocessing

  1. Load the CSV alarm data using pandas
  2. Load the sensor report XLSX file to get sensor descriptions
  3. Parse alarm descriptions to categorize events (Normal, Alarm, Warning, Error)
  4. Extract numeric values and thresholds from alarm descriptions
  5. Identify alarm start and end events to calculate durations

Phase 2: Data Processing and Pairing

  1. Pair start events (Alarm/Warning/Error) with corresponding end events (Normal)
  2. Calculate duration for each alarm event
  3. Handle edge cases (unpaired events, overlapping events)
  4. Create a structured dataset of complete alarm events
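
The pairing step can be sketched as a single pass that tracks one open episode per sensor. The policy for overlapping events (keep the first start and ignore later starts until a Normal arrives) is an assumption chosen for simplicity, not something this plan prescribes. One consequence is that a warning that escalates to an alarm is recorded under its original type.

```python
from datetime import datetime


def pair_events(events):
    """Pair each sensor's open Alarm/Warning/Error with the next Normal.

    `events` is an iterable of (sensor_id, timestamp, kind) tuples sorted by
    timestamp. Returns (completed, unpaired).
    """
    open_events = {}  # sensor_id -> (kind, start) of the episode in progress
    completed = []
    for sensor_id, timestamp, kind in events:
        current = open_events.get(sensor_id)
        if kind == "Normal":
            if current is not None:
                start_kind, start = open_events.pop(sensor_id)
                completed.append({
                    "sensor_id": sensor_id,
                    "kind": start_kind,
                    "start": start,
                    "end": timestamp,
                    "duration": timestamp - start,
                })
            # A Normal with no open episode has nothing to close; skip it.
        elif current is None:
            open_events[sensor_id] = (kind, timestamp)
        # Otherwise an episode is already open: keep its original start
        # (assumed policy for overlapping events).
    unpaired = [
        {"sensor_id": sensor_id, "kind": kind, "start": start}
        for sensor_id, (kind, start) in open_events.items()
    ]
    return completed, unpaired


events = [
    ("101", datetime(2026, 1, 5, 8, 0), "Alarm"),
    ("102", datetime(2026, 1, 5, 8, 10), "Error"),
    ("101", datetime(2026, 1, 5, 8, 20), "Alarm"),   # overlap, ignored
    ("101", datetime(2026, 1, 5, 8, 45), "Normal"),  # closes the 08:00 alarm
    ("103", datetime(2026, 1, 5, 9, 0), "Normal"),   # nothing to close
]
completed, unpaired = pair_events(events)
```

Here `completed` holds one 45-minute alarm for sensor 101, and `unpaired` holds the still-open error on sensor 102.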

Phase 3: Basic Analysis

  1. Count alarm events by type (Alarm, Warning, Error) for each sensor
  2. Calculate min/max/average duration for each alarm type per sensor
  3. Generate summary statistics across all sensors
  4. Identify most problematic sensors (highest number of events, longest durations)
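
As an illustration of steps 1, 2, and 4, the sketch below aggregates completed events per sensor and alarm type. The input shape (dicts with `sensor_id`, `kind`, and `duration`) is an assumption for this example, and the same grouping would map onto a pandas groupby.

```python
from collections import defaultdict
from datetime import timedelta
from statistics import mean


def summarize(completed):
    """Count events and compute min/max/average duration in seconds."""
    buckets = defaultdict(list)
    for event in completed:
        key = (event["sensor_id"], event["kind"])
        buckets[key].append(event["duration"].total_seconds())
    return {
        key: {
            "count": len(seconds),
            "min_s": min(seconds),
            "max_s": max(seconds),
            "avg_s": mean(seconds),
        }
        for key, seconds in buckets.items()
    }


def top_sensors(summary, n=3):
    """Rank sensors by total event count, highest first."""
    totals = defaultdict(int)
    for (sensor_id, _kind), stats in summary.items():
        totals[sensor_id] += stats["count"]
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]


# Invented sample data.
completed = [
    {"sensor_id": "101", "kind": "Alarm", "duration": timedelta(minutes=10)},
    {"sensor_id": "101", "kind": "Alarm", "duration": timedelta(minutes=30)},
    {"sensor_id": "101", "kind": "Error", "duration": timedelta(minutes=5)},
    {"sensor_id": "102", "kind": "Warning", "duration": timedelta(minutes=20)},
]
summary = summarize(completed)
ranking = top_sensors(summary, n=1)
```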

Phase 4: Advanced Analysis

  1. Time-based analysis:
    • Frequency of events by hour of day, day of week
    • Trend analysis over time periods
    • Seasonal patterns if data spans multiple months
  2. Alarm correlation analysis:
    • Identify sensors that frequently alarm together
    • Determine if specific alarm types lead to others
  3. Severity analysis:
    • Weighted scoring based on alarm type and duration
    • Ranking sensors by overall impact
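
Two of these analyses are sketched below: event frequency by hour of day, and a weighted severity score per sensor. The weights are hypothetical because this plan does not define a scoring scheme. The score here is the weight multiplied by the duration in hours, summed per sensor.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical weights; tune them to the site's priorities.
SEVERITY_WEIGHTS = {"Error": 3.0, "Alarm": 2.0, "Warning": 1.0}


def events_by_hour(completed):
    """Count events by the hour of day in which they started."""
    return Counter(event["start"].hour for event in completed)


def severity_scores(completed):
    """Sum weight * duration (in hours) for each sensor."""
    scores = defaultdict(float)
    for event in completed:
        hours = event["duration"].total_seconds() / 3600
        weight = SEVERITY_WEIGHTS.get(event["kind"], 0.0)
        scores[event["sensor_id"]] += weight * hours
    return dict(scores)


# Invented sample data.
completed = [
    {"sensor_id": "101", "kind": "Alarm",
     "start": datetime(2026, 1, 5, 8, 15), "duration": timedelta(minutes=30)},
    {"sensor_id": "101", "kind": "Warning",
     "start": datetime(2026, 1, 5, 8, 50), "duration": timedelta(hours=1)},
    {"sensor_id": "102", "kind": "Error",
     "start": datetime(2026, 1, 5, 14, 0), "duration": timedelta(hours=2)},
]
hourly = events_by_hour(completed)
scores = severity_scores(completed)
```

In this sample, sensor 102 outranks sensor 101 despite having fewer events, because its single error is both longer and weighted more heavily.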

Phase 5: Additional Valuable Metrics

  1. MTBF (Mean Time Between Failures): Average time between consecutive alarm events for each sensor
  2. Alarm Churn: Rate of alarm state changes for each sensor
  3. Recovery Time: Time taken to return to normal state after an alarm
  4. Alarm Escalation: Percentage of warnings that escalate to alarms
  5. Persistence Analysis: How long alarms typically last before being resolved
  6. Peak Time Identification: Time periods with highest alarm frequency
  7. False Alarm Rate: Estimated share of alarms that return to normal within a short, configurable time window
  8. Critical Sensor Identification: Sensors with highest frequency of high-severity events
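
Three of these metrics are sketched below under simplifying assumptions. The time between events is measured from start to start, an escalation is counted only when an Alarm immediately follows a Warning, and the 5-minute window for quick returns is an arbitrary default.

```python
from datetime import datetime, timedelta


def mean_time_between_events(starts):
    """Mean gap between consecutive event start times, or None if fewer than 2."""
    ordered = sorted(starts)
    if len(ordered) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def escalation_rate(kinds):
    """Fraction of Warnings directly followed by an Alarm for one sensor."""
    kinds = list(kinds)
    warnings = 0
    escalated = 0
    for current, following in zip(kinds, kinds[1:] + [None]):
        if current == "Warning":
            warnings += 1
            if following == "Alarm":
                escalated += 1
    return escalated / warnings if warnings else 0.0


def quick_return_rate(durations, window=timedelta(minutes=5)):
    """Fraction of events that returned to normal within `window`."""
    if not durations:
        return 0.0
    return sum(1 for d in durations if d <= window) / len(durations)


# Invented sample data.
starts = [datetime(2026, 1, 5, 8), datetime(2026, 1, 5, 10), datetime(2026, 1, 5, 14)]
mtbe = mean_time_between_events(starts)
rate = escalation_rate(["Warning", "Alarm", "Normal", "Warning", "Normal"])
quick = quick_return_rate(
    [timedelta(minutes=2), timedelta(minutes=10), timedelta(minutes=5), timedelta(hours=1)]
)
```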

Phase 6: Visualization and Reporting

  1. Generate summary reports in console and optionally save to file
  2. Create visualizations (matplotlib/seaborn):
    • Bar charts for alarm counts by sensor and type
    • Box plots for duration analysis
    • Time series plots for alarm frequency over time
    • Heatmaps for alarm correlation
  3. Export detailed analysis results to CSV files

Phase 7: Output and Export

  1. Create summary tables showing:
    • Sensor-wise breakdown of alarm types and durations
    • Top N problematic sensors
    • Time-based trends
  2. Export processed data for further analysis
  3. Generate a comprehensive report file

Technical Implementation Details

Libraries to Use:

  • pandas: For data manipulation and analysis
  • numpy: For numerical operations
  • matplotlib/seaborn: For visualizations
  • openpyxl: For reading XLSX files
  • re: For parsing alarm descriptions
  • datetime: For time-based analysis

Data Processing Steps:

  1. Parse alarm descriptions using regular expressions to identify:
    • Alarm type (Hi/Lo Alarm/Warning, Error, Normal)
    • Measured value
    • Threshold value
    • Unit of measurement
  2. Create a mapping between Sensor_Id and sensor descriptions from XLSX
  3. For each sensor, pair alarm start events with corresponding normal end events
  4. Calculate duration between paired events
  5. Aggregate statistics by sensor and alarm type
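
Step 1 might be implemented along the following lines. The patterns are inferred only from the three example descriptions given under Data Structure Analysis, so descriptions in the real data may need additional cases. Field names such as `level` and `threshold` are illustrative.

```python
import re

NUMBER = r"-?\d+(?:\.\d+)?"

# Matches descriptions shaped like "Hi Alarm: 51.3>=46.0F".
THRESHOLD_EVENT = re.compile(
    rf"^(?P<level>Hi|Lo)\s+(?P<kind>Alarm|Warning):\s*"
    rf"(?P<value>{NUMBER})\s*(?:>=|<=|>|<)\s*"
    rf"(?P<threshold>{NUMBER})\s*(?P<unit>\S*)$"
)
# Matches descriptions shaped like "Normal 42.5F".
NORMAL_EVENT = re.compile(rf"^Normal\s+(?P<value>{NUMBER})\s*(?P<unit>\S*)$")


def parse_description(text):
    """Return a dict of parsed fields, or None if the text is not recognized."""
    text = text.strip()
    match = THRESHOLD_EVENT.match(text)
    if match:
        return {
            "kind": match["kind"],
            "level": match["level"],
            "value": float(match["value"]),
            "threshold": float(match["threshold"]),
            "unit": match["unit"],
        }
    match = NORMAL_EVENT.match(text)
    if match:
        return {"kind": "Normal", "level": None, "value": float(match["value"]),
                "threshold": None, "unit": match["unit"]}
    if "Error" in text:
        # Error descriptions carry no measurement in the known example.
        return {"kind": "Error", "level": None, "value": None,
                "threshold": None, "unit": None}
    return None


parsed = parse_description("Hi Alarm: 51.3>=46.0F")
```

Returning None for unrecognized text lets the caller count and review descriptions the patterns do not cover instead of silently misclassifying them.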

Alarm Type Classification:

  • Error: Events containing "Error" in description
  • Alarm: Events containing "Alarm" but not "Warning"
  • Warning: Events containing "Warning"
  • Normal: Events indicating a return to the normal state (e.g., "Normal 42.5F")
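
These rules can be expressed as an ordered series of substring checks. Checking "Error" first and recognizing Normal by its prefix are assumptions about precedence that this plan does not spell out. The "Lo Warning" description in the usage lines is a hypothetical format modeled on the "Hi Alarm" example.

```python
def classify_event(description):
    """Return 'Error', 'Warning', 'Alarm', 'Normal', or 'Unknown'."""
    text = description.strip()
    if "Error" in text:
        return "Error"
    if "Warning" in text:
        return "Warning"
    # Reached only when "Warning" is absent, which implements
    # 'contains "Alarm" but not "Warning"'.
    if "Alarm" in text:
        return "Alarm"
    if text.startswith("Normal"):
        return "Normal"
    return "Unknown"


labels = [
    classify_event("Hi Alarm: 51.3>=46.0F"),
    classify_event("Lo Warning: 33.0<=34.0F"),  # hypothetical format
    classify_event("Error: Comm Loss Error"),
    classify_event("Normal 42.5F"),
]
```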

Key Metrics to Calculate:

For each sensor:

  • Count of each alarm type
  • Min/Max/Average duration for each alarm type
  • Percentage of the analysis period spent in alarm
  • Alarm frequency (events per unit of time)
  • Average time to recovery
  • Percentage of warnings that escalate to alarms

Expected Deliverables

  1. Main analysis script (alarm_analyzer.py)
  2. Configuration file for customization
  3. Sample output files demonstrating analysis results
  4. Documentation on how to run the script and interpret results
  5. ProjectPlan.md (this document)

Enhanced Features Implemented

Enhanced Group-Based Analysis

  1. Total Sensors Per Group: Added the total number of sensors in each group according to the sensor report
  2. Alarm Coverage Percentage: Added percentage of monitoring points that experienced alarms
  3. Alarm Time Percentage: Added percentage of time the group's sensors spent in alarm condition

Enhanced Output Files

  1. All sensor-specific output files now include sensor names and group information:
    • sensor_statistics.csv
    • top_sensors_by_alarm_count.csv
    • top_sensors_by_avg_duration.csv
    • top_sensors_by_max_duration.csv
    • top_sensors_by_severity_score.csv

Enhanced Plotting Functionality

  1. All sensor-specific plots now display sensor names instead of just IDs
  2. Added comprehensive group-based visualizations:
    • Group composition analysis
    • Alarm type distribution by group
    • Group alarm intensity metrics


Uptime/Downtime Metrics

  1. Error-based downtime: Calculates the total duration of all "Error" events across all sensors as a percentage of the total time period
  2. Alarm/Warning-based downtime: Calculates the total duration of all "Alarm" and "Warning" events across all sensors as a percentage of the total time period
  3. System-level uptime metrics: Time-based calculation of the percentage of the total time period during which at least one sensor was in an error or alarm/warning state; uptime is the remaining percentage
  4. Per-sensor and per-group metrics: Individual sensor and group uptime/downtime percentages
  5. New output files:
    • system_uptime_summary.csv - Overall system uptime metrics
    • sensor_error_uptime_metrics.csv - Per-sensor error-based uptime metrics
    • sensor_alarm_warning_uptime_metrics.csv - Per-sensor alarm/warning-based uptime metrics
    • group_error_uptime_metrics.csv - Per-group error-based uptime metrics
    • group_alarm_warning_uptime_metrics.csv - Per-group alarm/warning-based uptime metrics
  6. Comprehensive group inclusion: Every group-level output file now lists all groups, including those with no errors or warnings, so that groups with 100% uptime can be identified
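
For the system-level metric in item 3, overlapping intervals from different sensors need to be counted once. A minimal sketch, assuming each event is a (start, end) pair of datetimes, merges the intervals before summing. Function names are illustrative and are not taken from alarm_analyzer.py.

```python
from datetime import datetime


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def downtime_percent(intervals, period_start, period_end):
    """Percentage of the period covered by at least one interval."""
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")
    down = 0.0
    for start, end in merge_intervals(intervals):
        # Clip each merged interval to the analysis period.
        start = max(start, period_start)
        end = min(end, period_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * down / total


# Invented sample: two overlapping events and one separate event.
day = datetime(2026, 1, 5)
intervals = [
    (day.replace(hour=1), day.replace(hour=3)),
    (day.replace(hour=2), day.replace(hour=4)),
    (day.replace(hour=6), day.replace(hour=7)),
]
percent_down = downtime_percent(intervals, day, day.replace(hour=10))
percent_up = 100.0 - percent_down
```

Summing raw durations in this sample would give 5 hours of downtime; merging first gives 4 hours, because the hour from 02:00 to 03:00 is shared by two events.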

Optional Group Exclusion Feature

  1. Create an optional configuration file (e.g., exclusion_config.json or groups_to_skip.txt) that allows users to specify groups to exclude from analysis
  2. Implementation approach:
    • Add a new parameter to the AlarmAnalyzer class to accept an exclusion file path
    • Parse the exclusion file to get a list of groups to skip
    • Filter out sensor data belonging to excluded groups before analysis
    • Add logging to indicate which groups were excluded
  3. Configuration file format options:
    • JSON format: {"excluded_groups": ["GroupName1", "GroupName2"]}
    • Simple text format: one group name per line
    • CSV format: for more complex exclusion rules
  4. Benefits:
    • Allows users to exclude groups with known issues or maintenance periods
    • Provides cleaner analysis results when certain groups have anomalous data
    • Maintains flexibility without permanently modifying the source data
  5. Implementation details:
    • Add preprocessing step to filter out excluded groups before any analysis
    • Update all analysis functions to work with the filtered dataset
    • Maintain separate statistics for excluded groups if needed for reference

Future Enhancement Plans