# Alarm Data Analysis Project Plan

## Overview
This project will develop a Python script to analyze alarm data from CSV files, cross-referencing with sensor descriptions from an XLSX file. The script will provide comprehensive statistics and insights about alarm events across monitoring points.

## Data Structure Analysis
Based on the CSV file structure:
- **Alarm_Id**: Unique identifier for each alarm event
- **Sensor_Id**: Identifies the monitoring point
- **Date**: Timestamp when the alarm/warning/error occurred
- **Description**: Details about the alarm event (e.g., "Hi Alarm: 51.3>=46.0F", "Normal 42.5F", "Error: Comm Loss Error")
- **LogTime**: Timestamp when the event was logged

## Implementation Plan

### Phase 1: Data Loading and Preprocessing
1. Load the CSV alarm data using pandas
2. Load the sensor report XLSX file to get sensor descriptions
3. Parse alarm descriptions to categorize events (Normal, Alarm, Warning, Error)
4. Extract numeric values and thresholds from alarm descriptions
5. Identify alarm start and end events to calculate durations

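The loading step could be sketched as follows. This version uses only the standard library `csv` module so that it runs without dependencies; with pandas, `pandas.read_csv` would play the same role. The timestamp format, the helper name `load_alarm_rows`, and the sample rows are assumptions made for illustration, not details taken from the real export.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout; adjust to match the real CSV export.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_alarm_rows(handle):
    """Read alarm rows from an open CSV file and parse both timestamp columns."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "Alarm_Id": raw["Alarm_Id"],
            "Sensor_Id": raw["Sensor_Id"],
            "Date": datetime.strptime(raw["Date"], TIMESTAMP_FORMAT),
            "Description": raw["Description"].strip(),
            "LogTime": datetime.strptime(raw["LogTime"], TIMESTAMP_FORMAT),
        })
    # Sort by sensor, then time, so later steps can walk events in order.
    rows.sort(key=lambda r: (r["Sensor_Id"], r["Date"]))
    return rows


# Made-up sample standing in for the real alarm file.
SAMPLE_CSV = """Alarm_Id,Sensor_Id,Date,Description,LogTime
2,S1,2024-01-01 10:30:00,Normal 42.5F,2024-01-01 10:30:05
1,S1,2024-01-01 10:00:00,Hi Alarm: 51.3>=46.0F,2024-01-01 10:00:04
3,S2,2024-01-01 09:00:00,Error: Comm Loss Error,2024-01-01 09:00:02
"""

rows = load_alarm_rows(io.StringIO(SAMPLE_CSV))
```

Sorting once at load time keeps the later pairing logic simple, since each sensor's events can then be processed sequentially.
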
### Phase 2: Data Processing and Pairing
1. Pair start events (Alarm/Warning/Error) with corresponding end events (Normal)
2. Calculate duration for each alarm event
3. Handle edge cases (unpaired events, overlapping events)
4. Create a structured dataset of complete alarm events

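One possible pairing strategy is sketched below. The edge-case policy is an assumption, since the plan does not specify one: a new start event that arrives while another is still open for the same sensor keeps the earlier start, a `Normal` with no open event is ignored, and start events that never return to `Normal` are reported separately. The function name `pair_events` and the sample events are hypothetical.

```python
from datetime import datetime, timedelta


def pair_events(events):
    """Pair each Alarm/Warning/Error event with the next Normal event per sensor.

    `events` is an iterable of (sensor_id, timestamp, category) tuples.
    Returns (paired, unpaired): `paired` holds dicts with a duration,
    `unpaired` holds start events that never returned to Normal.
    """
    open_events = {}  # sensor_id -> (start_time, category)
    paired = []
    for sensor_id, when, category in sorted(events, key=lambda e: (e[0], e[1])):
        if category == "Normal":
            start = open_events.pop(sensor_id, None)
            if start is not None:
                paired.append({
                    "sensor_id": sensor_id,
                    "category": start[1],
                    "start": start[0],
                    "end": when,
                    "duration": when - start[0],
                })
            # A Normal with no open event is ignored (assumed policy).
        elif sensor_id not in open_events:
            open_events[sensor_id] = (when, category)
        # Overlap: a new start while one is open keeps the earlier start.
    unpaired = [(sid, start, cat) for sid, (start, cat) in open_events.items()]
    return paired, unpaired


def _at(hour, minute):
    return datetime(2024, 1, 1, hour, minute)


# Made-up events for illustration.
events = [
    ("S1", _at(10, 0), "Alarm"),
    ("S1", _at(10, 30), "Normal"),
    ("S2", _at(9, 0), "Error"),
    ("S1", _at(11, 0), "Warning"),
    ("S1", _at(11, 5), "Alarm"),
    ("S1", _at(11, 20), "Normal"),
]
paired, unpaired = pair_events(events)
```
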
### Phase 3: Basic Analysis
1. Count alarm events by type (Alarm, Warning, Error) for each sensor
2. Calculate min/max/average duration for each alarm type per sensor
3. Generate summary statistics across all sensors
4. Identify most problematic sensors (highest number of events, longest durations)

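As a rough illustration of the basic analysis, the sketch below aggregates counts and min/max/average durations per sensor and alarm type, then ranks sensors by event count. It assumes the paired events have already been reduced to `(sensor_id, category, duration_minutes)` tuples; the function names and sample data are hypothetical. With pandas, a `groupby` over the same columns would express the same aggregation.

```python
from collections import defaultdict
from statistics import mean


def summarize_durations(paired_events):
    """Return {(sensor_id, category): {count, min, max, avg}} with durations in minutes."""
    buckets = defaultdict(list)
    for sensor_id, category, minutes in paired_events:
        buckets[(sensor_id, category)].append(minutes)
    return {
        key: {"count": len(v), "min": min(v), "max": max(v), "avg": mean(v)}
        for key, v in buckets.items()
    }


def top_sensors_by_count(summary, n=3):
    """Rank sensors by their total number of events across all alarm types."""
    totals = defaultdict(int)
    for (sensor_id, _category), stats in summary.items():
        totals[sensor_id] += stats["count"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up paired events: (sensor_id, category, duration in minutes).
sample = [
    ("S1", "Alarm", 30),
    ("S1", "Alarm", 10),
    ("S1", "Warning", 20),
    ("S2", "Error", 5),
]
summary = summarize_durations(sample)
ranking = top_sensors_by_count(summary)
```
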
### Phase 4: Advanced Analysis
1. Time-based analysis:
   - Frequency of events by hour of day, day of week
   - Trend analysis over time periods
   - Seasonal patterns if data spans multiple months
2. Alarm correlation analysis:
   - Identify sensors that frequently alarm together
   - Determine if specific alarm types lead to others
3. Severity analysis:
   - Weighted scoring based on alarm type and duration
   - Ranking sensors by overall impact

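Two of the analyses above can be sketched briefly: event frequency by hour of day, and a weighted severity score per sensor. The weights below are placeholders chosen only for illustration (the plan does not fix any values), and the scoring formula, weight multiplied by duration in minutes, is likewise an assumption.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical weights; the plan does not specify the actual values.
SEVERITY_WEIGHTS = {"Error": 3.0, "Alarm": 2.0, "Warning": 1.0}


def events_by_hour(timestamps):
    """Count events per hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def severity_scores(paired_events, weights=SEVERITY_WEIGHTS):
    """Rank sensors by the sum of weight * duration over their events.

    `paired_events` holds (sensor_id, category, duration_minutes) tuples.
    """
    scores = defaultdict(float)
    for sensor_id, category, minutes in paired_events:
        scores[sensor_id] += weights.get(category, 0.0) * minutes
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up data for illustration.
hourly = events_by_hour([
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 30),
    datetime(2024, 1, 2, 14, 0),
])
ranked = severity_scores([
    ("S1", "Alarm", 30),
    ("S1", "Warning", 20),
    ("S2", "Error", 40),
    ("S3", "Warning", 15),
])
```
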
### Phase 5: Additional Valuable Metrics
1. **MTBF (Mean Time Between Failures)**: Average time between consecutive alarm events for each sensor
2. **Alarm Churn**: Rate of alarm state changes for each sensor
3. **Recovery Time**: Time taken to return to normal state after an alarm
4. **Alarm Escalation**: Percentage of warnings that escalate to alarms
5. **Persistence Analysis**: How long alarms typically last before being resolved
6. **Peak Time Identification**: Time periods with highest alarm frequency
7. **False Alarm Rate**: Estimate of alarms that return to normal quickly
8. **Critical Sensor Identification**: Sensors with highest frequency of high-severity events

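Two of these metrics lend themselves to a short sketch: MTBF and alarm escalation. Both functions operate on a single sensor's data. The escalation rule used here, a `Warning` counts as escalated if an `Alarm` follows before the next `Normal`, is one plausible reading of the metric rather than a definition taken from the plan, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_between_events(start_times):
    """Average gap between consecutive alarm starts; None with fewer than two events."""
    ordered = sorted(start_times)
    if len(ordered) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def escalation_percentage(categories):
    """Percentage of Warning events followed by an Alarm before the next Normal.

    `categories` is one sensor's event categories in time order.
    """
    warnings = escalated = 0
    pending_warning = False
    for category in categories:
        if category == "Warning":
            warnings += 1
            pending_warning = True
        elif category == "Alarm" and pending_warning:
            escalated += 1
            pending_warning = False
        elif category == "Normal":
            pending_warning = False
    return 100.0 * escalated / warnings if warnings else 0.0


# Made-up data for illustration.
mtbf = mean_time_between_events([
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 12, 0),
    datetime(2024, 1, 1, 16, 0),
])
escalation = escalation_percentage(["Warning", "Alarm", "Normal", "Warning", "Normal"])
```
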
### Phase 6: Visualization and Reporting
1. Generate summary reports in console and optionally save to file
2. Create visualizations (matplotlib/seaborn):
   - Bar charts for alarm counts by sensor and type
   - Box plots for duration analysis
   - Time series plots for alarm frequency over time
   - Heatmaps for alarm correlation
3. Export detailed analysis results to CSV files

### Phase 7: Output and Export
1. Create summary tables showing:
   - Sensor-wise breakdown of alarm types and durations
   - Top N problematic sensors
   - Time-based trends
2. Export processed data for further analysis
3. Generate a comprehensive report file

## Technical Implementation Details

### Libraries to Use:
- pandas: For data manipulation and analysis
- numpy: For numerical operations
- matplotlib/seaborn: For visualizations
- openpyxl: For reading XLSX files
- re: For parsing alarm descriptions
- datetime: For time-based analysis

### Data Processing Steps:
1. Parse alarm descriptions using regular expressions to identify:
   - Alarm type (Hi/Lo Alarm/Warning, Error, Normal)
   - Measured value
   - Threshold value
   - Unit of measurement
2. Create a mapping between Sensor_Id and sensor descriptions from XLSX
3. For each sensor, pair alarm start events with corresponding normal end events
4. Calculate duration between paired events
5. Aggregate statistics by sensor and alarm type

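The description parsing in step 1 could look roughly like the sketch below. The patterns are inferred from the three example descriptions given earlier ("Hi Alarm: 51.3>=46.0F", "Normal 42.5F", "Error: Comm Loss Error"); the `Lo` variants and the `<=` operator are assumed to mirror the `Hi` form, and real data may contain formats these patterns do not cover.

```python
import re

NUMBER = r"-?\d+(?:\.\d+)?"

# Patterns inferred from the sample descriptions; Lo/<= forms are assumed.
THRESHOLD_RE = re.compile(
    rf"^(?P<level>Hi|Lo)\s+(?P<kind>Alarm|Warning):\s*"
    rf"(?P<value>{NUMBER})\s*(?P<op>>=|<=|>|<)\s*"
    rf"(?P<threshold>{NUMBER})\s*(?P<unit>\S*)$"
)
NORMAL_RE = re.compile(rf"^Normal\s+(?P<value>{NUMBER})\s*(?P<unit>\S*)$")
ERROR_RE = re.compile(r"^Error:\s*(?P<detail>.+)$")


def _result(type_, value=None, threshold=None, unit=None, detail=None):
    return {"type": type_, "value": value, "threshold": threshold,
            "unit": unit, "detail": detail}


def parse_description(text):
    """Split an alarm description into type, measured value, threshold, and unit."""
    text = text.strip()
    m = THRESHOLD_RE.match(text)
    if m:
        return _result(f"{m['level']} {m['kind']}", float(m["value"]),
                       float(m["threshold"]), m["unit"] or None)
    m = NORMAL_RE.match(text)
    if m:
        return _result("Normal", float(m["value"]), unit=m["unit"] or None)
    m = ERROR_RE.match(text)
    if m:
        return _result("Error", detail=m["detail"])
    # Anything unrecognized is kept visible rather than silently dropped.
    return _result("Unknown", detail=text)


hi_alarm = parse_description("Hi Alarm: 51.3>=46.0F")
normal = parse_description("Normal 42.5F")
error = parse_description("Error: Comm Loss Error")
```

Returning an explicit `Unknown` type makes it easy to count how many descriptions fall outside the expected formats before trusting the aggregated statistics.
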
### Alarm Type Classification:
- **Error**: Events containing "Error" in description
- **Alarm**: Events containing "Alarm" but not "Warning"
- **Warning**: Events containing "Warning"
- **Normal**: Events indicating return to normal state

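These rules translate into a small classifier. In the sketch below, checking "Warning" before "Alarm" is what implements the "Alarm but not Warning" rule. Giving "Error" the highest precedence, matching case-insensitively, and treating a leading "Normal" as the return-to-normal marker are assumptions based on the sample descriptions.

```python
def classify_event(description):
    """Map a raw alarm description to Error, Warning, Alarm, Normal, or Unknown."""
    text = description.strip().lower()
    if "error" in text:
        return "Error"  # assumed to take precedence over other keywords
    if "warning" in text:
        return "Warning"
    if "alarm" in text:
        return "Alarm"  # reached only when "warning" is absent
    if text.startswith("normal"):
        return "Normal"
    return "Unknown"


labels = [
    classify_event("Hi Alarm: 51.3>=46.0F"),
    classify_event("Normal 42.5F"),
    classify_event("Error: Comm Loss Error"),
]
```
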
### Key Metrics to Calculate:
For each sensor:
- Count of each alarm type
- Min/Max/Average duration for each alarm type
- Total alarm time percentage
- Alarm frequency rate
- Average time to recovery
- Percentage of events that escalate

## Expected Deliverables
1. Main analysis script (alarm_analyzer.py)
2. Configuration file for customization
3. Sample output files demonstrating analysis results
4. Documentation on how to run the script and interpret results
5. ProjectPlan.md (this document)

## Enhanced Features Implemented

### Enhanced Group-Based Analysis
1. **Total Sensors Per Group**: Added the total number of sensors in each group according to the sensor report
2. **Alarm Coverage Percentage**: Added percentage of monitoring points that experienced alarms
3. **Alarm Time Percentage**: Added percentage of time the group's sensors spent in alarm condition

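A minimal sketch of how these three group metrics might be computed is shown below. It is not the project's actual implementation. In particular, the alarm time percentage here divides total alarm minutes by the number of sensors in the group times the length of the analysis period, which is one reasonable definition but an assumption. The function name, group names, and sample values are hypothetical.

```python
from collections import Counter


def group_metrics(sensor_groups, alarm_minutes_by_sensor, period_minutes):
    """Compute per-group sensor totals, alarm coverage, and alarm time share.

    sensor_groups: {sensor_id: group_name}, taken from the sensor report.
    alarm_minutes_by_sensor: {sensor_id: total minutes in alarm}, only for
        sensors that had at least one alarm.
    """
    totals, alarmed, minutes = Counter(), Counter(), Counter()
    for sensor_id, group in sensor_groups.items():
        totals[group] += 1
        if sensor_id in alarm_minutes_by_sensor:
            alarmed[group] += 1
            minutes[group] += alarm_minutes_by_sensor[sensor_id]
    return {
        group: {
            "total_sensors": n,
            "sensors_with_alarms": alarmed[group],
            "alarm_coverage_pct": 100.0 * alarmed[group] / n,
            # Assumed definition: alarm minutes / (sensors * period length).
            "alarm_time_pct": 100.0 * minutes[group] / (n * period_minutes),
        }
        for group, n in totals.items()
    }


# Made-up data: three sensors in two groups over a one-day period.
metrics = group_metrics(
    sensor_groups={"S1": "Group A", "S2": "Group A", "S3": "Group B"},
    alarm_minutes_by_sensor={"S1": 120},
    period_minutes=24 * 60,
)
```

Iterating over the sensor report rather than the alarm data ensures that groups with no alarms still appear in the result.
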
### Enhanced Output Files
1. All sensor-specific output files now include sensor names and group information:
   - `sensor_statistics.csv`
   - `top_sensors_by_alarm_count.csv`
   - `top_sensors_by_avg_duration.csv`
   - `top_sensors_by_max_duration.csv`
   - `top_sensors_by_severity_score.csv`

### Enhanced Plotting Functionality
1. All sensor-specific plots now display sensor names instead of just IDs
2. Added comprehensive group-based visualizations:
   - Group composition analysis
   - Alarm type distribution by group
   - Group alarm intensity metrics

### Uptime/Downtime Metrics
1. **Error-based downtime**: Calculates the total duration of all "Error" events across all sensors as a percentage of the total time period
2. **Alarm/Warning-based downtime**: Calculates the total duration of all "Alarm" and "Warning" events across all sensors as a percentage of the total time period
3. **System-level uptime metrics**: Time-based calculation showing the percentage of time that any sensor was in error or alarm/warning state
4. **Per-sensor and per-group metrics**: Individual sensor and group uptime/downtime percentages
5. **New output files**:
   - `system_uptime_summary.csv` - Overall system uptime metrics
   - `sensor_error_uptime_metrics.csv` - Per-sensor error-based uptime metrics
   - `sensor_alarm_warning_uptime_metrics.csv` - Per-sensor alarm/warning-based uptime metrics
   - `group_error_uptime_metrics.csv` - Per-group error-based uptime metrics
   - `group_alarm_warning_uptime_metrics.csv` - Per-group alarm/warning-based uptime metrics
6. **Comprehensive group inclusion**: All output files covering groups now include all groups, including those with 0 errors or warnings, allowing for identification of systems with 100% uptime

### Optional Group Exclusion Feature
1. **Create an optional configuration file** (e.g., `exclusion_config.json` or `groups_to_skip.txt`) that allows users to specify groups to exclude from analysis
2. **Implementation approach**:
   - Add a new parameter to the AlarmAnalyzer class to accept an exclusion file path
   - Parse the exclusion file to get a list of groups to skip
   - Filter out sensor data belonging to excluded groups before analysis
   - Add logging to indicate which groups were excluded
3. **Configuration file format options**:
   - JSON format: `{"excluded_groups": ["GroupName1", "GroupName2"]}`
   - Simple text format: one group name per line
   - CSV format: for more complex exclusion rules
4. **Benefits**:
   - Allows users to exclude groups with known issues or maintenance periods
   - Provides cleaner analysis results when certain groups have anomalous data
   - Maintains flexibility without permanently modifying the source data
5. **Implementation details**:
   - Add preprocessing step to filter out excluded groups before any analysis
   - Update all analysis functions to work with the filtered dataset
   - Maintain separate statistics for excluded groups if needed for reference

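The exclusion step could be sketched as two standalone helpers, shown below, covering the JSON and one-name-per-line formats. The function names are hypothetical, and how they would be wired into the `AlarmAnalyzer` class is left open since its interface is not described here. The CSV format for complex rules is not covered.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def load_excluded_groups(path):
    """Read group names from a .json config or a one-name-per-line text file."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        groups = set(json.loads(text).get("excluded_groups", []))
    else:
        groups = {line.strip() for line in text.splitlines() if line.strip()}
    logger.info("Excluding groups: %s", sorted(groups))
    return groups


def filter_excluded(rows, sensor_groups, excluded):
    """Drop rows whose sensor belongs to an excluded group.

    sensor_groups maps Sensor_Id to group name; sensors missing from the
    mapping are kept.
    """
    return [r for r in rows if sensor_groups.get(r["Sensor_Id"]) not in excluded]


# Demonstration with a temporary config file and made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "exclusion_config.json"
    config.write_text('{"excluded_groups": ["GroupName1"]}', encoding="utf-8")
    excluded = load_excluded_groups(config)

    text_config = Path(tmp) / "groups_to_skip.txt"
    text_config.write_text("GroupName1\n\nGroupName2\n", encoding="utf-8")
    excluded_from_text = load_excluded_groups(text_config)

sensor_groups = {"S1": "GroupName1", "S2": "GroupName2"}
rows = [{"Sensor_Id": "S1"}, {"Sensor_Id": "S2"}]
kept = filter_excluded(rows, sensor_groups, excluded)
```

Filtering once, before any analysis runs, means the downstream functions need no knowledge of the exclusion list.
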
## Future Enhancement Plans