andy f08a1a9bf5 Initial commit: alarm analysis project
Python project for analyzing alarm data from building monitoring systems.
Includes alarm analyzer, plotting, tests, and source data files.
2026-02-26 09:03:54 -05:00


# Alarm Analysis
Analyze alarm data from building monitoring systems — pair alarm events, calculate durations, compute uptime metrics, and generate visualizations. Built for CSV alarm logs and XLSX sensor reports exported from systems like Cardinal.
## Table of Contents
- [Quick Start](#quick-start)
- [Inputs](#inputs)
- [Outputs](#outputs)
- [How It Works](#how-it-works)
- [Configuration](#configuration)
- [Visualizations](#visualizations)
- [Testing](#testing)
- [Project Structure](#project-structure)
- [Dependencies](#dependencies)
## Quick Start
```bash
# Set up virtual environment
python -m venv alarm_analysis_env
source alarm_analysis_env/Scripts/activate # Windows Git Bash
# or: alarm_analysis_env\Scripts\activate # Windows CMD
# or: source alarm_analysis_env/bin/activate # macOS/Linux
# Install dependencies
pip install pandas numpy matplotlib seaborn openpyxl
# Run the full analysis (outputs CSVs to output/)
python run_analysis.py
# Generate plots (outputs PNGs to plots/)
python create_plots.py
```
## Inputs
### 1. Alarm CSV (`CardinalAlarmsDec25.csv`)
Raw alarm log exported from the monitoring system. Required columns:
| Column | Type | Description | Example |
|--------|------|-------------|---------|
| `Alarm_Id` | int | Unique alarm event ID | `486258` |
| `Sensor_Id` | int | Numeric sensor identifier | `9273` |
| `Date` | datetime | When the alarm occurred | `2025-12-01 00:01:27.000` |
| `Description` | string | Alarm condition text | `Lo Warning: 68.0<=68.0F` |
| `LogTime` | datetime | When the event was logged | `2025-12-01 00:01:32.843` |
**Description patterns** the analyzer recognizes:
| Pattern | Example | Parsed As |
|---------|---------|-----------|
| Hi/Lo Alarm | `Hi Alarm: 51.3>=46.0F` | Type=Alarm, Value=51.3, Threshold=46.0, Unit=F |
| Hi/Lo Warning | `Lo Warning: 68.0<=68.0F` | Type=Warning, Value=68.0, Threshold=68.0, Unit=F |
| Error | `Error: Comm Loss Error 20.4>=20 min.` | Type=Error |
| Normal | `Normal 68.1F` | Type=Normal (resolves prior alarm) |
Supported units: `F`, `C`, `%RH`, `"H2O`
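As an illustration of how the patterns above might be parsed, here is a minimal sketch using Python's `re` module. The regex and the `parse_description` helper are assumptions made for this example, not the analyzer's actual code; only the four description shapes and the unit list come from the tables above.

```python
import re

# Hypothetical pattern for illustration; the analyzer's own regex may differ.
THRESHOLD_RE = re.compile(
    r'^(?P<level>Hi|Lo) (?P<type>Alarm|Warning): '
    r'(?P<value>-?\d+(?:\.\d+)?)(?:>=|<=)(?P<threshold>-?\d+(?:\.\d+)?)'
    r'\s*(?P<unit>F|C|%RH|"H2O)$'
)


def parse_description(text):
    """Return a dict with at least a 'type' key, or None if unrecognized."""
    text = text.strip()
    match = THRESHOLD_RE.match(text)
    if match:
        return {
            "type": match["type"],
            "level": match["level"],
            "value": float(match["value"]),
            "threshold": float(match["threshold"]),
            "unit": match["unit"],
        }
    if text.startswith("Error:"):
        return {"type": "Error"}
    if text.startswith("Normal"):
        return {"type": "Normal"}  # resolves a prior alarm
    return None


print(parse_description("Hi Alarm: 51.3>=46.0F"))
```

Returning `None` for unrecognized text keeps malformed rows visible to the caller instead of silently mislabeling them.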
### 2. Sensor Report XLSX (`SensorReport Cardinal 2025-12-23_processed.xlsx`)
Sensor metadata exported from the monitoring system. Expected columns:
| Column | Description |
|--------|-------------|
| `ID` | Sensor ID (matches `Sensor_Id` in the alarm CSV) |
| `Group` | Logical grouping (e.g., room, zone, building area) |
| `Remote` | Remote unit identifier |
| `Name` | Human-readable sensor name |
| `Type` | Sensor type (temperature, humidity, etc.) |
| `Serial No` | Hardware serial number |
The XLSX may use a hierarchical layout in which each `Group` name appears only in the first row of its group; the analyzer forward-fills the blank cells automatically. The header row is auto-detected at either `header=0` (new format) or `header=4` (legacy format).
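To make the forward-fill idea concrete, the sketch below applies it to a few rows in plain Python. The sample rows and the `forward_fill_group` helper are invented for illustration; with pandas, `Series.ffill()` on the `Group` column has the same effect, and the analyzer presumably relies on something along those lines.

```python
def forward_fill_group(rows, key="Group"):
    """Copy the last non-empty group name down into rows where it is blank."""
    filled, last = [], None
    for row in rows:
        value = row.get(key)
        if value is None or str(value).strip() == "":
            value = last  # blank cell: inherit the group seen most recently
        else:
            last = value
        filled.append({**row, key: value})
    return filled


# Made-up rows mimicking a hierarchical sensor report
rows = [
    {"Group": "Freezers", "ID": 9273, "Name": "Freezer 1"},
    {"Group": None, "ID": 9274, "Name": "Freezer 2"},
    {"Group": "Lab A", "ID": 9301, "Name": "Incubator"},
    {"Group": "", "ID": 9302, "Name": "Room temp"},
]

for row in forward_fill_group(rows):
    print(row["ID"], row["Group"])
```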
### 3. Exclusion Config (optional)
Exclude specific sensor groups from analysis. Provide either format:
**JSON** (`exclusion_config.json`):
```json
{
  "excluded_groups": [
    "Maintenance Sensors",
    "Decommissioned Wing"
  ]
}
```
**Plain text** (`groups_to_skip.txt`):
```
Maintenance Sensors
Decommissioned Wing
```
Pass the file path when creating the analyzer:
```python
analyzer = AlarmAnalyzer('alarms.csv', 'sensors.xlsx', exclusion_file_path='exclusion_config.json')
```
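For readers curious how one loader can accept both formats, here is a rough sketch. The helper names are hypothetical, and choosing the format by file extension is an assumption; the analyzer may detect the format differently.

```python
import json
from pathlib import Path


def parse_excluded_groups(text, is_json):
    """Return the set of group names found in the config text."""
    if is_json:
        return set(json.loads(text).get("excluded_groups", []))
    # Plain text: one group name per line, blank lines ignored
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_excluded_groups(path):
    """Read a .json or plain-text exclusion file (format chosen by extension)."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    return parse_excluded_groups(text, path.suffix.lower() == ".json")


json_text = '{"excluded_groups": ["Maintenance Sensors", "Decommissioned Wing"]}'
plain_text = "Maintenance Sensors\nDecommissioned Wing\n"
print(parse_excluded_groups(json_text, True) == parse_excluded_groups(plain_text, False))
```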
## Outputs
All outputs are generated in `output/` (CSVs) and `plots/` (PNGs).
### Core Analysis CSVs
| File | Description |
|------|-------------|
| `paired_alarm_events.csv` | Every alarm event paired with its resolution — includes sensor name/group, start/end times, duration, alarm type, values, thresholds, and how the alarm ended |
| `summary_by_alarm_type.csv` | Aggregate counts and duration stats (min/max/avg) per alarm type |
| `sensor_statistics.csv` | Per-sensor stats: alarm count, duration stats, with name and group |
### Rankings
| File | Ranked By |
|------|-----------|
| `top_sensors_by_alarm_count.csv` | Total alarm events per sensor |
| `top_sensors_by_avg_duration.csv` | Average alarm duration |
| `top_sensors_by_max_duration.csv` | Longest single alarm event |
| `top_sensors_by_severity_score.csv` | Severity score (type weight × duration) |
| `top_groups_by_alarm_count.csv` | Total alarm events per group |
| `top_groups_by_avg_duration.csv` | Average alarm duration per group |
| `top_groups_by_max_duration.csv` | Longest single alarm event per group |
| `top_groups_by_severity_score.csv` | Severity score per group |
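A severity ranking of this kind could be computed roughly as follows. The type weights (Error=3, Alarm=2, Warning=1) are the ones listed under How It Works; measuring duration in minutes, summing per sensor, and the sample events are assumptions made for this sketch.

```python
from collections import defaultdict

TYPE_WEIGHTS = {"Error": 3, "Alarm": 2, "Warning": 1}


def severity_by_sensor(events):
    """Sum weight * duration per sensor and rank from highest to lowest.

    `events` holds (sensor_name, alarm_type, duration_minutes) tuples.
    """
    scores = defaultdict(float)
    for sensor, alarm_type, duration_min in events:
        scores[sensor] += TYPE_WEIGHTS.get(alarm_type, 0) * duration_min
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up events for illustration
events = [
    ("Freezer 1", "Alarm", 30.0),    # 2 * 30 = 60
    ("Freezer 1", "Warning", 10.0),  # 1 * 10 = 10
    ("Incubator", "Error", 20.0),    # 3 * 20 = 60
]
print(severity_by_sensor(events))
```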
### Time Analysis
| File | Description |
|------|-------------|
| `alarm_frequency_by_hour.csv` | Alarm count for each hour of day (0-23) |
| `alarm_frequency_by_day.csv` | Alarm count for each day of week |
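The two frequency tables amount to counting alarm timestamps by hour and by weekday. The sketch below shows one way to do that with the standard library; the function name and sample timestamps are made up, and the analyzer most likely does the equivalent with pandas.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def frequency_tables(timestamps):
    """Return (count per hour 0-23, count per weekday), zero-filled."""
    by_hour = Counter(ts.hour for ts in timestamps)
    by_day = Counter(DAY_NAMES[ts.weekday()] for ts in timestamps)
    hours = {hour: by_hour.get(hour, 0) for hour in range(24)}
    days = {day: by_day.get(day, 0) for day in DAY_NAMES}
    return hours, days


# Made-up alarm start times
timestamps = [
    datetime(2025, 12, 1, 0, 1, 27),
    datetime(2025, 12, 1, 0, 30, 0),
    datetime(2025, 12, 2, 14, 0, 0),
]
hours, days = frequency_tables(timestamps)
print(hours[0], hours[14], days["Monday"], days["Tuesday"])
```

Zero-filling every hour and weekday keeps the output shape stable even when some buckets have no alarms.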
### Group Analysis
| File | Description |
|------|-------------|
| `group_statistics.csv` | Per-group stats including total sensors, percentage of sensors that alarmed, and alarm time percentage |
| `alarm_type_distribution_by_group.csv` | Crosstab of alarm types per group |
### Uptime Metrics
| File | Description |
|------|-------------|
| `system_uptime_summary.csv` | System-wide uptime: total time span, cumulative downtime percentages, time-based uptime (per-hour bucket analysis) |
| `sensor_error_uptime_metrics.csv` | Per-sensor error-based uptime (communication failures) |
| `sensor_alarm_warning_uptime_metrics.csv` | Per-sensor alarm/warning-based uptime (operational issues) |
| `group_error_uptime_metrics.csv` | Per-group error-based uptime |
| `group_alarm_warning_uptime_metrics.csv` | Per-group alarm/warning-based uptime |
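The difference between a cumulative downtime percentage and a time-bucketed uptime figure is easiest to see in code. The sketch below is one possible reading, not the analyzer's implementation: it assumes overlapping downtime intervals are merged before summing, and that a 1-hour bucket counts as "down" if any downtime interval overlaps it. All names and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def cumulative_downtime_pct(intervals, span_start, span_end):
    """Percent of the span covered by the union of (start, end) intervals."""
    if span_end <= span_start:
        return 0.0
    covered, cursor = timedelta(0), span_start
    for start, end in sorted(intervals):
        start, end = max(start, cursor), min(end, span_end)
        if end > start:  # count only the part not already covered
            covered += end - start
            cursor = end
    return 100.0 * covered / (span_end - span_start)


def bucketed_uptime_pct(intervals, span_start, span_end,
                        bucket=timedelta(hours=1)):
    """Percent of buckets that no downtime interval overlaps."""
    total = up = 0
    t = span_start
    while t < span_end:
        bucket_end = min(t + bucket, span_end)
        down = any(s < bucket_end and e > t for s, e in intervals)
        total += 1
        up += 0 if down else 1
        t = bucket_end
    return 100.0 * up / total if total else 100.0


# Made-up 10-hour span with three downtime intervals (two overlap)
span_start = datetime(2025, 12, 1, 0, 0)
span_end = datetime(2025, 12, 1, 10, 0)
downtime = [
    (datetime(2025, 12, 1, 0, 30), datetime(2025, 12, 1, 1, 15)),
    (datetime(2025, 12, 1, 1, 0), datetime(2025, 12, 1, 2, 0)),
    (datetime(2025, 12, 1, 5, 0), datetime(2025, 12, 1, 5, 30)),
]
print(cumulative_downtime_pct(downtime, span_start, span_end))
print(bucketed_uptime_pct(downtime, span_start, span_end))
```

Under these assumptions the bucketed figure is stricter than the cumulative one, since a short outage marks its whole hour as down.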
## How It Works
### Pipeline Overview
```
CSV + XLSX ──> Load & Map ──> Categorize ──> Pair Events ──> Analyze ──> Export
               │                             │               │
               ├─ Sensor ID → Name/Group     │               ├─ Basic stats
               └─ Exclude groups             │               ├─ Advanced (MTBF, correlation, severity)
                                             │               └─ Uptime metrics
                                        Alarm Start ──> Normal (resolved)
                                        Alarm Start ──> Different Alarm (transition)
                                        Alarm Start ──> [nothing] (unresolved)
```
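The three pairing outcomes at the bottom of the diagram can be sketched for a single sensor as shown here. This is an illustrative approximation only: `pair_events`, the output field names, and the choice to look just at the immediately following event are assumptions. In particular, a repeated alarm of the same type is treated like any other alarm-start here, which the description above does not specify.

```python
from datetime import datetime

ALARM_STARTS = {"Alarm", "Warning", "Error"}


def pair_events(events):
    """Pair each alarm-start with the event that follows it.

    `events` is a list of (timestamp, alarm_type) tuples for ONE sensor.
    """
    events = sorted(events)  # chronological order
    pairs = []
    for i, (start, kind) in enumerate(events):
        if kind not in ALARM_STARTS:
            continue  # Normal events close alarms; they never open one
        end, ended_by = None, "Unresolved"
        if i + 1 < len(events):
            end, next_kind = events[i + 1]
            if next_kind == "Normal":
                ended_by = "Resolved"
            else:
                ended_by = f"Transition to {next_kind}"
        duration = (end - start).total_seconds() / 60 if end is not None else None
        pairs.append({"type": kind, "start": start, "end": end,
                      "duration_min": duration, "ended_by": ended_by})
    return pairs


# Made-up event sequence for one sensor
sample = [
    (datetime(2025, 12, 1, 0, 1, 27), "Warning"),
    (datetime(2025, 12, 1, 0, 11, 27), "Alarm"),
    (datetime(2025, 12, 1, 0, 41, 27), "Normal"),
    (datetime(2025, 12, 1, 6, 0, 0), "Error"),
]
for pair in pair_events(sample):
    print(pair["type"], pair["ended_by"], pair["duration_min"])
```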
### Step-by-Step
1. **Load Data** — Read the alarm CSV and sensor report XLSX. Build a mapping from sensor IDs to human-readable names and groups. Enrich alarm records with sensor metadata. Filter out excluded groups.
2. **Categorize Alarms** — Parse each alarm's `Description` field with regex to extract the alarm type (Error, Alarm, Warning, Normal), measured value, threshold, and unit.
3. **Pair Events & Calculate Durations** — For each sensor, walk through events chronologically:
   - An alarm-start event (Alarm, Warning, or Error) looks forward for resolution
   - If a `Normal` event follows → alarm is **resolved**, duration is calculated
   - If a different alarm type follows → recorded as a **transition** (e.g., "Transition to Alarm")
   - If nothing follows → marked **unresolved**
4. **Basic Analysis** — Count alarms by type, sensor, and group. Compute duration statistics (min, max, average).
5. **Advanced Analysis**:
   - **Hourly/daily frequency**: when alarms tend to occur
   - **MTBF** (Mean Time Between Failures): average time between consecutive alarms per sensor
   - **Alarm correlation**: sensor pairs that alarm within 1-hour windows of each other
   - **Severity scoring**: weighted by type (Error=3x, Alarm=2x, Warning=1x) multiplied by duration
   - **Alarm escalation**: warnings that escalate to Alarm or Error within 1 hour
   - **Group aggregates**: all metrics rolled up by sensor group
6. **Uptime Metrics** — Calculate downtime from error events (communication failures) and alarm/warning events (operational issues). Compute both cumulative percentages and time-bucketed system uptime using 1-hour intervals. Include all sensors and groups, even those with zero events.
7. **Export** — Write all results to CSV files in `output/`.
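Two of the advanced metrics from step 5, MTBF and alarm escalation, can be illustrated with a short standard-library sketch. It assumes MTBF is measured between consecutive alarm start times and that an escalation means an Alarm or Error starting within one hour after a Warning on the same sensor; the function names and sample data are invented, and the analyzer's exact definitions may differ.

```python
from datetime import datetime, timedelta
from statistics import mean


def mtbf_hours(start_times):
    """Mean gap, in hours, between consecutive alarm starts for one sensor."""
    times = sorted(start_times)
    if len(times) < 2:
        return None  # a single alarm has no "between"
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    return mean(gaps)


def count_escalations(events, window=timedelta(hours=1)):
    """Count Warnings followed by an Alarm or Error within `window`.

    `events` is a list of (timestamp, alarm_type) tuples for ONE sensor.
    """
    events = sorted(events)
    escalated = 0
    for i, (t, kind) in enumerate(events):
        if kind != "Warning":
            continue
        later = events[i + 1:]
        if any(k in ("Alarm", "Error") and t2 - t <= window for t2, k in later):
            escalated += 1
    return escalated


# Made-up data for one sensor
starts = [datetime(2025, 12, 1, 0), datetime(2025, 12, 1, 2), datetime(2025, 12, 1, 8)]
events = [
    (datetime(2025, 12, 1, 0, 0), "Warning"),
    (datetime(2025, 12, 1, 0, 30), "Alarm"),   # 30 min later: escalation
    (datetime(2025, 12, 1, 3, 0), "Warning"),
    (datetime(2025, 12, 1, 5, 0), "Error"),    # 2 h later: outside the window
]
print(mtbf_hours(starts), count_escalations(events))
```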
## Visualizations
Run `python create_plots.py` to generate PNG plots in `plots/`:
| Plot | Description |
|------|-------------|
| `alarm_dashboard.png` | 4-panel overview: alarm count by type, top 10 sensors, hourly frequency, daily frequency |
| `duration_analysis.png` | Box plots and histograms of alarm durations by type (log scale) |
| `sensor_analysis.png` | 4-panel: top sensors by count, avg duration, max duration, severity |
Additional group-based plots are generated when group data is available (group dashboard, group composition, alarm type distribution by group, alarm intensity per group).
Visualization imports (matplotlib, seaborn) are deferred so `run_analysis.py` can execute headless without a display.
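A deferred import of this kind usually means placing the plotting imports inside the function that draws, as in the sketch below. The function names are hypothetical, and selecting the `Agg` backend is an extra assumption that lets the plotting step itself write PNGs without a display; the project may handle this differently.

```python
def count_by_type(alarm_types):
    """Pure-Python data prep: needs no plotting library."""
    counts = {}
    for kind in alarm_types:
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def plot_counts(counts, out_path):
    """Render a bar chart; plotting libraries load only when this runs."""
    import matplotlib
    matplotlib.use("Agg")  # file-only backend, so no display is required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(counts), list(counts.values()))
    ax.set_ylabel("Alarm count")
    fig.savefig(out_path)
    plt.close(fig)


counts = count_by_type(["Alarm", "Warning", "Alarm", "Error"])
print(counts)
```

Importing a module that contains `plot_counts` does not load matplotlib until the function is called, so analysis-only runs avoid the dependency.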
## Testing
```bash
python test_changes.py # Validates code structure (methods, columns, exports exist)
python test_duration_fix.py # Tests event pairing and duration calculation
python test_mapping.py # Verifies sensor ID → name/group mapping
python test_enhanced_plotting.py # Tests plot data preparation logic (no rendering)
```
## Project Structure
```
AlarmAnalysis/
├── alarm_analyzer.py # Core AlarmAnalyzer class (all analysis logic)
├── run_analysis.py # Entry point: run full analysis, export CSVs
├── create_plots.py # Entry point: generate visualization PNGs
├── exclusion_config.json # Group exclusion config (JSON format)
├── groups_to_skip.txt # Group exclusion config (plain text format)
├── CardinalAlarmsDec25.csv # Input: alarm log data
├── SensorReport *.xlsx # Input: sensor metadata
├── test_changes.py # Test: code structure validation
├── test_duration_fix.py # Test: event pairing logic
├── test_mapping.py # Test: sensor ID mapping
├── test_enhanced_plotting.py # Test: plot data preparation
├── output/ # Generated CSV analysis results
└── plots/ # Generated PNG visualizations
```
## Dependencies
- **Python** 3.13+
- **pandas** — data manipulation and analysis
- **numpy** — numerical operations
- **matplotlib** — plotting (only needed for `create_plots.py`)
- **seaborn** — statistical visualizations (only needed for `create_plots.py`)
- **openpyxl** — reading XLSX sensor reports
Install all dependencies:
```bash
pip install pandas numpy matplotlib seaborn openpyxl
```