Adds known source formats (Chase credit card with headers, Wells Fargo checking headerless), description normalization strategy, cross-account transfer detection, source category hints, household income sources, and sample categorization rules based on real transaction data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
258 lines
9.9 KiB
Markdown
258 lines
9.9 KiB
Markdown
# SpendingAnalysis App Design
|
||
|
||
## Overview
|
||
|
||
A desktop application for importing, categorizing, and analyzing personal spending data from bank and credit card CSV exports. Built with Python, PySide6, and SQLite. Designed for a household context where the primary user tracks their own spending alongside shared accounts and family member attribution.
|
||
|
||
## Architecture
|
||
|
||
Three-layer architecture:
|
||
|
||
- **Data Layer** — SQLite via SQLAlchemy. Normalized schema for transactions, accounts, categories, rules, and household members.
|
||
- **Service Layer** — Python modules for CSV import/detection, categorization, duplicate detection, recurring charge analysis, and forecasting.
|
||
- **UI Layer** — PySide6 with sidebar navigation. Views: Import, Transactions, Analysis, Recurring, Settings.
|
||
|
||
### Project Structure
|
||
|
||
```
|
||
src/
|
||
models/ # SQLAlchemy models (SQLite)
|
||
services/ # Import, categorization, analysis, forecasting
|
||
ui/ # PySide6 views and components
|
||
db.py # Database connection/session management
|
||
```
|
||
|
||
### Dependencies
|
||
|
||
- `PySide6` — Desktop UI framework
|
||
- `sqlalchemy` — ORM / database management
|
||
- `pandas` — CSV handling and analysis
|
||
- `matplotlib` — Charting (embedded in Qt via FigureCanvasQTAgg)
|
||
|
||
## Data Model
|
||
|
||
### household_members
|
||
|
||
- id, name, relationship
|
||
- Seeded with the primary user on first run
|
||
- Initial household: Andrew (self), Donna (wife), son
|
||
|
||
### Known income sources
|
||
|
||
Based on real data:
|
||
- **Andrew's salary** — CIM Techniques via QuickBooks direct deposit (biweekly, ~$2,314)
|
||
- **Donna's salary** — Oasis Batch payroll (biweekly, varies ~$998–$1,185)
|
||
- **Other deposits** — Mobile deposits, Capital One transfers in
|
||
|
||
These are auto-detected as income and attributed to the corresponding household member via categorization rules.
|
||
|
||
### accounts
|
||
|
||
- id, name, institution, type (checking/credit), owner (FK to household_members)
|
||
- Shared accounts (e.g., joint bank account) get a "Shared" designation
|
||
- Linked to CSV mappings so the app remembers which account a file belongs to
|
||
|
||
### transactions
|
||
|
||
- id, date, amount (signed: negative=expense, positive=income), description, account (FK), category (FK), attributed_to (FK to household_members, optional)
|
||
- `attributed_to` auto-fills from account owner by default, overridable per transaction or via rules
|
||
|
||
### categories
|
||
|
||
- id, name, default_tag (needs/wants/savings), icon
|
||
- Ships with standard defaults: Housing, Groceries, Dining Out, Transportation, Utilities, Entertainment, Healthcare, Shopping, Subscriptions, Family, Transfer, Income, etc.
|
||
- Fully user-editable: add, rename, merge, delete
|
||
|
||
### categorization_rules
|
||
|
||
- id, pattern, category (FK), tag_override (optional), attributed_to (FK, optional), priority
|
||
- Matched against transaction description, evaluated in priority order, first match wins
|
||
|
||
### csv_mappings
|
||
|
||
- id, name, header_fingerprint, column_map (JSON), amount_logic, account (FK)
|
||
- Saved per-source so repeat imports are automatic
|
||
|
||
## CSV Import & Mapping
|
||
|
||
### Known Source Formats
|
||
|
||
Based on real sample data:
|
||
|
||
**Chase credit card** (e.g., `Chase0372_Activity*.CSV`):
|
||
- Has headers: Transaction Date, Post Date, Description, Category, Type, Amount, Memo
|
||
- Amount is signed (negative=purchase, positive=payment/return)
|
||
- Type column: Sale, Payment, Return
|
||
- Includes Chase's own category assignments — import as initial suggestions
|
||
|
||
**Wells Fargo checking** (e.g., `Checking1.csv`):
|
||
- No headers — five positional columns: Date, Amount, Flag ("*"), Check Number, Description
|
||
- Amount is signed (negative=debit, positive=credit)
|
||
- Descriptions are very verbose with embedded authorization dates, card numbers, and reference IDs
|
||
- Requires description normalization to extract core merchant/payee
|
||
|
||
### Headerless CSV support
|
||
|
||
The mapping wizard handles files with no headers by showing column previews with positional indices. User assigns meaning by column position rather than header name.
|
||
|
||
### Description normalization
|
||
|
||
The service layer strips noise from transaction descriptions before storage:
|
||
- Authorization dates ("PURCHASE AUTHORIZED ON 02/06")
|
||
- Card numbers ("CARD 5360")
|
||
- Reference IDs ("S586033096695382", "REF #OP0WS99NKQ")
|
||
- Redundant location/phone details
|
||
|
||
The cleaned description is stored for display and rule matching. The raw description is preserved in a separate field for reference.
|
||
|
||
### Source category hints
|
||
|
||
When a CSV includes pre-assigned categories (like Chase does), the import process can use these as initial category suggestions. The user can accept, override, or ignore them.
|
||
|
||
### Cross-account transfer detection
|
||
|
||
The import service detects matching transfer pairs across accounts. For example:
|
||
- Checking: "CHASE CREDIT CRD EPAY 260128 ... -$1,461.35"
|
||
- Chase: "Payment Thank You - Web ... +$1,461.35"
|
||
|
||
These are linked as transfers so they don't double-count as spending or income.
|
||
|
||
### First import from a new source
|
||
|
||
1. App reads CSV, detects delimiter, and checks for headers vs. headerless format
|
||
2. Mapping wizard shows a preview of the first few rows
|
||
3. User maps columns to normalized fields: date, amount, description, and optionally reference number, balance, source category
|
||
4. User specifies amount logic (single signed column, separate debit/credit columns, type column)
|
||
5. User selects or creates the account this CSV belongs to
|
||
6. Mapping saved to `csv_mappings` keyed on column header fingerprint (or column count + sample patterns for headerless files)
|
||
|
||
### Subsequent imports
|
||
|
||
1. App matches CSV headers (or structure) to a saved mapping
|
||
2. Confirmation screen shows: transaction count, date range, duplicates detected
|
||
3. User confirms before committing
|
||
|
||
### Duplicate detection
|
||
|
||
Composite key: date + amount + description + account. Potential duplicates are flagged for user confirmation rather than silently skipped. Handles overlapping date range re-imports.
|
||
|
||
### Normalization
|
||
|
||
Amounts normalized to signed values. The mapping wizard captures the amount logic per source to handle different bank formats.
|
||
|
||
### Post-import
|
||
|
||
Categorization engine runs on new transactions. Unmatched transactions flagged for manual review.
|
||
|
||
## Categorization Engine
|
||
|
||
### Rule-based categorization
|
||
|
||
Rules match patterns against transaction descriptions and assign: category, optional tag override, optional person attribution. Examples from real data:
|
||
|
||
- `CIMTECHNIQUES` -> Income, attributed to Andrew
|
||
- `OASISBATCH PAYROLL` -> Income, attributed to Donna
|
||
- `CHASE CREDIT CRD EPAY` -> Transfer
|
||
- `CAPITAL ONE TRANSFER` -> Transfer
|
||
- `AMEX EPAYMENT` -> Transfer, attributed to Donna
|
||
- `FREEDOM MTG PYMTS` -> Housing, Needs (mortgage)
|
||
- `DOMINION ENERGY` -> Utilities, Needs
|
||
- `HELLOFRESH` -> Groceries, Needs
|
||
- `Netflix.com` -> Subscriptions, Wants
|
||
- `PUBLIX|ALDI|PIGGLY WIGGLY|WAL-MART` -> Groceries, Needs
|
||
- `CHICK-FIL-A|KFC|MCDONALD` -> Dining Out, Wants
|
||
- `VW CREDIT` -> Transportation, Needs (auto loan)
|
||
- `FARM BUREAU INS` -> Insurance, Needs
|
||
- `WSFS LOAN` -> Debt Payment, Needs
|
||
- `WAY2SAVE SAVINGS` -> Transfer (savings)
|
||
|
||
### Auto-rule creation
|
||
|
||
When a user manually categorizes a transaction, the app offers to create a rule from the merchant name. Rule set grows naturally with use.
|
||
|
||
### Extensible interface
|
||
|
||
```python
|
||
class Categorizer(Protocol):
|
||
def categorize(self, transaction) -> CategoryResult | None
|
||
```
|
||
|
||
Rule-based engine implements this. Future AI implementation can be chained as a fallback without changing the rest of the app.
|
||
|
||
### Uncategorized queue
|
||
|
||
Transactions that don't match any rule appear in a review queue in the Transactions view for manual categorization.
|
||
|
||
## Analysis & Visualization
|
||
|
||
### Spending Over Time
|
||
|
||
- Bar/line chart showing total spending by day, week, or month (toggle)
|
||
- Filterable by account, person, category, tag
|
||
- Stacked bar variant for category proportions over time
|
||
- Hover for details
|
||
|
||
### Category Breakdowns
|
||
|
||
- Donut chart for a selected time period
|
||
- Click a slice to drill into that category's transactions
|
||
- Horizontal bar chart ranking categories by total spend
|
||
- Needs/wants/savings summary bar showing the ratio
|
||
|
||
### Recurring Charges
|
||
|
||
- Pattern detection: same merchant, similar amount (within ~10%), regular intervals
|
||
- Results list: merchant, typical amount, frequency, estimated annual cost
|
||
- User confirms or dismisses each detection
|
||
- Summary card showing total monthly/annual subscription burden
|
||
|
||
### Forecasting
|
||
|
||
- Combines historical category averages (weighted toward recent months) with confirmed recurring charges
|
||
- Month-ahead: stacked bar alongside recent actual months
|
||
- Year-ahead: extrapolation with trend line
|
||
- Clearly labels high-confidence (recurring) vs. lower-confidence (historical average) projections
|
||
- "What if" mode: toggle recurring charges on/off, adjust category sliders, live projection updates
|
||
- Explicitly labeled as estimates, no false precision
|
||
|
||
## UI Layout & Styling
|
||
|
||
### Navigation
|
||
|
||
Fixed left sidebar with icon + label: Import, Transactions, Analysis, Recurring, Settings. Collapses to icons-only on narrow windows.
|
||
|
||
### Transactions view
|
||
|
||
Searchable, sortable table. Columns: date, description, amount, account, category, tags, person. Inline category editing with auto-rule prompt. Top filters for date range, account, person, category, uncategorized-only toggle.
|
||
|
||
### Theming
|
||
|
||
Dark theme by default, light theme toggle. Qt stylesheets (QSS), one file per theme. Modern typography, subtle borders, consistent spacing. Accent color for interactive elements.
|
||
|
||
### Charts
|
||
|
||
Matplotlib embedded via FigureCanvasQTAgg. Styled to match app theme. Consistent color palette across all charts (same category = same color everywhere).
|
||
|
||
### Performance
|
||
|
||
Background threads for import and analysis operations with progress indicators. Drag-and-drop CSV file import supported.
|
||
|
||
## Scope
|
||
|
||
### V1
|
||
|
||
- CSV import with mapping wizard
|
||
- Rule-based auto-categorization (extensible for future AI)
|
||
- Household member attribution
|
||
- Spending over time trends
|
||
- Category breakdowns with needs/wants/savings
|
||
- Recurring charge detection
|
||
- Forecasting with "what if" mode
|
||
- Dark/light theming
|
||
|
||
### Deferred
|
||
|
||
- AI-based categorization
|
||
- Budget vs. actual tracking
|
||
- OFX/QFX file format support
|