Adds known source formats (Chase credit card with headers, Wells Fargo checking headerless), description normalization strategy, cross-account transfer detection, source category hints, household income sources, and sample categorization rules based on real transaction data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9.9 KiB
SpendingAnalysis App Design
Overview
A desktop application for importing, categorizing, and analyzing personal spending data from bank and credit card CSV exports. Built with Python, PySide6, and SQLite. Designed for a household context where the primary user tracks their own spending alongside shared accounts and family member attribution.
Architecture
Three-layer architecture:
- Data Layer — SQLite via SQLAlchemy. Normalized schema for transactions, accounts, categories, rules, and household members.
- Service Layer — Python modules for CSV import/detection, categorization, duplicate detection, recurring charge analysis, and forecasting.
- UI Layer — PySide6 with sidebar navigation. Views: Import, Transactions, Analysis, Recurring, Settings.
Project Structure
src/
models/ # SQLAlchemy models (SQLite)
services/ # Import, categorization, analysis, forecasting
ui/ # PySide6 views and components
db.py # Database connection/session management
Dependencies
PySide6— Desktop UI frameworksqlalchemy— ORM / database managementpandas— CSV handling and analysismatplotlib— Charting (embedded in Qt via FigureCanvasQTAgg)
Data Model
household_members
- id, name, relationship
- Seeded with the primary user on first run
- Initial household: Andrew (self), Donna (wife), son
Known income sources
Based on real data:
- Andrew's salary — CIM Techniques via QuickBooks direct deposit (biweekly, ~$2,314)
- Donna's salary — Oasis Batch payroll (biweekly, varies ~$998–$1,185)
- Other deposits — Mobile deposits, Capital One transfers in
These are auto-detected as income and attributed to the corresponding household member via categorization rules.
accounts
- id, name, institution, type (checking/credit), owner (FK to household_members)
- Shared accounts (e.g., joint bank account) get a "Shared" designation
- Linked to CSV mappings so the app remembers which account a file belongs to
transactions
- id, date, amount (signed: negative=expense, positive=income), description, account (FK), category (FK), attributed_to (FK to household_members, optional)
attributed_toauto-fills from account owner by default, overridable per transaction or via rules
categories
- id, name, default_tag (needs/wants/savings), icon
- Ships with standard defaults: Housing, Groceries, Dining Out, Transportation, Utilities, Entertainment, Healthcare, Shopping, Subscriptions, Family, Transfer, Income, etc.
- Fully user-editable: add, rename, merge, delete
categorization_rules
- id, pattern, category (FK), tag_override (optional), attributed_to (FK, optional), priority
- Matched against transaction description, evaluated in priority order, first match wins
csv_mappings
- id, name, header_fingerprint, column_map (JSON), amount_logic, account (FK)
- Saved per-source so repeat imports are automatic
CSV Import & Mapping
Known Source Formats
Based on real sample data:
Chase credit card (e.g., Chase0372_Activity*.CSV):
- Has headers: Transaction Date, Post Date, Description, Category, Type, Amount, Memo
- Amount is signed (negative=purchase, positive=payment/return)
- Type column: Sale, Payment, Return
- Includes Chase's own category assignments — import as initial suggestions
Wells Fargo checking (e.g., Checking1.csv):
- No headers — five positional columns: Date, Amount, Flag ("*"), Check Number, Description
- Amount is signed (negative=debit, positive=credit)
- Descriptions are very verbose with embedded authorization dates, card numbers, and reference IDs
- Requires description normalization to extract core merchant/payee
Headerless CSV support
The mapping wizard handles files with no headers by showing column previews with positional indices. User assigns meaning by column position rather than header name.
Description normalization
The service layer strips noise from transaction descriptions before storage:
- Authorization dates ("PURCHASE AUTHORIZED ON 02/06")
- Card numbers ("CARD 5360")
- Reference IDs ("S586033096695382", "REF #OP0WS99NKQ")
- Redundant location/phone details
The cleaned description is stored for display and rule matching. The raw description is preserved in a separate field for reference.
Source category hints
When a CSV includes pre-assigned categories (like Chase does), the import process can use these as initial category suggestions. The user can accept, override, or ignore them.
Cross-account transfer detection
The import service detects matching transfer pairs across accounts. For example:
- Checking: "CHASE CREDIT CRD EPAY 260128 ... -$1,461.35"
- Chase: "Payment Thank You - Web ... +$1,461.35"
These are linked as transfers so they don't double-count as spending or income.
First import from a new source
- App reads CSV, detects delimiter, and checks for headers vs. headerless format
- Mapping wizard shows a preview of the first few rows
- User maps columns to normalized fields: date, amount, description, and optionally reference number, balance, source category
- User specifies amount logic (single signed column, separate debit/credit columns, type column)
- User selects or creates the account this CSV belongs to
- Mapping saved to
csv_mappingskeyed on column header fingerprint (or column count + sample patterns for headerless files)
Subsequent imports
- App matches CSV headers (or structure) to a saved mapping
- Confirmation screen shows: transaction count, date range, duplicates detected
- User confirms before committing
Duplicate detection
Composite key: date + amount + description + account. Potential duplicates are flagged for user confirmation rather than silently skipped. Handles overlapping date range re-imports.
Normalization
Amounts normalized to signed values. The mapping wizard captures the amount logic per source to handle different bank formats.
Post-import
Categorization engine runs on new transactions. Unmatched transactions flagged for manual review.
Categorization Engine
Rule-based categorization
Rules match patterns against transaction descriptions and assign: category, optional tag override, optional person attribution. Examples from real data:
CIMTECHNIQUES-> Income, attributed to AndrewOASISBATCH PAYROLL-> Income, attributed to DonnaCHASE CREDIT CRD EPAY-> TransferCAPITAL ONE TRANSFER-> TransferAMEX EPAYMENT-> Transfer, attributed to DonnaFREEDOM MTG PYMTS-> Housing, Needs (mortgage)DOMINION ENERGY-> Utilities, NeedsHELLOFRESH-> Groceries, NeedsNetflix.com-> Subscriptions, WantsPUBLIX|ALDI|PIGGLY WIGGLY|WAL-MART-> Groceries, NeedsCHICK-FIL-A|KFC|MCDONALD-> Dining Out, WantsVW CREDIT-> Transportation, Needs (auto loan)FARM BUREAU INS-> Insurance, NeedsWSFS LOAN-> Debt Payment, NeedsWAY2SAVE SAVINGS-> Transfer (savings)
Auto-rule creation
When a user manually categorizes a transaction, the app offers to create a rule from the merchant name. Rule set grows naturally with use.
Extensible interface
class Categorizer(Protocol):
def categorize(self, transaction) -> CategoryResult | None
Rule-based engine implements this. Future AI implementation can be chained as a fallback without changing the rest of the app.
Uncategorized queue
Transactions that don't match any rule appear in a review queue in the Transactions view for manual categorization.
Analysis & Visualization
Spending Over Time
- Bar/line chart showing total spending by day, week, or month (toggle)
- Filterable by account, person, category, tag
- Stacked bar variant for category proportions over time
- Hover for details
Category Breakdowns
- Donut chart for a selected time period
- Click a slice to drill into that category's transactions
- Horizontal bar chart ranking categories by total spend
- Needs/wants/savings summary bar showing the ratio
Recurring Charges
- Pattern detection: same merchant, similar amount (within ~10%), regular intervals
- Results list: merchant, typical amount, frequency, estimated annual cost
- User confirms or dismisses each detection
- Summary card showing total monthly/annual subscription burden
Forecasting
- Combines historical category averages (weighted toward recent months) with confirmed recurring charges
- Month-ahead: stacked bar alongside recent actual months
- Year-ahead: extrapolation with trend line
- Clearly labels high-confidence (recurring) vs. lower-confidence (historical average) projections
- "What if" mode: toggle recurring charges on/off, adjust category sliders, live projection updates
- Explicitly labeled as estimates, no false precision
UI Layout & Styling
Navigation
Fixed left sidebar with icon + label: Import, Transactions, Analysis, Recurring, Settings. Collapses to icons-only on narrow windows.
Transactions view
Searchable, sortable table. Columns: date, description, amount, account, category, tags, person. Inline category editing with auto-rule prompt. Top filters for date range, account, person, category, uncategorized-only toggle.
Theming
Dark theme by default, light theme toggle. Qt stylesheets (QSS), one file per theme. Modern typography, subtle borders, consistent spacing. Accent color for interactive elements.
Charts
Matplotlib embedded via FigureCanvasQTAgg. Styled to match app theme. Consistent color palette across all charts (same category = same color everywhere).
Performance
Background threads for import and analysis operations with progress indicators. Drag-and-drop CSV file import supported.
Scope
V1
- CSV import with mapping wizard
- Rule-based auto-categorization (extensible for future AI)
- Household member attribution
- Spending over time trends
- Category breakdowns with needs/wants/savings
- Recurring charge detection
- Forecasting with "what if" mode
- Dark/light theming
Deferred
- AI-based categorization
- Budget vs. actual tracking
- OFX/QFX file format support