# SpendingAnalysis App Design ## Overview A desktop application for importing, categorizing, and analyzing personal spending data from bank and credit card CSV exports. Built with Python, PySide6, and SQLite. Designed for a household context where the primary user tracks their own spending alongside shared accounts and family member attribution. ## Architecture Three-layer architecture: - **Data Layer** — SQLite via SQLAlchemy. Normalized schema for transactions, accounts, categories, rules, and household members. - **Service Layer** — Python modules for CSV import/detection, categorization, duplicate detection, recurring charge analysis, and forecasting. - **UI Layer** — PySide6 with sidebar navigation. Views: Import, Transactions, Analysis, Recurring, Settings. ### Project Structure ``` src/ models/ # SQLAlchemy models (SQLite) services/ # Import, categorization, analysis, forecasting ui/ # PySide6 views and components db.py # Database connection/session management ``` ### Dependencies - `PySide6` — Desktop UI framework - `sqlalchemy` — ORM / database management - `pandas` — CSV handling and analysis - `matplotlib` — Charting (embedded in Qt via FigureCanvasQTAgg) ## Data Model ### household_members - id, name, relationship - Seeded with the primary user on first run - Initial household: Andrew (self), Donna (wife), son ### Known income sources Based on real data: - **Andrew's salary** — CIM Techniques via QuickBooks direct deposit (biweekly, ~$2,314) - **Donna's salary** — Oasis Batch payroll (biweekly, varies ~$998–$1,185) - **Other deposits** — Mobile deposits, Capital One transfers in These are auto-detected as income and attributed to the corresponding household member via categorization rules. ### accounts - id, name, institution, type (checking/credit), owner (FK to household_members) - Shared accounts (e.g., joint bank account) get a "Shared" designation - Linked to CSV mappings so the app remembers which account a file belongs to ### transactions - id, date, amount (signed: negative=expense, positive=income), description, account (FK), category (FK), attributed_to (FK to household_members, optional) - `attributed_to` auto-fills from account owner by default, overridable per transaction or via rules ### categories - id, name, default_tag (needs/wants/savings), icon - Ships with standard defaults: Housing, Groceries, Dining Out, Transportation, Utilities, Entertainment, Healthcare, Shopping, Subscriptions, Family, Transfer, Income, etc. - Fully user-editable: add, rename, merge, delete ### categorization_rules - id, pattern, category (FK), tag_override (optional), attributed_to (FK, optional), priority - Matched against transaction description, evaluated in priority order, first match wins ### csv_mappings - id, name, header_fingerprint, column_map (JSON), amount_logic, account (FK) - Saved per-source so repeat imports are automatic ## CSV Import & Mapping ### Known Source Formats Based on real sample data: **Chase credit card** (e.g., `Chase0372_Activity*.CSV`): - Has headers: Transaction Date, Post Date, Description, Category, Type, Amount, Memo - Amount is signed (negative=purchase, positive=payment/return) - Type column: Sale, Payment, Return - Includes Chase's own category assignments — import as initial suggestions **Wells Fargo checking** (e.g., `Checking1.csv`): - No headers — five positional columns: Date, Amount, Flag ("*"), Check Number, Description - Amount is signed (negative=debit, positive=credit) - Descriptions are very verbose with embedded authorization dates, card numbers, and reference IDs - Requires description normalization to extract core merchant/payee ### Headerless CSV support The mapping wizard handles files with no headers by showing column previews with positional indices. User assigns meaning by column position rather than header name. ### Description normalization The service layer strips noise from transaction descriptions before storage: - Authorization dates ("PURCHASE AUTHORIZED ON 02/06") - Card numbers ("CARD 5360") - Reference IDs ("S586033096695382", "REF #OP0WS99NKQ") - Redundant location/phone details The cleaned description is stored for display and rule matching. The raw description is preserved in a separate field for reference. ### Source category hints When a CSV includes pre-assigned categories (like Chase does), the import process can use these as initial category suggestions. The user can accept, override, or ignore them. ### Cross-account transfer detection The import service detects matching transfer pairs across accounts. For example: - Checking: "CHASE CREDIT CRD EPAY 260128 ... -$1,461.35" - Chase: "Payment Thank You - Web ... +$1,461.35" These are linked as transfers so they don't double-count as spending or income. ### First import from a new source 1. App reads CSV, detects delimiter, and checks for headers vs. headerless format 2. Mapping wizard shows a preview of the first few rows 3. User maps columns to normalized fields: date, amount, description, and optionally reference number, balance, source category 4. User specifies amount logic (single signed column, separate debit/credit columns, type column) 5. User selects or creates the account this CSV belongs to 6. Mapping saved to `csv_mappings` keyed on column header fingerprint (or column count + sample patterns for headerless files) ### Subsequent imports 1. App matches CSV headers (or structure) to a saved mapping 2. Confirmation screen shows: transaction count, date range, duplicates detected 3. User confirms before committing ### Duplicate detection Composite key: date + amount + description + account. Potential duplicates are flagged for user confirmation rather than silently skipped. Handles overlapping date range re-imports. ### Normalization Amounts normalized to signed values. The mapping wizard captures the amount logic per source to handle different bank formats. ### Post-import Categorization engine runs on new transactions. Unmatched transactions flagged for manual review. ## Categorization Engine ### Rule-based categorization Rules match patterns against transaction descriptions and assign: category, optional tag override, optional person attribution. Examples from real data: - `CIMTECHNIQUES` -> Income, attributed to Andrew - `OASISBATCH PAYROLL` -> Income, attributed to Donna - `CHASE CREDIT CRD EPAY` -> Transfer - `CAPITAL ONE TRANSFER` -> Transfer - `AMEX EPAYMENT` -> Transfer, attributed to Donna - `FREEDOM MTG PYMTS` -> Housing, Needs (mortgage) - `DOMINION ENERGY` -> Utilities, Needs - `HELLOFRESH` -> Groceries, Needs - `Netflix.com` -> Subscriptions, Wants - `PUBLIX|ALDI|PIGGLY WIGGLY|WAL-MART` -> Groceries, Needs - `CHICK-FIL-A|KFC|MCDONALD` -> Dining Out, Wants - `VW CREDIT` -> Transportation, Needs (auto loan) - `FARM BUREAU INS` -> Insurance, Needs - `WSFS LOAN` -> Debt Payment, Needs - `WAY2SAVE SAVINGS` -> Transfer (savings) ### Auto-rule creation When a user manually categorizes a transaction, the app offers to create a rule from the merchant name. Rule set grows naturally with use. ### Extensible interface ```python class Categorizer(Protocol): def categorize(self, transaction) -> CategoryResult | None ``` Rule-based engine implements this. Future AI implementation can be chained as a fallback without changing the rest of the app. ### Uncategorized queue Transactions that don't match any rule appear in a review queue in the Transactions view for manual categorization. ## Analysis & Visualization ### Spending Over Time - Bar/line chart showing total spending by day, week, or month (toggle) - Filterable by account, person, category, tag - Stacked bar variant for category proportions over time - Hover for details ### Category Breakdowns - Donut chart for a selected time period - Click a slice to drill into that category's transactions - Horizontal bar chart ranking categories by total spend - Needs/wants/savings summary bar showing the ratio ### Recurring Charges - Pattern detection: same merchant, similar amount (within ~10%), regular intervals - Results list: merchant, typical amount, frequency, estimated annual cost - User confirms or dismisses each detection - Summary card showing total monthly/annual subscription burden ### Forecasting - Combines historical category averages (weighted toward recent months) with confirmed recurring charges - Month-ahead: stacked bar alongside recent actual months - Year-ahead: extrapolation with trend line - Clearly labels high-confidence (recurring) vs. lower-confidence (historical average) projections - "What if" mode: toggle recurring charges on/off, adjust category sliders, live projection updates - Explicitly labeled as estimates, no false precision ## UI Layout & Styling ### Navigation Fixed left sidebar with icon + label: Import, Transactions, Analysis, Recurring, Settings. Collapses to icons-only on narrow windows. ### Transactions view Searchable, sortable table. Columns: date, description, amount, account, category, tags, person. Inline category editing with auto-rule prompt. Top filters for date range, account, person, category, uncategorized-only toggle. ### Theming Dark theme by default, light theme toggle. Qt stylesheets (QSS), one file per theme. Modern typography, subtle borders, consistent spacing. Accent color for interactive elements. ### Charts Matplotlib embedded via FigureCanvasQTAgg. Styled to match app theme. Consistent color palette across all charts (same category = same color everywhere). ### Performance Background threads for import and analysis operations with progress indicators. Drag-and-drop CSV file import supported. ## Scope ### V1 - CSV import with mapping wizard - Rule-based auto-categorization (extensible for future AI) - Household member attribution - Spending over time trends - Category breakdowns with needs/wants/savings - Recurring charge detection - Forecasting with "what if" mode - Dark/light theming ### Deferred - AI-based categorization - Budget vs. actual tracking - OFX/QFX file format support