bus-bank — bank statement import and reconciliation-ready data (SDD)

Introduction and Overview

Bus Bank imports bank statement evidence into schema-validated datasets, normalizes transactions, and provides review outputs that can be reconciled into the journal. This SDD also defines a profile-driven ERP bank import contract so historical bank data can be imported through reusable mapping profiles instead of generated one-off scripts; that contract is currently specified but not yet implemented as a first-class workflow.

Requirements

FR-BNK-001 Bank import normalization. The module MUST import bank statement data into normalized datasets with deterministic ordering and stable identifiers. Acceptance criteria: imports create or update bank-imports.csv and bank-transactions.csv with schema validation.
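One way to obtain stable identifiers is to derive them from source statement coordinates, so re-importing identical data yields identical IDs. The sketch below (Python, for illustration only; the module itself is a Go CLI) hashes a hypothetical set of key fields; the field choice, separator, and truncation are assumptions, not the module's actual scheme.

```python
import hashlib

def stable_txn_id(statement_id: str, row_index: int,
                  booking_date: str, amount: str) -> str:
    # Join the source coordinates with a separator that cannot occur in the
    # fields, then hash, so the ID is deterministic across re-imports.
    key = "\x1f".join([statement_id, str(row_index), booking_date, amount])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

Because the ID depends only on source data, re-running the same import produces the same identifiers, which is what makes create-or-update semantics on bank-transactions.csv deterministic.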

FR-BNK-002 Review surface. The module MUST provide list outputs for review and reconciliation. Acceptance criteria: bus bank list emits deterministic transaction listings and fails with clear diagnostics on invalid filters.

FR-BNK-003 Init behavior. The module MUST provide an init command that creates the bank baseline datasets and schemas (bank-imports.csv, bank-transactions.csv and their schemas) when they are absent. When they already exist in full, init MUST print a warning to standard error and exit 0 without modifying anything. When they exist only partially, init MUST fail with a clear error and not write any file (see bus-init FR-INIT-004). Acceptance criteria: bus bank init is available; idempotent and partial-state behavior as specified.

FR-BNK-004 Profile-driven ERP bank import. The module MUST provide a first-class import workflow that maps ERP bank-export tables into canonical bank datasets using an explicit, versioned mapping profile. Acceptance criteria: import runs from a short command invocation that references a profile and source dataset(s), supports deterministic row selection (for example fiscal-year filters), status and direction normalization, counterparty and reference mapping, and appends canonical bank rows in deterministic order. Built-in robust profile mode (--profile erp-tsv) MUST tolerate malformed tab-like separators in free-text fields and emit deterministic parse diagnostics (recovered_rows, ambiguous_rows, dropped_rows) with optional --fail-on-ambiguity.
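The tolerant parse with deterministic diagnostics can be sketched as follows (Python, for illustration). The diagnostic counter names follow FR-BNK-004; the recovery heuristic shown here, re-joining surplus fields into a trailing free-text column, is an illustrative assumption, not the specified erp-tsv algorithm.

```python
def parse_erp_tsv(lines, n_cols, text_col):
    """Split tab-separated rows, tolerating stray tabs in one free-text
    column, and count recovered/ambiguous/dropped rows deterministically."""
    rows = []
    diag = {"recovered_rows": 0, "ambiguous_rows": 0, "dropped_rows": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        extra = len(fields) - n_cols
        if extra == 0:
            rows.append(fields)
        elif extra > 0 and text_col == n_cols - 1:
            # Last column is free text: re-join the surplus fields into it.
            rows.append(fields[:text_col] + [" ".join(fields[text_col:])])
            diag["recovered_rows"] += 1
        elif extra > 0:
            # Surplus cannot be attributed to a single column unambiguously.
            diag["ambiguous_rows"] += 1
        else:
            # Too few fields: nothing safe to recover.
            diag["dropped_rows"] += 1
    return rows, diag
```

With --fail-on-ambiguity, a non-zero ambiguous_rows count would turn into a non-zero exit instead of silently skipping rows.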

FR-BNK-005 Reconciliation proposal input contract. The module MUST expose deterministic transaction fields needed by reconciliation proposal generation and batch apply workflows in bus-reconcile. Acceptance criteria: bank transaction identifiers, normalized amount, currency, booking date, and reference fields are stable and queryable; unresolved lookup states are explicit in data and diagnostics so proposal generation can fail deterministically instead of guessing.

FR-BNK-006 Add command. The module MUST provide an add command that allows adding one bank account or one bank transaction at a time. Acceptance criteria: bus bank add account and bus bank add transaction (or equivalent) are available; each invocation adds exactly one record to the corresponding dataset with schema validation; invalid or duplicate input fails with clear diagnostics and does not modify any file.

FR-BNK-007 Statement balance checkpoints. The module MUST provide deterministic extraction and verification of statement opening/closing balances from evidence files. Acceptance criteria: bus bank statement extract ingests statement balance summaries (CSV, or PDF via native extraction with fallback to sidecar CSV+schema), appends normalized checkpoints to bank-statement-checkpoints.csv with provenance, and bus bank statement verify compares checkpoints against bank-transactions.csv running balances with optional failure threshold (--fail-if-diff-over).
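The verification rule can be sketched as a running-balance comparison with an optional tolerance, mirroring --fail-if-diff-over (Python, for illustration). Amounts are integer cents here to keep arithmetic exact; the return shape and the no-threshold behavior are assumptions.

```python
def verify_checkpoint(opening, transactions, closing, fail_if_diff_over=None):
    """Compare a statement's closing balance against the opening balance
    plus the sum of matching bank transactions. All amounts in cents."""
    running = opening + sum(transactions)
    diff = abs(closing - running)
    # Without a threshold the check only reports; with one, it gates.
    ok = fail_if_diff_over is None or diff <= fail_if_diff_over
    return {"expected": closing, "actual": running, "diff": diff, "ok": ok}
```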

FR-BNK-008 Statement evidence parsing and cross-checks. The module MUST provide deterministic parsing of statement evidence into a canonical JSON model and deterministic cross-checking of parsed statement totals against bank transactions. Acceptance criteria: bus bank statement parse emits a JSON model with minimum statement fields, confidence/provenance metadata, and unknown-number warnings, and bus bank statement verify --statement <parsed.json|attachment-id> emits structured mismatch reason codes with optional failure threshold.

NFR-BNK-001 Auditability. Imports MUST preserve source statement identifiers and evidence links. Acceptance criteria: each normalized transaction records a source reference and can be traced to attachments metadata.

NFR-BNK-002 Path exposure via Go library. The module MUST expose a Go library API that returns the workspace-relative path(s) to its owned data file(s) (bank-imports, bank-transactions, bank-statement-checkpoints, and their schemas). Other modules that need read-only access to bank raw file(s) MUST obtain the path(s) from this module’s library, not by hardcoding file names. The API MUST be designed so that future dynamic path configuration can be supported without breaking consumers. Acceptance criteria: the library provides path accessor(s) for the bank datasets; consumers use these accessors for read-only access; no consumer hardcodes bank file names outside this module.
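The accessor shape can be sketched as follows (Python here for illustration; the real surface is a Go library). The dataset file names come from this SDD; the function names and the <base>.schema.json sidecar naming for table schemas are assumptions. The point of the indirection is that consumers never spell file names, so future configurable layouts cannot break them.

```python
from pathlib import Path

# Hypothetical accessor table mirroring NFR-BNK-002.
_BANK_FILES = {
    "imports": "bank-imports.csv",
    "transactions": "bank-transactions.csv",
    "checkpoints": "bank-statement-checkpoints.csv",
}

def bank_dataset_path(workspace_root: str, dataset: str) -> Path:
    # Resolution goes through this one function so a future workspace or
    # data-package configuration can override the layout in one place.
    return Path(workspace_root) / _BANK_FILES[dataset]

def bank_schema_path(workspace_root: str, dataset: str) -> Path:
    # Beside-the-table schema: assumed <base>.schema.json naming.
    return bank_dataset_path(workspace_root, dataset).with_suffix(".schema.json")
```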

NFR-BNK-003 Import mapping auditability. ERP bank import mappings MUST be reviewable as repository data and import execution MUST emit auditable artifacts. Acceptance criteria: profile files are committed as regular repository files; imports can emit deterministic plan and result artifacts that include source rows, mapping decisions, and produced bank transaction identifiers; rerunning with the same profile and source data yields byte-identical artifacts.

NFR-BNK-004 Deterministic candidate feed semantics. Data consumed by reconciliation proposal generation MUST be deterministic and unambiguous for a given workspace revision. Acceptance criteria: bus bank read outputs used for proposal workflows are stable in ordering and field naming, and diagnostics identify the bank transaction ID when lookup or normalization failures prevent reconciliation planning.

System Architecture

Bus Bank owns the bank import datasets and normalizes raw statement data into repository tables. It integrates with bus-reconcile for matching and with bus-journal for posting outcomes, using bus-accounts, bus-entities, and bus-invoices as reference data.

Key Decisions

KD-BNK-001 Bank statements are normalized into canonical datasets. The module converts raw statement data into schema-validated tables for deterministic downstream processing.

KD-BNK-002 Path exposure for read-only consumption. The module exposes path accessors in its Go library so that other modules can resolve the location of bank datasets for read-only access. Write access and all bank business logic remain in this module.

KD-BNK-003 ERP history import is profile-driven. Historical bank ingestion is defined as reusable mapping profiles and deterministic import runs, not generated one-off append scripts. Profiles are versioned repository data and import execution remains plain Bus commands with deterministic output artifacts.

KD-BNK-004 Reconciliation proposal workflows consume bank data as-is. Candidate generation and apply logic belong to bus-reconcile, while bus-bank guarantees deterministic transaction identity and normalization surfaces.

Component Design and Interfaces

Interface IF-BNK-001 (module CLI). The module exposes bus bank with subcommands init, import, list, backlog, statement, config, and add, and it follows BusDK CLI conventions for deterministic output and diagnostics.

The init command creates the baseline bank datasets and schemas (bank-imports.csv, bank-transactions.csv and their beside-the-table schemas) when they are absent. If all owned bank datasets and schemas already exist and are consistent, init prints a warning to standard error and exits 0 without modifying anything. If the data exists only partially, init fails with a clear error to standard error, does not write any file, and exits non-zero (see bus-init FR-INIT-004).

Interface IF-BNK-002 (path accessors, Go library). The module exposes Go library functions that return the workspace-relative path(s) to its owned data file(s) (bank-imports.csv, bank-transactions.csv, bank-statement-checkpoints.csv, and their schemas). Given a workspace root path, the library returns the path(s); resolution MUST allow future override from workspace or data package configuration. Other modules use these accessors for read-only access only; all writes and bank logic remain in this module.

Documented parameters are bus bank import --file <path> and bus bank list filters that constrain the transaction set deterministically. The complete list filter surface is:

--month <YYYY-M>: selects the calendar month; mutually exclusive with --from or --to.
--from <YYYY-MM-DD> and --to <YYYY-MM-DD>: inclusive bounds; may be used together or independently.
--counterparty <entity-id>: filters by the stable counterparty identifier as recorded in the transaction row, matching bus entities identifiers exactly.
--invoice-ref <text>: filters by the normalized invoice reference string present on the transaction row, matching exactly as stored.

Date filters apply to the normalized transaction date in bank-transactions.csv. When multiple filters are supplied, they are combined with logical AND so every returned row satisfies every filter.

Statement balance extraction uses bus bank statement extract --file <path> with optional --profile <name>, --account, --iban, and --attachment-id <uuid> provenance override, plus parsing hints for dates and numbers (--date-format, --decimal-sep, --group-sep, --unicode-minus). When both profile hints and CLI hints are supplied, CLI hints override profile defaults. Statement parsing uses bus bank statement parse --file <path> with the same mapping and parsing hints and emits canonical JSON. Verification uses bus bank statement verify --statement <parsed.json|attachment-id> with optional --bank-rows, --year, --account, and --fail-if-diff-over threshold; when --statement is omitted, verification compares stored checkpoints to bank transactions as before.
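The AND-combined filter semantics can be sketched as a predicate builder (Python, for illustration; the row field names are assumptions about the normalized dataset, not its actual schema):

```python
import datetime

def list_filter(month=None, date_from=None, date_to=None,
                counterparty=None, invoice_ref=None):
    """Build a row predicate; every supplied filter must hold (logical AND)."""
    if month and (date_from or date_to):
        raise ValueError("--month is mutually exclusive with --from/--to")

    def pred(row):
        d = datetime.date.fromisoformat(row["booking_date"])
        if month:
            y, m = (int(p) for p in month.split("-"))
            if (d.year, d.month) != (y, m):
                return False
        if date_from and d < datetime.date.fromisoformat(date_from):
            return False
        if date_to and d > datetime.date.fromisoformat(date_to):
            return False  # --from/--to are inclusive bounds
        if counterparty and row["counterparty_id"] != counterparty:
            return False  # exact identifier match
        if invoice_ref and row["invoice_ref"] != invoice_ref:
            return False  # exact match as stored
        return True

    return pred
```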

Interface IF-BNK-003 (profile import). The module defines a first-class command surface for ERP history import into bank datasets: bus bank import --profile <path> --source <path> with optional deterministic selectors (for example --year <YYYY>) and dry-run support. The profile contract defines source table bindings, column mappings, transaction-direction normalization, status mapping, counterparty lookup, and reference extraction rules. Built-in profile selector erp-tsv is supported for robust ERP TSV parsing with parse diagnostics and optional --fail-on-ambiguity. Execution emits deterministic import artifacts (plan and result) and appends canonical rows through module-owned write paths.
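The profile contract can be illustrated with a minimal mapping sketch (Python, for illustration). The profile keys (columns, direction, status_map) and the source column names are hypothetical, not the module's actual profile schema; the point is that column bindings, direction normalization, and status mapping live in reviewable data, not in generated scripts.

```python
# Hypothetical versioned profile: binds ERP export columns to canonical fields.
PROFILE = {
    "version": 1,
    "columns": {"date": "BookDate", "amount": "Amt", "reference": "RefText"},
    "direction": {"D": -1, "C": 1},          # debits become negative amounts
    "status_map": {"BOOKED": "booked", "PEND": "pending"},
}

def map_row(profile, src):
    """Apply one profile to one source row, producing a canonical bank row."""
    cols = profile["columns"]
    sign = profile["direction"][src["DrCr"]]
    return {
        "booking_date": src[cols["date"]],
        "amount": f"{sign * float(src[cols['amount']]):.2f}",
        "reference": src[cols["reference"]].strip(),
        "status": profile["status_map"][src["Status"]],
    }
```

A failed lookup (an unmapped DrCr or Status value) raises instead of guessing, which matches the deterministic-failure stance of FR-BNK-005.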

Interface IF-BNK-004 (reconciliation candidate read surface, planned integration). Bank transaction read outputs consumed by reconciliation proposal workflows MUST provide a deterministic field contract, including at minimum bank transaction ID, amount, currency, booking date, reference text, and reconciliation state marker. The module does not generate proposals, but its read contract must let bus-reconcile compute proposals and apply approved rows deterministically.

Interface IF-BNK-005 (add command). The module provides bus bank add account and bus bank add transaction for adding a single bank account or a single bank transaction manually. Each subcommand accepts the fields required by the corresponding schema (e.g. for account: identifier, IBAN, BIC, currency; optional ledger mapping; for transaction: bank_account_id, booking_date, amount, currency, and other required fields per schema). Values are supplied via flags or positional arguments as documented in command help. The command validates input against the module schema, appends exactly one row to the target dataset, and exits 0 on success. On validation failure or duplicate key where applicable, the command exits non-zero, prints a clear error to standard error, and does not modify any file. When the add account subcommand is implemented, the module owns a bank account dataset (e.g. bank-accounts.csv) and its beside-the-table schema at the workspace root; path accessors (IF-BNK-002) include that dataset when present.

Usage examples:

bus bank import --file 202602-bank-statement.csv
bus bank list --month 2026-2
bus bank statement extract --file 2024-12.statement.csv
bus bank -f json statement parse --file 2024-12.statement.csv
bus bank statement verify --statement statement.json --fail-if-diff-over 0.01
bus bank add account --id acct-01 --iban FI... --currency EUR
bus bank add transaction --bank-account acct-01 --date 2026-02-18 --amount 100.00 --currency EUR

Data Design

The module reads and writes bank-imports.csv, bank-transactions.csv, and bank-statement-checkpoints.csv at the repository root, each with a beside-the-table schema file. bank-statement-checkpoints.csv includes deterministic provenance fields (attachment_id, source_path, extracted_at) so extracted balances can be traced back to evidence metadata. Parsed statement JSON output is a read-only artifact produced by bus bank statement parse and is not stored as a workspace dataset.

When the add account subcommand is implemented, the module also owns a bank account dataset (e.g. bank-accounts.csv) and its beside-the-table schema at the workspace root; path accessors (IF-BNK-002) then include that dataset. Master data owned by this module is stored in the workspace root only; the module does not create or use a bank/ or other subdirectory for its datasets and schemas.

Source bank statement files live in the repository root and may be named with a date prefix such as 202602-bank-statement.csv, and they can be registered as attachments. Statement balance extraction reads a statement summary CSV with a beside-the-file schema; for PDF evidence, the module attempts native text extraction first, then falls back to sibling text exports or sidecars <base>.statement.csv and <base>.statement.schema.json while preserving the PDF path as evidence.

Other modules that need read-only access to bank datasets MUST obtain the path(s) from this module’s Go library (IF-BNK-002). All writes and bank-domain logic remain in this module.

Profile mappings for ERP bank imports are authoritative repository data. A profile describes how source bank-export rows map into bank-imports.csv and bank-transactions.csv, including deterministic filter predicates, normalization rules, and counterparty/reference resolution. Import execution artifacts are stored as reviewable files so reviewers can verify source-to-target behavior without reviewing generated mega-scripts.

Statement extract profiles are stored in statement-extract-profiles.csv at the workspace root. Each profile defines statement extraction mappings (field to selector) and may include optional parsing hints (header_row, date_format, decimal_char, group_char, unicode_minus) that act as defaults for bus bank statement extract unless overridden by CLI flags.
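The number-parsing hints can be illustrated with a small sketch (Python, for illustration; the module ships as a Go CLI). It applies group/decimal separator and Unicode-minus folding to a localized amount string and returns integer cents; the exact hint semantics and the cents representation are assumptions.

```python
def parse_amount(text, decimal_sep=",", group_sep=".", unicode_minus=True):
    """Parse a localized amount string into integer cents, honoring the
    decimal/group separator and unicode-minus hints described above."""
    s = text.strip()
    if unicode_minus:
        s = s.replace("\u2212", "-")  # fold U+2212 MINUS SIGN to ASCII '-'
    # Drop grouping separators first, then normalize the decimal separator.
    s = s.replace(group_sep, "").replace(decimal_sep, ".")
    return round(float(s) * 100)
```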

Assumptions and Dependencies

Bus Bank depends on the workspace layout and schema conventions and on reference data from bus-entities, bus-accounts, and bus-invoices when matching. If required datasets or schemas are missing, the module fails with deterministic diagnostics. The profile import workflow depends on deterministic source-table read and normalization helpers from bus-data, but bank ownership and all write logic remain in this module. Reconciliation proposal and apply workflows depend on this module's deterministic bank transaction identity and normalization contract.

Security Considerations

Bank statements and normalized transactions are sensitive repository data and should be protected by the same access controls as the rest of the workspace. Evidence references must remain intact for auditability.

Observability and Logging

Command results are written to standard output, and diagnostics are written to standard error with deterministic references to dataset paths and identifiers.

Error Handling and Resilience

Invalid usage exits with a non-zero status and a concise usage error. Import and schema violations exit non-zero without modifying datasets.

Testing Strategy

Unit tests cover normalization and schema validation, and command-level tests exercise import, list, backlog, statement, and add against fixture workspaces with sample bank statements. Statement tests MUST verify extraction from summary CSV, append/skip behavior, parsed JSON output, attachment-based parsing, and verification diff thresholds (FR-BNK-007, FR-BNK-008). Add-command tests MUST verify that one account or one transaction is appended with schema validation, and that invalid or duplicate input exits non-zero without modifying any file (FR-BNK-006). Profile-import tests MUST verify deterministic mapping execution, year-filter behavior, direction and status normalization, counterparty lookup outcomes, and byte-identical import artifacts for repeated runs with the same inputs (FR-BNK-004, NFR-BNK-003). Reconciliation-candidate contract tests MUST verify stable transaction ID lookup, deterministic read ordering, and deterministic diagnostics when candidate-required fields are missing or invalid (FR-BNK-005, NFR-BNK-004).

Deployment and Operations

Not Applicable. The module ships as a BusDK CLI component and relies on the standard workspace layout.

Migration/Rollout

Not Applicable. Schema evolution is handled through the standard schema migration workflow for workspace datasets.

Risks

Not Applicable. Module-specific risks are not enumerated beyond the general need for deterministic and audit-friendly bank data handling.

Suggested capabilities (out of current scope)

The following capabilities are not yet requirements; they are recorded as suggested enhancements for classification, reconciliation, and migration workflows.

Counterparty normalization. Classification rules become noisy when the same logical counterparty appears with inconsistent labels (e.g. SENDANOR vs Sendanor, UPCLOUD HELSINKI vs UPCLOUD OY HELSINKI). A suggested extension is configurable counterparty normalization before rule matching. Configuration would define canonical names with alias patterns (exact match and regex), and optional normalization helpers (trim, case fold, Unicode fold, punctuation cleanup). The module would expose a normalized counterparty field in bank datasets and in exports consumed by bus-journal and bus-reconcile, while retaining the original counterparty value for audit. Rule matching in other modules would then key off the canonical value. If this capability is adopted, it would be promoted to a formal requirement and to interface and data design (config format, new fields, and export contract) in a future SDD update; module and workflow docs would then document the config format and the normalized vs original field semantics.
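If adopted, the normalization step might look like this sketch (Python, for illustration). The alias table, canonical names, and helper set are hypothetical configuration, not a proposed format; the key property shown is that the original label is always retained alongside the canonical one for audit.

```python
import re
import unicodedata

# Illustrative alias configuration: canonical name -> match patterns.
ALIASES = [
    ("Sendanor", [re.compile(r"(?i)^sendanor\b")]),
    ("UpCloud Oy", [re.compile(r"(?i)^upcloud\b")]),
]

def normalize_counterparty(raw):
    """Return (canonical, original). Unmatched labels pass through folded."""
    folded = unicodedata.normalize("NFKC", raw).strip()  # Unicode fold + trim
    for canonical, patterns in ALIASES:
        if any(p.search(folded) for p in patterns):
            return canonical, raw
    return folded, raw
```

Rule matching in bus-journal or bus-reconcile would then key off the first element while the second stays in the dataset for traceability.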

Built-in bank-classification coverage/backlog report. After partial automation, teams need a deterministic “what remains unclassified” view. A suggested command or report would compare bank transactions to journal source links and emit posted vs unposted counts/sums by month, unposted breakdown by counterparty and message code, optional thresholds/fail-on-backlog for CI, and machine-friendly output (tsv/json) with consistent source-link semantics.

Reference extractors from bank message/reference. Bank rows often include embedded hints in free-text message or reference fields (e.g. ERP <id>, invoice numbers). Today such hints are parsed manually in custom scripts. A suggested extension is optional reference extractors in bus-bank: configurable patterns (e.g. regex) on message/reference that populate normalized fields (e.g. erp_id, invoice_number_hint) in bank datasets. Those fields would be exposed in bank list and export so bus-reconcile and other modules can use them without parsing raw text. Optional helper commands could join extracted keys against invoice or purchase-invoice ids deterministically. If this capability is adopted, it would be promoted to a formal requirement and to interface and data design (extractor config, new dataset fields, and export contract) in a future SDD update; module docs would then document the extractor config and the new dataset fields.
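A reference extractor of this kind might be sketched as follows (Python, for illustration). The extractor patterns and the produced field names (erp_id, invoice_number_hint) are hypothetical examples taken from the paragraph above, not a proposed configuration format.

```python
import re

# Hypothetical extractor configuration: output field -> regex with one group.
EXTRACTORS = {
    "erp_id": re.compile(r"\bERP (\d+)\b"),
    "invoice_number_hint": re.compile(r"\binvoice\s+(\S+)", re.IGNORECASE),
}

def extract_references(message):
    """Populate normalized hint fields from a free-text message/reference."""
    out = {}
    for field, pattern in EXTRACTORS.items():
        m = pattern.search(message)
        if m:
            out[field] = m.group(1)
    return out
```

Downstream modules could then join on erp_id or invoice_number_hint deterministically instead of re-parsing raw text.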

Rule-based bank classification and posting. See the same suggested two-phase flow (classify + apply) under bus-journal; bus-bank would supply the bank transaction read surface and, if implemented, counterparty normalization and reference extractors (above) used by the classifier.

Glossary and Terminology

Bank import: a normalized record of a statement ingest run stored in bank-imports.csv.
Bank transaction: a normalized transaction row stored in bank-transactions.csv.