BusDK Software Design Document (SDD)

Canonical multi-page design spec (original sources)

The canonical source material for this SDD is the existing multi-page BusDK design spec. Start from the design spec entrypoint and follow the section indexes from there. This SDD links directly to the most relevant inner pages inline wherever it is logical for traceability.

Introduction and Overview

BusDK (Business Development Kit), formerly known as Bus, is a modular, CLI-first toolkit for running a business, including accounting and bookkeeping. BusDK is designed for longevity, clarity, and extensibility by storing workspace datasets as transparent, human-readable text files validated by explicit schemas, and by keeping the workspace’s change history reviewable over time. This framing is defined in Purpose and scope and elaborated as design goals in the Design goals and requirements section.

The preferred default is that the workspace lives in a Git repository and that tabular datasets are stored as UTF-8 CSV with beside-the-table schemas expressed as Frictionless Data Table Schemas (JSON). Git and CSV are delivery conventions rather than the goal; the invariant is that the workspace datasets and their change history remain reviewable and exportable. See Git as the canonical, append-only source of truth, Plain-text CSV for longevity, and Schema-driven data contract (Frictionless Table Schema).

This SDD is the single-page “source of truth” view for design review and implementation traceability. The intended audience includes human reviewers validating correctness, AI agents refining documentation from human input, and AI agents implementing and maintaining the BusDK codebase. Executing Git operations and making discretionary accounting judgments are out of scope, as captured in the Non-goals section.

Goals

G-001 Deterministic, script-friendly workflows. The primary interface is a CLI toolchain whose behavior is predictable and automatable.

G-002 Reviewable repository data. Workspace datasets and their change history remain readable, diffable, and exportable over long retention periods.

G-003 Modular, loosely coupled components. Feature areas are implemented as independent modules that integrate through shared datasets rather than internal cross-module APIs.

G-004 Schema-driven data contracts. Schemas function as documentation, validation input, and compatibility guarantees across implementations.

G-005 Auditability through append-only discipline. Corrections are represented as additional records that preserve history rather than overwriting prior bookkeeping material.

These goals are expanded across CLI-first, Unix-style composability, Modularity, Append-only auditability, and Schema contract.

Non-goals

NG-001 BusDK does not execute Git commands or commit changes; Git operations are performed externally by the user or automation.

NG-002 BusDK does not make discretionary accounting judgments (for example classification, valuation, or materiality decisions) on the user’s behalf.

NG-003 BusDK does not require AI assistance for correctness; compatibility with AI-assisted workflows is an objective, not a dependency.

Requirements

This section defines requirements with stable identifiers. Each requirement is phrased as an invariant and paired with acceptance criteria that must be testable, derived from the documented CLI, data, and workflow conventions in the source pages.

Functional requirements

FR-001 CLI-first toolchain. BusDK MUST provide a CLI-first interface where workflows are expressed as explicit commands and produce deterministic outputs suitable for scripting and automation. Commands MUST NOT perform interactive operations (see NFR-009). Acceptance criteria: The minimum required command surface for the end-to-end workflow is defined in Minimum required command surface (end-to-end workflow). The CLI’s I/O and determinism conventions are specified in Error handling, dry-run, and diagnostics, Non-interactive use and scripting, Reporting and query commands, and Validation and safety checks. Primary sources: CLI-first, CLI tooling and workflow, and Command structure and discoverability.

FR-002 Modular command surface. BusDK MUST be organized as independent modules (typically bus-*) that plug into the bus dispatcher and operate on shared workspace datasets. Acceptance criteria: Module responsibilities and dataset ownership are documented so cross-module integration is reviewable, and cross-module behavior remains explicit through dataset contracts. When modules are implemented in Go, cross-module reuse happens by importing Go libraries rather than shelling out to other bus-* CLIs, so CLI composition does not become a hidden runtime dependency. Primary sources: Independent modules, Architectural overview, Modules, and Modularity.

FR-003 Workspace initialization. BusDK MUST support a workspace bootstrap workflow where module-owned initialization creates baseline datasets and schemas without a monolithic initializer owning all files. Acceptance criteria: The minimal “must exist after initialization” baseline is defined in Minimal workspace baseline (after initialization). Initialization must result in a schema-valid workspace where the end-to-end workflow can run without implicit dataset creation. Primary sources: Initialize a new repository, Minimal workspace baseline (after initialization), Data directory layout (principles), and bus init.

FR-004 Schema validation as a first-class workflow step. BusDK MUST support schema-based validation and cross-table invariant checks as a repeatable step in day-to-day and period-close workflows. Acceptance criteria: Schema validation MUST check types and referential integrity before any data mutation, and logical validation MUST enforce balanced debits and credits for transactions, valid account references, invoice totals matching line items, and VAT classification completeness when generating VAT reports. Validation failures MUST be deterministic, MUST exit non-zero, and MUST write diagnostics to standard error that cite datasets and stable identifiers. For Finland and EU compliance, validation MUST enforce audit-trail invariants (stable IDs, required voucher references, deterministic ordering fields) and MUST prevent changes that would break a closed period or previously reported data. Primary sources: Shared validation layer, Validation and safety checks, and bus validate.
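
The balanced-entry invariant named in FR-004 can be illustrated with a minimal Go sketch. The Posting type, its field names, the account numbers, and the journal.csv dataset name in the diagnostic are assumptions made for illustration; they are not taken from the BusDK schemas.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// Posting is a hypothetical journal line. Amounts are integer cents so the
// balance check never touches floating point.
type Posting struct {
	EntryID string
	Account string
	Debit   int64
	Credit  int64
}

// UnbalancedEntries returns the IDs of entries whose debits and credits
// differ, sorted so that diagnostics come out in a stable order.
func UnbalancedEntries(postings []Posting) []string {
	net := map[string]int64{}
	for _, p := range postings {
		net[p.EntryID] += p.Debit - p.Credit
	}
	var bad []string
	for id, n := range net {
		if n != 0 {
			bad = append(bad, id)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	postings := []Posting{
		{"E-001", "1910", 12400, 0},
		{"E-001", "3000", 0, 10000},
		{"E-001", "2939", 0, 2400},
		{"E-002", "1910", 5000, 0},
		{"E-002", "3000", 0, 4000}, // 10.00 short on the credit side
	}
	for _, id := range UnbalancedEntries(postings) {
		// Diagnostics go to standard error and cite the dataset and the stable ID.
		fmt.Fprintf(os.Stderr, "journal.csv: entry %s: debits and credits do not balance\n", id)
	}
}
```

A real command would additionally exit non-zero when the returned list is not empty; the sketch only prints the diagnostics so that it can be run as-is.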

FR-005 Evidence is first-class repository data. BusDK MUST support registering and linking supporting evidence (receipts, invoice PDFs, exports) so that datasets can reference attachment identifiers for traceability. Acceptance criteria: Attachments MUST be registered in attachments.csv at the repository root with a stable attachment_id and immutable metadata (filename, media type, hash). Attachment files SHOULD be stored under a predictable period directory structure, and metadata MUST remain in the repository even when files are stored outside Git. Vouchers, journal entries, invoices, and bank records MUST link to attachments via attachment_id so the audit trail remains demonstrable. Primary sources: bus attachments, Invoice PDF storage, and Finnish bookkeeping and tax-audit compliance.
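
As a rough illustration of the attachment metadata in FR-005, the following Go sketch derives a content hash and renders one attachments.csv record. The column names other than attachment_id, the column order, the identifier format, and the choice of SHA-256 are assumptions; the source only requires a stable attachment_id plus filename, media type, and hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/csv"
	"encoding/hex"
	"os"
)

// Attachment mirrors the metadata named in FR-005. The column order and the
// choice of SHA-256 are assumptions made for this sketch.
type Attachment struct {
	ID        string
	Filename  string
	MediaType string
	SHA256    string
}

// NewAttachment derives the content hash from the file bytes.
func NewAttachment(id, filename, mediaType string, content []byte) Attachment {
	sum := sha256.Sum256(content)
	return Attachment{
		ID:        id,
		Filename:  filename,
		MediaType: mediaType,
		SHA256:    hex.EncodeToString(sum[:]),
	}
}

// Row renders the attachment as one CSV record.
func (a Attachment) Row() []string {
	return []string{a.ID, a.Filename, a.MediaType, a.SHA256}
}

func main() {
	a := NewAttachment("ATT-0001", "2026/01/receipt-0001.pdf", "application/pdf",
		[]byte("%PDF-1.7 example bytes"))
	w := csv.NewWriter(os.Stdout)
	w.Write([]string{"attachment_id", "filename", "media_type", "sha256"})
	w.Write(a.Row())
	w.Flush()
	if err := w.Error(); err != nil {
		panic(err)
	}
}
```

Because the hash is computed from the file bytes, the metadata row stays verifiable even when the file itself is stored outside Git.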

FR-006 Library-first module design. Each Go module MUST implement its business logic as a Go library package, and its CLI program MUST be a lightweight wrapper over that library. Acceptance criteria: Domain behavior is testable through library calls without invoking the CLI; the CLI layer contains only argument parsing, I/O wiring, and output formatting; and Go modules may call other BusDK Go libraries directly to reuse behavior or shared mechanics while keeping dataset ownership and invariants explicit. Primary sources: Module repository structure and dependency rules and Independent modules.

Non-functional requirements

NFR-001 Longevity and exportability. Repository data MUST remain exportable and interpretable without requiring a specific runtime or proprietary storage backend. Acceptance criteria: The default representation MUST be UTF-8 CSV with a header row, comma delimiters, ISO dates (YYYY-MM-DD), and predictable numeric formats for monetary values, paired with beside-the-table Frictionless Table Schemas. The canonical datasets MUST remain readable with general-purpose tools. If an alternative storage backend is used, it MUST preserve deterministic, schema-validated tables and MUST provide canonical import and export to the same tabular conventions, preserving Table Schema semantics including types, constraints, primary keys, and foreign keys so audits and reviews remain possible.

NFR-010 Money-safe arithmetic. All currency and monetary calculations MUST use decimal-safe arithmetic and MUST NOT use binary floating-point (float32/float64) for business calculations. Acceptance criteria: monetary math uses exact decimal representations (for example scaled integer cents or exact decimal rational/decimal types), rounding rules are explicit and deterministic, and command outputs are reproducible without floating-point drift.
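
One way to satisfy NFR-010 is the scaled-integer approach the requirement mentions. The sketch below is a minimal Go illustration, not the BusDK implementation: the Cents type, ParseCents, and ApplyRate are hypothetical names, two fraction digits are assumed, the whole part is limited to 32 bits to keep overflow out of the picture, and the 24.00 % rate and the round-half-away-from-zero rule are illustrative choices.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Cents is a monetary amount stored as a scaled integer (1/100 of a unit).
type Cents int64

// ParseCents converts a decimal string such as "12.34" or "-0.5" into Cents
// without passing through a float. At most two fraction digits are accepted.
func ParseCents(s string) (Cents, error) {
	neg := strings.HasPrefix(s, "-")
	digits := strings.TrimPrefix(s, "-")
	whole, frac, found := strings.Cut(digits, ".")
	if whole == "" || (found && frac == "") || len(frac) > 2 {
		return 0, errors.New("invalid amount: " + s)
	}
	frac += strings.Repeat("0", 2-len(frac))
	w, err := strconv.ParseUint(whole, 10, 32)
	if err != nil {
		return 0, errors.New("invalid amount: " + s)
	}
	f, err := strconv.ParseUint(frac, 10, 32)
	if err != nil {
		return 0, errors.New("invalid amount: " + s)
	}
	c := Cents(w*100 + f)
	if neg {
		c = -c
	}
	return c, nil
}

// ApplyRate multiplies an amount by a rate given in basis points
// (1/100 of a percent) and rounds half away from zero.
func ApplyRate(amount Cents, basisPoints int64) Cents {
	n := int64(amount) * basisPoints
	if n < 0 {
		return Cents((n - 5000) / 10000)
	}
	return Cents((n + 5000) / 10000)
}

// String renders the amount with exactly two fraction digits.
func (c Cents) String() string {
	sign, n := "", int64(c)
	if n < 0 {
		sign, n = "-", -n
	}
	return fmt.Sprintf("%s%d.%02d", sign, n/100, n%100)
}

func main() {
	a, _ := ParseCents("0.10")
	b, _ := ParseCents("0.20")
	fmt.Println(a + b) // prints 0.30, with no floating-point drift

	net, _ := ParseCents("19.99")
	fmt.Println(ApplyRate(net, 2400)) // 24.00 % of 19.99 is 4.7976, prints 4.80
}
```

Keeping the rounding rule inside a single function makes it explicit and reviewable, which is the property the acceptance criteria ask for.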

NFR-002 Deterministic behavior. Human-facing diagnostics and machine-facing outputs MUST be deterministic given the same repository data and configuration inputs. Acceptance criteria: Command results MUST be written to standard output and diagnostics to standard error, with any terminal styling limited to standard error when it is a terminal. Machine-readable output modes MUST document stable formats, column sets, column order, and record ordering based on stable identifiers and explicit sort keys. Diagnostics MUST cite datasets and stable identifiers and show paths relative to the workspace root so that output remains stable across machines.
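
The record-ordering part of NFR-002 matters in Go in particular, because map iteration order is deliberately randomized. A minimal sketch, assuming a hypothetical two-column accounts table keyed by account_id, sorts the stable identifiers before writing so that repeated runs produce byte-identical output:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"sort"
)

// RenderAccounts writes account names keyed by a stable identifier as CSV
// with a fixed column order and rows sorted by that identifier.
func RenderAccounts(names map[string]string) (string, error) {
	ids := make([]string, 0, len(names))
	for id := range names {
		ids = append(ids, id)
	}
	sort.Strings(ids) // explicit sort key: the stable identifier

	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{"account_id", "name"}); err != nil {
		return "", err
	}
	for _, id := range ids {
		if err := w.Write([]string{id, names[id]}); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	out, err := RenderAccounts(map[string]string{
		"3000": "Sales",
		"1910": "Bank",
		"2939": "VAT payable",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```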

NFR-003 Maintainability through clear boundaries. Module responsibilities and dataset ownership MUST be explicit so that modules can evolve independently. Acceptance criteria: Each dataset has a clear owning module; schema changes have a documented migration path.

NFR-008 Pluggable storage boundary. Modules MUST depend on the workspace store interface or shared library APIs for persistence rather than shelling out to another CLI as an internal API. Acceptance criteria: Module implementations do not require a generic CRUD CLI to run successfully, and storage access is performed through the documented interface or library APIs so that swapping the backend does not introduce hidden runtime dependencies or change module behavior.

NFR-004 Reliability. Workflows SHOULD fail fast with clear diagnostics when data contracts are violated. Acceptance criteria: Invalid usage MUST exit with status code 2 and a concise usage error on standard error. Failures caused by repository contents, filesystem I/O, or schema and invariant violations MUST exit non-zero and include diagnostics that identify the dataset and stable identifiers involved. Commands MUST refuse to write invalid data when validation fails.

NFR-005 Security and access control. Repository data MUST be protected by explicit access controls appropriate to the deployment context, and auditability MUST be preserved through append-only history. Acceptance criteria: In single-user operation, OS-level permissions MUST be the primary security boundary. In collaborative scenarios, Git permissions and workflow controls (for example branch protections, reviews, and separation of duties) MUST be used to control who can propose and approve changes. If sensitive data must be scrubbed, it MUST be handled via an explicit redaction commit that flags the redaction rather than silently excising history. Primary sources: Append-only discipline and security model, Git as the canonical, append-only source of truth.

NFR-006 Performance. Repository operations SHOULD remain responsive for day-to-day use even as data grows. Acceptance criteria: BusDK MUST support splitting large datasets into multiple files by time period or category so diffs remain focused and Git operations remain performant. The repository root MUST track segmented files through a stable index dataset (for example journals.csv) so tooling can locate period-specific files deterministically. Primary sources: Scaling over decades.

NFR-007 Scalability. Repository data MUST remain manageable over long retention periods without losing auditability or discoverability. Acceptance criteria: Older data MUST be archivable by tagging period-close revisions and, where needed, removing old-period files from active branches while retaining them in history for retrieval. Segmentation by period MUST preserve deterministic ordering and traceability across datasets. Primary sources: Scaling over decades, CSV conventions.

NFR-009 Non-interactive operation. BusDK commands MUST NOT wait for user input. All required input MUST be supplied to each command via arguments, flags, or standard input. Output MUST be produced as soon as the operation is ready; there MUST be no interactive prompts, confirmations, or read-from-TTY behavior. When required parameters are missing, the command MUST fail with a concise usage error on standard error and exit with status code 2. This ensures that AI agents and automation scripts can invoke any BusDK command without blocking. Acceptance criteria: No command prompts for missing options or confirmation; piping or redirecting input and running without a TTY does not change success or failure semantics; help and version remain immediate-exit. Primary source: Non-interactive use and scripting.
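
The exit-status rules in NFR-009 and NFR-004 can be sketched with Go's standard flag package. The program name bus-example and the --period flag are hypothetical and exist only for this illustration; the point is that a missing parameter produces a usage error with status 2 instead of a prompt.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// run parses arguments and returns the process exit status. It never reads
// from a terminal: everything it needs must arrive through args.
func run(args []string, stdout, stderr io.Writer) int {
	fs := flag.NewFlagSet("bus-example", flag.ContinueOnError)
	fs.SetOutput(stderr)
	period := fs.String("period", "", "accounting period, for example 2026-01 (required)")
	if err := fs.Parse(args); err != nil {
		if errors.Is(err, flag.ErrHelp) {
			return 0 // help is an immediate, successful exit
		}
		return 2 // the flag package has already written the error to stderr
	}
	if *period == "" {
		fmt.Fprintln(stderr, "bus-example: missing required flag --period")
		return 2
	}
	fmt.Fprintf(stdout, "period=%s\n", *period)
	return 0
}

// status runs the command with all output discarded and returns the exit status.
func status(args ...string) int {
	return run(args, io.Discard, io.Discard)
}

func main() {
	// Demonstration only. A real entrypoint would call
	// os.Exit(run(os.Args[1:], os.Stdout, os.Stderr)).
	fmt.Println(run([]string{"--period", "2026-01"}, os.Stdout, os.Stderr))
	fmt.Println(run(nil, os.Stdout, os.Stderr))
}
```

Returning the status from run instead of calling os.Exit inside it keeps the behavior testable without spawning a process.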

System Architecture

BusDK follows a “micro-tool” architecture. Each feature area is implemented as an independent CLI tool that reads and writes shared workspace datasets (tables plus schemas) stored as repository data, while the core business logic for Go modules lives in library packages that the CLI wraps. Modules coordinate by sharing data and by relying on an append-only revision history, and Go modules may reuse behavior by importing BusDK Go libraries rather than shelling out to other bus-* CLIs. The stable integration surface remains the workspace datasets and their schemas, organized in a consistent directory layout, so ownership and auditability stay explicit even when libraries are shared. This is defined in Architectural overview, Independent modules, and CLI as the primary interface (controlled read/modify/write).

Data flows from the CLI commands through deterministic read-validate-modify-write operations against the repository data, with diagnostics emitted on standard error and machine-facing results emitted on standard output. The architectural goal is reviewability and exportability of the workspace datasets and their change history; Git is the preferred default, not the requirement. See Git-backed data repository (the data store) and Git as the canonical, append-only source of truth.

Key Decisions

KD-001 Preferred Git-backed change history. Git is the preferred default for recording a reviewable, append-only change history, but it is a delivery convention rather than the goal.

KD-002 Preferred CSV plus Frictionless Table Schema. UTF-8 CSV datasets with beside-the-table JSON schemas are the preferred default because they remain readable and exportable with general-purpose tools.

KD-003 Module integration through datasets. Modules integrate through shared datasets and schemas so ownership and boundaries remain explicit and reviewable, while Go modules may import other BusDK Go libraries for shared mechanics instead of invoking CLIs.

KD-004 Library-first Go modules. Go modules are structured around a library package that implements business logic, with the CLI as a thin wrapper for argument parsing and output formatting, so behavior is reusable across modules and test coverage can focus on library APIs.

Component Design and Interfaces

bus dispatcher

The bus dispatcher is the primary entry point for discovery and execution. It is responsible for listing available modules and routing module commands (for example bus accounts, bus journal, bus vat) to the corresponding module program. The intended command surface and discoverability expectations are described in Command structure and discoverability and the broader CLI conventions in CLI tooling and workflow.

Interface IF-001 (dispatcher routing). The dispatcher provides module discovery and command routing by invoking the selected module program and passing through standard input, standard output, and standard error without transformation.
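
A minimal Go sketch of IF-001 follows. It assumes that the module program for bus <module> is an executable named bus-<module> found on PATH, and that unknown modules are reported as usage errors with status 2; both are assumptions for illustration, since the source does not specify the lookup mechanism.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os/exec"
)

// moduleProgram maps a module name such as "journal" to its program name.
// The "bus-" prefix follows the document; PATH lookup is an assumption.
func moduleProgram(module string) string {
	return "bus-" + module
}

// dispatch runs the module program with the remaining arguments, hands the
// three standard streams to it unchanged, and returns its exit status.
func dispatch(args []string, stdin io.Reader, stdout, stderr io.Writer) int {
	if len(args) == 0 {
		fmt.Fprintln(stderr, "usage: bus <module> [args...]")
		return 2
	}
	path, err := exec.LookPath(moduleProgram(args[0]))
	if err != nil {
		fmt.Fprintf(stderr, "bus: unknown module %q\n", args[0])
		return 2
	}
	cmd := exec.Command(path, args[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = stdin, stdout, stderr
	err = cmd.Run()
	var exitErr *exec.ExitError
	switch {
	case err == nil:
		return 0
	case errors.As(err, &exitErr):
		return exitErr.ExitCode() // propagate the module's own status
	default:
		fmt.Fprintf(stderr, "bus: %v\n", err)
		return 1
	}
}

func main() {
	// Demonstration only: show which program each command routes to. A real
	// entrypoint would pass the process arguments and standard streams to dispatch.
	for _, m := range []string{"accounts", "journal", "vat"} {
		fmt.Printf("bus %s -> %s\n", m, moduleProgram(m))
	}
}
```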

Modules (bus-*)

Each module owns one or more datasets and their schemas, provides commands to initialize and maintain those datasets, and emits deterministic diagnostics. Modules integrate through shared conventions: stable dataset names, beside-the-dataset schema files, and documented cross-dataset references (primary keys and foreign keys). In Go, each module’s business logic lives in a library package so other BusDK Go modules can import it directly when shared behavior is needed, without using CLI-to-CLI calls as an internal API. Module responsibilities and how modules fit into the end-to-end workflow are captured in Modules and the workflow narrative starting from Accounting workflow overview.

Interface IF-002 (module CLI). Each module exposes a CLI program named after its module directory and reads and writes workspace datasets and schemas using the documented repository layout and schema conventions.

Repository library and CLI layering

Each bus-<module> repository MUST contain a library package that implements the module’s behavior and a CLI entrypoint that is a thin wrapper over that library. The CLI is a presentation layer: it parses arguments, orchestrates I/O, and renders outputs, while the library performs validation, domain rules, and dataset updates and returns structured results to the CLI. Acceptance criteria: module tests target the library directly; the CLI program contains only argument parsing and output formatting; and all observable business behavior is available through library calls without shelling out to other BusDK tools. Modules MUST NOT invoke other bus-* CLIs as internal dependencies for core behavior. If a module needs shared mechanics such as workspace storage, schema parsing, or CSV I/O, it MUST import the shared mechanical library (for example bus-data) or implement the documented storage backend interface rather than calling another module program. The repository layout and dependency rules are defined in Module repository structure and dependency rules.

Workspace store (storage backend) interface

The workspace store interface defines the persistence boundary that modules depend on. It can be backed by the default filesystem implementation (CSV plus schemas) or by a future SQL or other backend, but the interface is mechanical and MUST NOT embed domain business logic. It must provide deterministic read and write of tables, deterministic record ordering rules, schema load and save, validation preconditions that refuse invalid writes, and canonical export back to the tabular text contract. It must support the append-only and audit-trail discipline required by the rest of the system by refusing destructive changes or representing corrections explicitly, while policy decisions about what is allowed remain in domain modules. See Storage backends and workspace store interface.

Interface IF-003 (workspace store interface). The storage backend provides deterministic table and schema persistence, schema-driven validation preconditions, and canonical import and export to the tabular text contract without implementing domain business rules.
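
To make the boundary in IF-003 more concrete, here is a deliberately small Go sketch. The Store interface, its two methods, and the in-memory MemStore are hypothetical; the real interface is defined in the referenced design pages and is likely broader (schema load and save, import and export). The sketch only shows the mechanical properties: writes that violate the table contract are refused, reads come back in a deterministic order, and no domain rule appears anywhere.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is one table row keyed by column name.
type Record map[string]string

// Store is a hypothetical, reduced shape of the workspace store interface.
type Store interface {
	ReadTable(name string) ([]Record, error)
	AppendRecord(name string, r Record) error
}

// MemStore is an in-memory stand-in for a backend. The key column and the
// required columns play the role of a table schema in this sketch.
type MemStore struct {
	key      string
	required []string
	tables   map[string][]Record
}

var _ Store = (*MemStore)(nil) // compile-time interface check

func NewMemStore(key string, required ...string) *MemStore {
	return &MemStore{key: key, required: required, tables: map[string][]Record{}}
}

// AppendRecord refuses rows with missing fields or a duplicate primary key.
func (s *MemStore) AppendRecord(name string, r Record) error {
	for _, col := range append([]string{s.key}, s.required...) {
		if r[col] == "" {
			return fmt.Errorf("%s: missing required field %q", name, col)
		}
	}
	for _, existing := range s.tables[name] {
		if existing[s.key] == r[s.key] {
			return fmt.Errorf("%s: duplicate %s %q", name, s.key, r[s.key])
		}
	}
	s.tables[name] = append(s.tables[name], r)
	return nil
}

// ReadTable returns a copy of the rows sorted by primary key.
func (s *MemStore) ReadTable(name string) ([]Record, error) {
	rows := append([]Record(nil), s.tables[name]...)
	sort.Slice(rows, func(i, j int) bool { return rows[i][s.key] < rows[j][s.key] })
	return rows, nil
}

func main() {
	var store Store = NewMemStore("account_id", "name")
	fmt.Println(store.AppendRecord("accounts.csv", Record{"account_id": "3000", "name": "Sales"}))
	fmt.Println(store.AppendRecord("accounts.csv", Record{"account_id": "1910", "name": "Bank"}))
	fmt.Println(store.AppendRecord("accounts.csv", Record{"account_id": "1910", "name": "Cash"}))
	rows, _ := store.ReadTable("accounts.csv")
	for _, r := range rows {
		fmt.Println(r["account_id"], r["name"])
	}
}
```

A domain module written against Store would behave the same on a filesystem or SQL backend, which is the property NFR-008 asks for.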

A shared library implementation is allowed and recommended for Go modules to keep behavior consistent, while cross-language interoperability remains guaranteed by the table and schema contract and by required export and import support when a non-file backend is used.

External Git tooling

Git is treated as an external mechanism for recording revisions. BusDK does not commit changes and does not invoke Git commands. Workflows describe when a user should record a revision boundary (for example at period close), but the mechanism is external. This separation is a design goal, not a workflow convenience. See Git as the canonical, append-only source of truth and the operational conventions described in Git commit conventions per operation (external Git).

Interface IF-004 (external version control). Version control actions are performed externally by users or automation, and BusDK’s responsibility ends at deterministic read-modify-write operations on repository data.

Data Design

BusDK’s canonical data model is a set of tables (“workspace datasets”) validated against explicit Table Schema definitions. Table Schemas declare fields, types, constraints, and keys (including primary keys and foreign keys where applicable). Schemas serve as documentation and as validation input, and they are a key mechanism for keeping revisions interpretable as tables evolve over time. See Frictionless Table Schema as the contract, Schema evolution and migration, and CSV conventions.
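
The relationship between a table and its schema can be sketched in Go as follows. The example reads only the fields array of a Table Schema (name and type), checks that the CSV header matches the field order, and checks integer and date cells. The sample schema, the column names, and the decision to require header order to match are assumptions for illustration; constraints, primary keys, and foreign keys are left out to keep the sketch short.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// Schema holds the subset of a Table Schema used here: field names and types.
type Schema struct {
	Fields []struct {
		Name string `json:"name"`
		Type string `json:"type"`
	} `json:"fields"`
}

// Validate returns one message per problem found, in row and column order.
func Validate(schemaJSON, csvData string) []string {
	var s Schema
	if err := json.Unmarshal([]byte(schemaJSON), &s); err != nil {
		return []string{"schema: " + err.Error()}
	}
	records, err := csv.NewReader(strings.NewReader(csvData)).ReadAll()
	if err != nil {
		return []string{"csv: " + err.Error()}
	}
	if len(records) == 0 || len(records[0]) != len(s.Fields) {
		return []string{"header does not match schema"}
	}
	var problems []string
	for i, f := range s.Fields {
		if records[0][i] != f.Name {
			problems = append(problems,
				fmt.Sprintf("column %d: want %q, got %q", i+1, f.Name, records[0][i]))
		}
	}
	for n, rec := range records[1:] {
		for i, f := range s.Fields {
			var err error
			switch f.Type {
			case "integer":
				_, err = strconv.ParseInt(rec[i], 10, 64)
			case "date":
				_, err = time.Parse("2006-01-02", rec[i]) // ISO date, YYYY-MM-DD
			}
			if err != nil {
				problems = append(problems,
					fmt.Sprintf("row %d, field %s: invalid %s %q", n+2, f.Name, f.Type, rec[i]))
			}
		}
	}
	return problems
}

func main() {
	schema := `{"fields":[{"name":"entry_id","type":"string"},
		{"name":"date","type":"date"},{"name":"amount_cents","type":"integer"}]}`
	data := "entry_id,date,amount_cents\nE-001,2026-01-15,12400\nE-002,2026-13-01,abc\n"
	for _, p := range Validate(schema, data) {
		fmt.Println(p)
	}
}
```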

Corrections are represented as additional bookkeeping that preserves history rather than overwriting prior vouchers or postings. Append-only discipline is treated as a first-class design requirement for long-term auditability. See Append-only updates and soft deletion and Auditability and append-only discipline.
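
A correction by reversal can be sketched in Go as below. The Posting type, the signed-cents convention, and the Reverses column are hypothetical and chosen for brevity; the point is that the original rows are never edited or removed and the journal only grows.

```go
package main

import "fmt"

// Posting is a hypothetical journal line. Cents is positive for a debit and
// negative for a credit. Reverses names the entry this line corrects, if any.
type Posting struct {
	EntryID  string
	Account  string
	Cents    int64
	Reverses string
}

// Reverse returns the journal with additional postings that cancel the given
// entry. Existing rows are copied unchanged; nothing is overwritten.
func Reverse(journal []Posting, entryID, newID string) []Posting {
	out := append([]Posting(nil), journal...)
	for _, p := range journal {
		if p.EntryID == entryID {
			out = append(out, Posting{
				EntryID:  newID,
				Account:  p.Account,
				Cents:    -p.Cents,
				Reverses: entryID,
			})
		}
	}
	return out
}

// Balance sums all postings for one account.
func Balance(journal []Posting, account string) int64 {
	var total int64
	for _, p := range journal {
		if p.Account == account {
			total += p.Cents
		}
	}
	return total
}

func main() {
	journal := []Posting{
		{EntryID: "E-001", Account: "1910", Cents: 5000},
		{EntryID: "E-001", Account: "3000", Cents: -5000},
	}
	corrected := Reverse(journal, "E-001", "E-002")
	fmt.Println(len(journal), len(corrected))  // the history grew from 2 to 4 rows
	fmt.Println(Balance(corrected, "1910"))    // the net effect of E-001 is now 0
}
```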

Assumptions and Dependencies

AD-001 Local filesystem workspace. The current design assumes a local filesystem workspace and a toolchain that can read and write structured text data, but the architecture defines a workspace store interface so the persistence layer can be swapped later without changing domain module responsibilities. If a local filesystem is not available or accessible, BusDK cannot operate on repository data with the default backend and the CLI workflows described here are not applicable.

AD-002 Repository layout conventions. Workspace layout assumptions (what lives where in the repository) follow Data directory layout with an explicit baseline example in Minimal example layout. If the layout deviates, modules cannot locate datasets and schemas deterministically and FR-002 and FR-003 cannot be satisfied.

AD-003 Preferred Git-backed repository. The preferred default assumes a Git repository workspace. Git remains external revision tooling and is not part of the storage backend interface contract. If Git is not used, an alternative mechanism MUST preserve an append-only, reviewable change history or NFR-005 is not met.

AD-004 Preferred CSV plus Frictionless Table Schema. The preferred default assumes CSV datasets and JSON schemas using the Frictionless Table Schema specification. If CSV and Frictionless Table Schema are not used, an alternative storage backend MUST still provide deterministic, schema-validated tables and export back to simple, tabular text formats so the workspace datasets and their change history remain reviewable and exportable, or NFR-001 and FR-004 cannot be satisfied.

AD-005 Compliance scope limited to Finland and the EU. The current design targets Finnish and EU compliance requirements only. If additional jurisdictions are added, validation rules, reporting outputs, and acceptance criteria MUST be extended and explicitly documented so the scope remains reviewable.

AD-006 Supported operating environments. BusDK supports Linux and macOS environments and assumes a Bash shell for CLI workflows and test harnesses. If a different OS or shell is used, command behavior, fixtures, and scripts may not be portable and test coverage must be expanded.

Security Considerations

Historical financial data is append-only, and corrections are represented as new records rather than destructive updates. In single-user operation, OS-level access control is the primary security boundary. In collaborative scenarios, Git permissions and review workflows are expected to enforce separation of duties and change approval, keeping the audit trail tamper-evident through the commit history. When redaction is necessary, it must be handled through explicit redaction commits that flag the redaction instead of silently excising history. See Append-only discipline and security model.

Observability and Logging

Diagnostics and logging are designed to be deterministic and script-friendly. Commands write command results to standard output and diagnostics to standard error. Optional logging should provide visibility into validation steps and planned file changes without contaminating structured outputs. See Error handling, dry-run, and diagnostics and Validation and safety checks.

Error Handling and Resilience

BusDK commands must fail with clear diagnostics when inputs are invalid or when repository data violates schema or invariants. Invalid usage exits with status code 2 and a concise usage error on standard error, while repository, filesystem, and validation failures exit non-zero with diagnostics that cite datasets and stable identifiers. Validation failures must prevent data mutation, and diagnostics must remain deterministic and citeable. See Error handling, dry-run, and diagnostics and Validation and safety checks.

Testing Strategy

Each module is tested using standard unit testing rules for its implementation language, with Go modules using go test and idiomatic Go test structure by default. Unit tests focus on deterministic behavior for parsing, validation, schema enforcement, and dataset transformations, and they must be runnable without network dependencies or external services so they remain stable in local and CI environments.

Every command exposed by a module is also covered by a simple end-to-end bash test that executes the command against a fixture workspace, asserts on standard output and standard error, and verifies repository data changes on disk. These end-to-end tests live alongside the module they verify, but they run in an isolated Git repository per command so each test is independent and does not share state or side effects with other tests. The test harness must initialize a fresh repository, apply only the fixtures required for the command under test, and verify exit codes and outputs deterministically. See Testing strategy for the canonical multi-page design spec section.

If a non-file backend such as SQL is implemented, it MUST provide an equivalent deterministic test mode using an ephemeral local instance and fixtures, and the same command-level assertions on standard output, standard error, exit codes, and resulting logical table contents MUST hold. Tests remain isolated and must not rely on external network services; any containerized or local service used for optional backends must be strictly local and used only when that backend is enabled.

Deployment and Operations

For the default filesystem backend, operations are limited to workspace management and external Git workflows. If an alternative backend is enabled, documentation MUST specify configuration (including connection settings and credentials handling), migrations or schema evolution procedures, backup and restore expectations, and concurrency or locking semantics at least at a single-user baseline.

Migration/Rollout

Schema evolution is expected and is handled through versioned schema updates and transparent migrations recorded in the repository history. Adding fields may be handled by defaulting missing historical values, while structural changes such as file splits or renames are acceptable as long as the migration is transparent and recorded. See Schema evolution and migration and Scaling over decades.
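
The "defaulting missing historical values" case can be illustrated with a short Go sketch. It assumes a hypothetical accounts table that later gained a currency column; files written before the change lack the column, and the reader fills it from a documented default without rewriting the historical file.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// ReadWithDefaults parses CSV text into rows keyed by column name. Columns
// that are absent from the header are filled from defaults; values that are
// present in the file always win.
func ReadWithDefaults(data string, defaults map[string]string) ([]map[string]string, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("missing header row")
	}
	header := records[0]
	var rows []map[string]string
	for _, rec := range records[1:] {
		row := map[string]string{}
		for col, def := range defaults {
			row[col] = def
		}
		for i, col := range header {
			row[col] = rec[i]
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	defaults := map[string]string{"currency": "EUR"}

	// File written before the currency column existed.
	oldRows, _ := ReadWithDefaults("account_id,name\n1910,Bank\n", defaults)
	fmt.Println(oldRows[0]["currency"])

	// File written after the schema change.
	newRows, _ := ReadWithDefaults("account_id,name,currency\n1920,Bank SEK,SEK\n", defaults)
	fmt.Println(newRows[0]["currency"])
}
```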

Risks

Not Applicable. The current design pages do not enumerate specific project risks beyond the general need to preserve auditability and deterministic workflows.

Open Questions

None at this time.

Glossary and Terminology

Workspace: the repository contents that hold the authoritative datasets, schemas, and supporting evidence for a bookkeeping scope (typically an accounting year).

Repository data: the workspace datasets, schemas, and attachments stored in the repository.

Workspace datasets: the canonical tables (typically CSV) plus their schemas that serve as the primary system of record.

Table schema: the beside-the-table contract (Frictionless Table Schema) declaring fields, types, constraints, and keys.

Module: an independent BusDK component (often bus-*) that owns specific datasets and provides commands over them.

Change history (revision history): the reviewable history of changes to repository data, typically recorded through external version control tooling.

Document control

Title: BusDK Software Design Document (SDD)
Project: BusDK
Document ID: BUSDK-SDD
Version: 2026-02-06
Status: Draft
Last updated: 2026-02-06
Owner: BusDK development team