bus-data — shared tabular data layer and schema-validated I/O (SDD)
Introduction and Overview
Bus Data provides the shared tabular data layer for BusDK by implementing deterministic Frictionless Table Schema and Data Package handling for workspace datasets. Its primary surface is a Go library that other modules import directly for schema, data package, and CSV operations. The canonical way to run the module’s CLI is via the BusDK dispatcher as bus data; the bus-data binary remains available for scripts or direct invocation, but end users and documentation should prefer bus data. The module remains library-first, deterministic, and non-interactive, with no Git or network behavior. Modules that need to create or ensure datapackage.json (e.g. bus-config, bus-init) MUST use the bus-data Go library to initialize the empty descriptor first, rather than invoking the CLI.
The module may be extended with deterministic read helpers for table-like CSV workbooks: historical accounting data often arrives as spreadsheet-style CSV exports where report totals are driven by formulas and cross-sheet references. Address-based cell and range read, optional header or anchor-based lookup, and consistent numeric normalization for locale-formatted values reduce dependence on ad-hoc external parsing for migration and parity analysis. Optional integration with bus-bfl for formula metadata and evaluation allows formula-driven report totals to be represented and validated in Bus-native datasets, improving reproducibility and auditability when reconciling source reports against ledger-level reconstructions.
Requirements
FR-DAT-001 Deterministic dataset I/O. The module MUST provide deterministic read, write, and validation behavior for workspace datasets. Acceptance criteria: table reads and writes are schema-validated and refuse invalid writes, and the same input files and commands yield byte-for-byte identical outputs.
FR-DAT-002 Library-first integration. The Go library MUST be the primary integration surface for other modules. Acceptance criteria: module integrations rely on the library rather than shelling out to the bus-data CLI.
FR-DAT-003 Table initialization. The module MUST support initializing a new CSV file alongside a beside-the-table schema file using explicit commands. Acceptance criteria: initialization writes a schema file that matches the table and does not overwrite existing data unless explicitly requested.
FR-DAT-004 Schema extension. The module MUST support extending an existing schema by adding columns through explicit commands. Acceptance criteria: added columns are appended deterministically and existing columns retain their order and definitions.
FR-DAT-005 Row append. The module MUST support appending a new row to an existing CSV through an explicit CLI option. Acceptance criteria: the appended row is validated against the beside-the-table schema and appended in canonical column order without modifying existing rows.
FR-DAT-006 Controlled row mutation. The module MUST support row add, update, and delete operations that obey the constraints and mutation policies defined by the table schema. Acceptance criteria: all row mutations validate against the schema and refuse changes that violate schema-defined requirements, and update or delete operations are permitted only when the schema explicitly allows them, including composite primary keys.
FR-DAT-007 Schema inference. The module MUST support initializing a Table Schema by analyzing an existing CSV and inferring field types and constraints. Acceptance criteria: inferred schemas are deterministic for the same input and do not modify the CSV contents.
FR-DAT-008 Type changes with compatibility checks. The module MUST support changing a field type only when the change is non-destructive for existing data. Acceptance criteria: incompatible type changes are rejected with a clear diagnostic, and compatible changes update the schema while leaving table data unchanged.
FR-DAT-009 Data Package management. The module MUST support creating, reading, updating, and patching datapackage.json for workspace datasets. Acceptance criteria: datapackage.json can be initialized deterministically (empty descriptor with profile and empty resources), round-tripped without loss of unknown properties, and updated through explicit commands and JSON patches with no interactive prompts.
FR-DAT-009a Data package init (empty). The module MUST provide an init command that creates an empty datapackage.json at the workspace root when the file is missing (profile tabular-data-package, empty resources array, deterministic formatting). When the file already exists, init MUST be idempotent and MUST NOT scan the workspace for CSV files or add resources. Acceptance criteria: bus data init in an empty directory creates only the minimal descriptor; re-running leaves the file unchanged; no resource entries are added automatically.
FR-DAT-009b Data package discover. The module MUST provide a distinct command (e.g. package discover) that scans the workspace for CSV files with a beside-the-table schema and adds or updates resource entries in an existing datapackage.json. This operation MUST NOT be named init. Acceptance criteria: discover requires an existing datapackage.json (created by bus data init or equivalent); it adds one resource per discovered table with deterministic name and path; resource order is lexicographic by name; running discover when no package exists fails with a clear diagnostic.
FR-DAT-010 Resource management. The module MUST support adding, removing, and renaming resources in datapackage.json while creating or deleting the underlying CSV and schema artifacts when explicitly requested. Acceptance criteria: resource add creates the CSV and schema artifacts in deterministic locations and names, resource remove refuses when the resource is referenced by any foreign key in the workspace, and resource rename updates foreign key references deterministically.
FR-DAT-011 Complete Table Schema coverage. The module MUST support all Table Schema descriptor attributes, including field descriptors, types and formats, constraints, missingValues, primaryKey, foreignKeys, rdfType, and additional properties beyond the spec. Acceptance criteria: schema show and schema patch round-trip every property without loss, and unknown properties are preserved on write.
FR-DAT-012 Foreign key integrity validation. The module MUST validate foreign key references across resources using the Data Package resource definitions. Acceptance criteria: validation fails deterministically when a referenced resource or key is missing or when key values do not match, and diagnostics identify the resource, field, and key values involved.
FR-DAT-013 Workspace-level validation. The module MUST validate the entire workspace data package across all resources. Acceptance criteria: datapackage.json and each resource are validated with deterministic reporting, and any failure returns non-zero without partial writes.
FR-DAT-014 Safe patching and destructive refusal. The module MUST support safe schema and package patching that preserves unknown properties, and it MUST refuse destructive structural operations unless explicitly forced. Acceptance criteria: field removal, resource deletion, and schema changes that would discard data are rejected by default, and only proceed with an explicit force flag after all integrity checks pass, while row-level deletes remain governed by busdk.delete_policy.
FR-DAT-015 Non-interactive, flags-only UX. The module MUST operate without prompts and MUST express every operation as a single deterministic command with explicit flags and arguments. Acceptance criteria: every mutating command supports --dry-run, which produces deterministic log messages on standard error describing the planned file and schema changes without modifying any files.
FR-DAT-016 Deterministic serialization. The module MUST serialize schemas and data packages deterministically. Acceptance criteria: JSON output uses UTF-8, LF line endings, two-space indentation, and lexicographic object key ordering; resource arrays are ordered lexicographically by resource name; schema field arrays preserve their declared order.
FR-DAT-017 Formula field metadata support. The module MUST recognize BusDK formula metadata on Table Schema field descriptors and treat it as a first-class, deterministic contract for formula storage and evaluation. Acceptance criteria: schema show and schema patch round-trip formula metadata without loss and preserve unknown properties, and validation rejects inconsistent or incomplete formula metadata with deterministic diagnostics.
FR-DAT-018 Formula validation. The module MUST validate BFL expressions for formula-enabled fields during resource, table, and workspace validation. Acceptance criteria: invalid expressions, unknown identifiers, type errors, or invalid rounding policies produce deterministic validation errors that identify the resource, field, and row when applicable.
FR-DAT-019 Formula projection during read. The module MUST compute formula values at read time for formula-enabled fields and return a deterministic projected dataset view without writing back to CSV. Acceptance criteria: computed values are validated against the declared result type and constraints and replace the formula source in the output projection, while the stored CSV remains unchanged.
FR-DAT-020 Opt-in formula source output. The module MUST provide an explicit, deterministic option to include formula source alongside computed values for diagnostics and tooling without colliding with user columns. Acceptance criteria: default output contains computed values only, and the opt-in mode includes the formula source using a deterministic, non-colliding representation.
FR-DAT-021 Range resolution for BFL. When evaluating BFL expressions that contain range syntax, the module MUST provide a deterministic range resolver to the BFL runtime context. Acceptance criteria: range expressions resolve to arrays based on a stable mapping from BFL column and row references to the current resource snapshot, and open-ended ranges resolve to a deterministic last row without inspecting external state.
FR-DAT-022 Workbook-style address-based read. The module MUST support deterministic read of cell and range values from CSV resources using address-based notation consistent with the bus-bfl reference and range grammar (e.g. cell J510, range HC513:HD513). Acceptance criteria: a designated command or mode accepts one or more cell or range addresses, resolves them against a CSV resource, and returns values in a deterministic, machine-friendly format (tsv or json); resolution uses the same column-letter and row-number mapping as BFL (1-based rows, A=1 through Z=26, AA=27, etc.) so that workbook read and formula evaluation share a single addressing model.
FR-DAT-023 Header- or anchor-based lookup. The module MUST support optional header or anchor-based lookup for robust extraction from table-like CSV workbooks. Acceptance criteria: when configured, extraction can identify data by column header name and optionally by anchor row or column; resolution is deterministic and documented so migration and parity scripts can rely on stable addressing.
FR-DAT-024 Locale-aware numeric normalization for workbook read. The module MUST support consistent numeric normalization for locale-formatted values when reading workbook-style CSV. Acceptance criteria: configurable decimal and thousands separators (or a documented locale profile) produce deterministic numeric values in output; raw string output remains available when normalization is disabled; behavior is explicit and does not infer locale from the environment.
FR-DAT-025 Formula evaluation for workbook read. When reading CSV workbooks with address-based or range access, the module MUST support optional formula metadata or schema and evaluate cell contents using bus-bfl when the content is treated as a formula. Acceptance criteria: when enabled, formula-driven cells are evaluated deterministically, and the output MUST contain the evaluated result (e.g. numeric) for those cells rather than the formula source text unless formula source is included via an explicit opt-in; evaluation uses the same BFL dialect and range resolver contract as schema-validated table read so that formula-driven report totals can be represented and validated in Bus-native workflows. The workbook contract MUST support source-specific formula behavior via explicit dialect selection and locale-aware evaluation (decimal and thousands separators, common functions). When locale flags (--decimal-sep, --thousands-sep) are used, both formula literal parsing and locale-formatted numeric cell values MUST be normalized deterministically in output (e.g. the cell value 1 234,56 with space thousands and comma decimal becomes 1234.56). Formula-source behavior and locale handling MUST be documented together with the supported function set (see Formula metadata and evaluation for workbook extraction).
FR-DAT-026 Import profile contract and validation. The module MUST provide a deterministic import-profile contract that domain modules can use to map external ERP tables into canonical workspace datasets. Acceptance criteria: the library validates profile descriptors against a documented schema, preserves deterministic profile serialization, and rejects unsupported mapping operators or ambiguous source bindings with deterministic diagnostics.
FR-DAT-027 Deterministic profile execution helpers. The module MUST provide library helpers for executing import-profile primitives without domain accounting logic. Acceptance criteria: helpers support deterministic source row filtering, column mapping, enum/status mapping, key-based lookup joins, and explicit transform steps (for example computed field synthesis) as pure data operations; the same source snapshot and profile produce byte-identical mapped row output.
NFR-DAT-001 Mechanical scope. The module MUST remain a mechanical data layer and MUST NOT implement domain-specific accounting logic. Acceptance criteria: domain invariants are enforced by domain modules, not by bus-data.
NFR-DAT-002 No Git or network behavior. The module MUST NOT perform Git operations or network access. Acceptance criteria: the library and CLI only read and write local workspace files.
NFR-DAT-003 Deterministic diagnostics. The module MUST emit deterministic diagnostics with stable identifiers. Acceptance criteria: error messages mention dataset paths, resource names, and field identifiers consistently and are written only to standard error.
NFR-DAT-004 Security boundaries. The module MUST rely on OS-level filesystem permissions and schema-defined mutation policies, and MUST NOT embed authentication or authorization logic. Acceptance criteria: the library and CLI do not prompt for credentials, do not store secrets, and refuse mutations that are not permitted by schema policy.
NFR-DAT-005 Performance. The module SHOULD remain responsive for day-to-day use on typical workspace datasets. Acceptance criteria: table read, validation, and row mutation operations complete in time proportional to table size, and diagnostics remain deterministic regardless of data volume.
NFR-DAT-006 Scalability. The module MUST support workspaces that segment datasets across multiple files without changing the schema contract. Acceptance criteria: datapackage.json can reference multiple resources for a module, and validation works across all referenced resources deterministically.
NFR-DAT-007 Reliability. The module MUST fail fast without partial writes when validation or filesystem errors occur. Acceptance criteria: any operation that fails leaves the workspace datasets and schemas unchanged and returns a non-zero exit code.
NFR-DAT-008 Maintainability. The module MUST keep the library as the authoritative integration surface with the CLI as a thin wrapper. Acceptance criteria: the CLI delegates all data and schema logic to library calls, and tests can exercise behavior through library APIs without invoking the CLI.
System Architecture
Bus Data implements the workspace store interface and dataset I/O mechanics used by other modules, satisfying FR-DAT-001, FR-DAT-002, FR-DAT-009, FR-DAT-012, and FR-DAT-013. The library is the authoritative integration surface for reading, writing, validating, and patching CSV, Table Schema, and Data Package descriptors, satisfying FR-DAT-002, FR-DAT-011, and FR-DAT-016. The CLI delegates directly to the library for inspection, validation, and explicit, mechanical maintenance of schemas, data packages, resources, and rows, satisfying FR-DAT-015 and NFR-DAT-008.
Bus Data integrates bus-bfl to validate and evaluate formulas declared in Table Schema metadata, satisfying FR-DAT-017, FR-DAT-018, FR-DAT-019, and FR-DAT-021. Formula evaluation is deterministic, row-local, and bounded by the bus-bfl defaults unless a schema-provided rounding policy is present, and read-time projection never writes back to CSV. When formulas include range expressions, bus-data provides a deterministic range resolver that maps column and row references to the current resource snapshot.
A workbook-style read path (FR-DAT-022 through FR-DAT-025) provides address-based cell and range access, optional header or anchor-based lookup, locale-aware numeric normalization, and optional formula evaluation for CSV resources that are used as table-like workbooks (e.g. spreadsheet exports). That path reuses the same BFL reference grammar and range resolution contract so that workbook extraction and formula-driven validation share a single addressing model and output remains machine-friendly (tsv or json) for agent workflows and audit scripts.
For ERP migration workflows, bus-data provides profile parsing, validation, and deterministic execution helpers as a mechanical layer (FR-DAT-026 and FR-DAT-027). Domain modules such as bus-invoices and bus-bank own canonical write behavior and domain invariants, while bus-data provides reusable mapping primitives so profile-driven import remains consistent across modules.
Key Decisions
KD-DAT-001 Shared library for data mechanics. Dataset I/O and schema handling are centralized in a library to keep module behavior consistent.
KD-DAT-002 Frictionless-native data package support. Data Package descriptors are treated as first-class workspace metadata and remain fully compatible with Frictionless Table Schema and Data Package rules.
KD-DAT-003 Init vs discover. Creating an empty datapackage.json is a separate operation (init) from scanning the workspace and adding resource entries (discover). Init does not scan for CSV files; discover requires an existing descriptor. This keeps bootstrap predictable and avoids implicit discovery under the name “init”.
KD-DAT-004 Library integration for descriptor bootstrap. bus-config and bus-init MUST use the bus-data Go library to create or ensure the empty datapackage.json before writing accounting entity or other metadata. They MUST NOT invoke the bus-data CLI; integration is via library calls only so that a single code path owns descriptor creation and formatting.
KD-DAT-005 Workbook read as optional, deterministic extension. Workbook-style read (address-based cell/range, header or anchor lookup, numeric normalization, optional formula evaluation) is an optional extension of the data layer. It operates on CSV resources with the same BFL addressing grammar as formula range resolution, keeps output machine-friendly (tsv or json), and does not replace or alter the existing schema-validated table read contract.
KD-DAT-006 Profile execution remains mechanical. Import-profile parsing and mapping execution live in bus-data as deterministic data mechanics, while domain modules keep ownership of canonical dataset writes and domain-specific rules (for example invoice VAT synthesis policy details and bank posting semantics).
Component Design and Interfaces
Interface IF-DAT-001 (data library). The module exposes a Go library interface for reading, validating, and writing tables, schemas, and data packages deterministically, satisfying FR-DAT-001, FR-DAT-007, FR-DAT-009, FR-DAT-011, and FR-DAT-016. The library provides explicit operations for schema inference, schema patching, resource add/remove/rename, and cross-resource validation, satisfying FR-DAT-007, FR-DAT-009, FR-DAT-010, FR-DAT-012, and FR-DAT-013. JSON patching uses a deterministic, safe merge approach and preserves unknown properties on both Table Schema and Data Package descriptors, satisfying FR-DAT-009, FR-DAT-011, and FR-DAT-014.
Interface IF-DAT-002 (module CLI). The module is invoked as bus data via the BusDK dispatcher (or directly as bus-data). The CLI is a thin wrapper over the library for deterministic inspection and maintenance of workspace tables, schemas, and data packages, satisfying FR-DAT-002, FR-DAT-015, NFR-DAT-003, and NFR-DAT-008. It accepts workspace-relative resource names and table paths, resolves beside-the-table schema files by replacing the .csv suffix with .schema.json in the same directory, and never shells out to other CLIs. All commands are explicit, non-interactive, and map directly to library operations so that other modules can call the library without invoking the CLI.
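The beside-the-table resolution rule above is purely mechanical. A minimal sketch, assuming nothing beyond the stated convention (the helper name is illustrative, not the bus-data API):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// schemaPathFor derives the beside-the-table schema path by replacing
// the .csv suffix with .schema.json in the same directory, per IF-DAT-002.
func schemaPathFor(tablePath string) (string, error) {
	if !strings.HasSuffix(tablePath, ".csv") {
		return "", fmt.Errorf("not a CSV table path: %s", tablePath)
	}
	dir := filepath.Dir(tablePath)
	base := strings.TrimSuffix(filepath.Base(tablePath), ".csv")
	return filepath.Join(dir, base+".schema.json"), nil
}

func main() {
	p, _ := schemaPathFor("ledger/accounts.csv")
	fmt.Println(p) // ledger/accounts.schema.json
}
```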
Command bus data init creates an empty datapackage.json at the workspace root when the file is missing: profile tabular-data-package, empty resources array, deterministic JSON formatting. It does not scan for CSV files or add any resources. When the file already exists, init is idempotent and exits 0 without modifying it. This ensures a single, predictable way to bootstrap the descriptor; modules that need to add accounting entity or other metadata (e.g. bus-config, bus-init) MUST call the bus-data library to ensure the descriptor exists before writing their own content — via the Go library interface, not by running the CLI.
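The deterministic formatting contract (FR-DAT-016) maps cleanly onto Go's encoding/json, which already emits map keys in lexicographic order. A minimal sketch of canonical serialization for the empty descriptor, assuming the descriptor is modeled as nested maps (helper name is illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalCanonical emits UTF-8 JSON with two-space indentation,
// lexicographic key order (encoding/json's default for maps),
// and a trailing LF, matching the FR-DAT-016 serialization rules.
func marshalCanonical(v any) ([]byte, error) {
	out, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return nil, err
	}
	return append(out, '\n'), nil
}

func main() {
	// The minimal descriptor that bus data init would write.
	pkg := map[string]any{
		"profile":   "tabular-data-package",
		"resources": []any{},
	}
	out, err := marshalCanonical(pkg)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

Re-serializing the same descriptor always yields byte-identical output, which is what makes idempotent init trivially checkable.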
Command bus data package discover scans the workspace for CSV files that have a beside-the-table schema file and adds or updates resource entries in the existing datapackage.json. Resource names are derived from the CSV basename without the .csv suffix, paths are stored as workspace-relative CSV paths, and resources are ordered lexicographically by name. Discover requires datapackage.json to exist (e.g. after bus data init); if the file is missing, the command fails with a clear diagnostic. Command bus data package show emits the datapackage.json content as stored on disk, and command bus data package patch applies a JSON merge patch while preserving unknown properties and enforcing deterministic formatting. Command bus data package validate validates all resources and foreign keys defined in the data package and returns a deterministic report.
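The discover naming and ordering rules can be sketched as a pure function; the same input paths always produce the same resource list. Types and the helper name are illustrative, not the bus-data API:

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

type resource struct {
	Name string // CSV basename without the .csv suffix
	Path string // workspace-relative CSV path
}

// discoverResources derives resource names from CSV basenames and sorts
// lexicographically by name so repeated runs are byte-identical.
func discoverResources(csvPaths []string) []resource {
	out := make([]resource, 0, len(csvPaths))
	for _, p := range csvPaths {
		name := strings.TrimSuffix(filepath.Base(p), ".csv")
		out = append(out, resource{Name: name, Path: p})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	for _, r := range discoverResources([]string{"b/vendors.csv", "a/accounts.csv"}) {
		fmt.Printf("%s\t%s\n", r.Name, r.Path)
	}
}
```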
Command bus data resource list emits a deterministic TSV of resource name and path from datapackage.json, ordered lexicographically by name. Command bus data resource add requires an explicit resource name, CSV path, and schema source, and creates the CSV and beside-the-table schema artifacts before inserting the resource into datapackage.json. Command bus data resource remove refuses to remove a resource if it is referenced by any foreign key in the workspace, and it deletes the CSV and schema artifacts only when --delete-files is provided. Command bus data resource rename updates the resource name and any foreign key references deterministically, and it only renames files when --rename-files is provided. Command bus data resource validate <resource> validates a single resource’s schema and data and reports any errors deterministically without modifying files.
Command bus data table list takes no parameters and emits a deterministic TSV with columns table_path and schema_path, one row per table. A table is any *.csv file that has a beside-the-table schema file. Output ordering is lexicographic by table_path so the results are stable across machines.
Command bus data schema show --table <table> writes the schema file content exactly as stored on disk to standard output. Command bus data schema show --resource <name> resolves the resource in datapackage.json and writes the resolved schema. If the schema file is missing or unreadable, the command exits non-zero with a concise diagnostic.
Command bus data table read <table> takes a required table path, loads the beside-the-table schema, validates the table against the schema, and writes canonical CSV or JSON to standard output. It preserves the row order from the file and performs no normalization beyond validation. On validation failure, the command exits non-zero and does not emit partial output. Read flags may select specific rows, filters, and columns without changing validation behavior.
When a table contains formula-enabled fields, bus data table read computes formula values using the bus-bfl library and returns a projected dataset view. The stored CSV values remain the formula source and are not rewritten. Computed values are validated against the declared formula result type and constraints before output, and formula evaluation errors are reported deterministically with resource, field, and row context. The default projection output contains computed values only for formula-enabled fields, and an explicit opt-in mode includes formula source alongside computed values without colliding with user columns. For formulas that include ranges, bus-data resolves Ref.ColumnIndex to the schema field order (1-based) and Ref.RowIndex to the physical row order (1-based) in the current resource snapshot, and it resolves open-ended ranges using the last row in that snapshot without probing external state.
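The range-resolution contract above can be illustrated with a small sketch (names and the endRow == 0 encoding for open-ended ranges are hypothetical, not the bus-data implementation): indices are 1-based, and an open-ended range clamps deterministically to the last physical row of the snapshot.

```go
package main

import "fmt"

// resolveRange maps 1-based column and row references onto the current
// resource snapshot. An open-ended range (endRow == 0 here) resolves to
// the last physical row without probing external state.
func resolveRange(snapshot [][]string, col, startRow, endRow int) []string {
	if endRow == 0 { // open-ended range such as A1:A
		endRow = len(snapshot)
	}
	vals := make([]string, 0, endRow-startRow+1)
	for r := startRow; r <= endRow; r++ {
		vals = append(vals, snapshot[r-1][col-1])
	}
	return vals
}

func main() {
	snapshot := [][]string{{"10"}, {"20"}, {"30"}}
	fmt.Println(resolveRange(snapshot, 1, 1, 0)) // [10 20 30]
}
```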
Workbook-style read (FR-DAT-022 through FR-DAT-025) is provided by the command bus data table workbook. The command accepts a workspace-relative table path (resolved relative to --chdir when given), one or more cell or range addresses in BFL-compatible notation, and optional flags for header or anchor-based lookup and locale-aware numeric normalization. Output is deterministic tsv or json suitable for agent workflows and audit scripts. When optional formula evaluation is enabled, the same bus-bfl dialect and range resolver contract as schema-validated table read apply so that formula-driven cells can be evaluated and included in the output; formula source may be included via an explicit opt-in. The library MUST expose this behavior through a dedicated API so that other modules can invoke workbook read without shelling out to the CLI. The command name, flag set, address forms, and output schema are specified in Table workbook (KD-DAT-005) below.
Table workbook (KD-DAT-005)
Command name: table workbook
Usage:
bus data table workbook <table_path> <address> [address ...]
Positional arguments: <table_path> is the workspace-relative path to the CSV table (.csv suffix optional). <address> [address ...] are one or more cell or range addresses and are required.
Address forms: A1-style single cell (e.g. A1, J510), bounded range (e.g. A1:B2), or open-ended range (e.g. A1:A, A:A). Rows are 1-based; column letters A=1, B=2, …, Z=26, AA=27, etc. (BFL-compatible). With --header, addresses may use ColumnName:RowNumber (e.g. id:1, nimi:2). With --anchor-col, addresses may use ColumnNameOrLetter:RowLabel to resolve the row by the anchor column’s value (e.g. nimi:alice).
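The column-letter mapping stated above (A=1 through Z=26, AA=27) is plain base-26 with no zero digit. A minimal sketch:

```go
package main

import "fmt"

// columnIndex converts A1-style column letters to a 1-based index:
// A=1, ..., Z=26, AA=27, matching the BFL-compatible mapping.
func columnIndex(letters string) int {
	n := 0
	for _, c := range letters {
		n = n*26 + int(c-'A'+1)
	}
	return n
}

func main() {
	fmt.Println(columnIndex("A"), columnIndex("Z"), columnIndex("AA")) // 1 26 27
}
```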
Flags:
| Flag | Description |
|---|---|
| --decimal-sep <char> | Decimal separator for locale-aware numeric normalization (e.g. ,). |
| --thousands-sep <char> | Thousands separator for locale-aware numeric normalization (e.g. space). |
| --formula | Evaluate formula-enabled fields when a beside-the-table schema exists. |
| --formula-source | Include formula source in output when --formula is set. |
| --formula-dialect <name> | Source-specific formula dialect profile: spreadsheet, excel_like, or sheets_like. |
| --header | Resolve columns by header name; addresses may use ColumnName:RowNumber. |
| --anchor-row <n> | Use row n (1-based) as the column header row; data rows follow. Default 1. |
| --anchor-col <col> | Column (letter or 1-based index) used as row labels; addresses may use ColumnNameOrLetter:RowLabel. |

Global flags that apply: --format (tsv or json), --output, --quiet, --verbose, --color, --chdir.
Output schema (cell/range results): TSV (default): header row address\trow\tcol\tvalue, one data row per cell, tab-separated columns, order by address then row then column (deterministic). JSON (--format json): a JSON array of objects; each object has exactly four keys — "address" (string), "row" (number, 1-based), "col" (number, 1-based), "value" (string) — with the same ordering as TSV.
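The cell-result ordering and TSV shape described above can be sketched as follows (the struct and helper are illustrative, not the bus-data implementation):

```go
package main

import (
	"fmt"
	"sort"
)

type cellResult struct {
	Address string
	Row     int // 1-based
	Col     int // 1-based
	Value   string
}

// formatTSV emits the workbook cell-result table: a header row followed
// by one data row per cell, ordered by address, then row, then column.
func formatTSV(cells []cellResult) string {
	sort.Slice(cells, func(i, j int) bool {
		a, b := cells[i], cells[j]
		if a.Address != b.Address {
			return a.Address < b.Address
		}
		if a.Row != b.Row {
			return a.Row < b.Row
		}
		return a.Col < b.Col
	})
	out := "address\trow\tcol\tvalue\n"
	for _, c := range cells {
		out += fmt.Sprintf("%s\t%d\t%d\t%s\n", c.Address, c.Row, c.Col, c.Value)
	}
	return out
}

func main() {
	fmt.Print(formatTSV([]cellResult{
		{"B1:B2", 2, 2, "20"},
		{"A1", 1, 1, "x"},
		{"B1:B2", 1, 2, "10"},
	}))
}
```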
Workbook formula evaluation uses deterministic BFL delegation with the supported function set SUM, IF, and ROUND. Locale options (--decimal-sep, --thousands-sep) apply both to output normalization and to formula parsing/evaluation for workbook extraction. When --formula-source is enabled, formula-source columns remain raw source text (not locale-normalized).
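The locale normalization rule is deliberately explicit: separators come from the flags, never from the environment. A minimal sketch of the normalization step (the function name is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNumber converts a locale-formatted numeric string to a
// canonical form using caller-supplied separators, per FR-DAT-024.
// No locale is inferred from the environment.
func normalizeNumber(raw, decimalSep, thousandsSep string) string {
	s := raw
	if thousandsSep != "" {
		s = strings.ReplaceAll(s, thousandsSep, "")
	}
	if decimalSep != "" && decimalSep != "." {
		s = strings.ReplaceAll(s, decimalSep, ".")
	}
	return s
}

func main() {
	// Space thousands separator, comma decimal separator,
	// as in the FR-DAT-025 example.
	fmt.Println(normalizeNumber("1 234,56", ",", " ")) // 1234.56
}
```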
Exit codes: 0 on success; 2 on invalid usage (e.g. missing table path or addresses, unknown --format); non-zero on missing file or validation error.
Command bus data schema init <table> creates a new CSV file and beside-the-table schema. It writes a header row that matches the schema field order and refuses to overwrite existing files unless explicitly forced.
Command bus data schema infer <table> reads an existing CSV and writes a beside-the-table schema inferred from the data. It does not modify the CSV and refuses to overwrite an existing schema unless explicitly forced.
Command bus data schema field add --resource <name> appends a new field definition to the schema and updates the CSV by appending a new column. Existing rows receive the field’s default value when provided, or an empty value when no default is specified.
Command bus data schema field set-type --resource <name> changes a field type only when the existing values are compatible with the new type. The command updates the schema and does not rewrite table data.
Interface IF-DAT-003 (import profile library contract). The bus-data library exposes a deterministic import-profile API for domain modules. The API validates profile descriptors, loads source resources using workspace-relative paths, and executes profile steps into an intermediate mapped-row stream with deterministic ordering. Supported primitive operations are mechanical and explicit: row filters, field maps, enum maps, keyed lookups, and deterministic transforms. The API does not write domain datasets and does not infer accounting semantics; consumers are responsible for canonical writes and domain validation.
Schema field remove and rename commands update both the schema and CSV deterministically. Field removal is refused unless --force is provided, and even when forced it must still refuse if the change would break primary key or foreign key integrity. Primary key and foreign key commands validate existing data before applying changes, and failures produce deterministic diagnostics without writing.
Command bus data row add <table> appends a new row. Row input is provided as repeated --set col=value flags or as a JSON object via --json. The row is validated against the schema and written in canonical column order.
Command bus data row update <table> replaces or updates a row identified by the primary key only when schema mutation policy allows in-place updates. Row selection uses repeated --key field=value flags in the same order as the schema’s primaryKey, and all primary key fields must be provided. It revalidates the resulting row and writes changes only when the schema permits in-place updates.
Command bus data row delete <table> removes a row identified by the primary key only when the schema permits deletion. Row selection uses repeated --key field=value flags in the same order as the schema’s primaryKey, and all primary key fields must be provided. Soft deletion uses the schema’s configured soft-delete field and value, while hard deletion removes the row entirely.
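Primary-key row selection for update and delete follows the rule above: every primaryKey field must be provided, and a row matches only when all key fields are equal. A hedged sketch with hypothetical names (not the bus-data API):

```go
package main

import "fmt"

// matchRow reports whether a row matches the given primary key values.
// It errors out when the caller did not supply every primaryKey field,
// which is how composite keys stay unambiguous.
func matchRow(row map[string]string, primaryKey []string, key map[string]string) (bool, error) {
	if len(key) != len(primaryKey) {
		return false, fmt.Errorf("all %d primary key fields must be provided", len(primaryKey))
	}
	for _, f := range primaryKey {
		v, ok := key[f]
		if !ok {
			return false, fmt.Errorf("missing primary key field %q", f)
		}
		if row[f] != v {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	row := map[string]string{"entity": "acme", "year": "2024", "amount": "100"}
	ok, _ := matchRow(row, []string{"entity", "year"},
		map[string]string{"entity": "acme", "year": "2024"})
	fmt.Println(ok) // true
}
```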
Initialization, schema extension, package and resource mutation, and row mutation commands write or modify files only when explicitly invoked and follow the same workspace-relative path conventions as the inspection commands. Schema extension only adds columns and must not reorder or delete existing columns unless explicitly forced and compatible. Row mutation operations validate against the schema and mutation policy and write changes without altering unrelated rows.
Usage:
bus data [global flags] <command> [args]
Commands:
init Create empty datapackage.json at workspace root (no discovery).
package discover Scan workspace for tables with schemas; add resources to datapackage.json.
package show Print datapackage.json as stored.
package patch Apply a JSON merge patch to datapackage.json.
package validate Validate the full workspace data package.
resource list List resources from datapackage.json.
resource validate <resource> Validate a single resource.
resource add Add a resource and create CSV and schema artifacts.
resource remove <resource> Remove a resource; refuse if referenced by foreign keys.
resource rename <resource> Rename a resource and update references.
table list List tables with beside-the-table schemas.
table read <table_path> Validate a table and emit CSV or JSON.
schema show --table <table_path> Print the Table Schema JSON for a table.
schema show --resource <resource> Print the Table Schema JSON for a resource.
schema init <table_path> Initialize a CSV and beside-the-table schema.
schema infer <table_path> Infer a schema from an existing CSV.
schema patch --resource <resource> Apply a JSON merge patch to a schema.
schema field add --resource <resource>
schema field remove --resource <resource>
schema field rename --resource <resource>
schema field set-type --resource <resource>
schema field set-format --resource <resource>
schema field set-constraints --resource <resource>
schema field set-missing-values --resource <resource>
schema key set --resource <resource>
schema foreign-key add --resource <resource>
schema foreign-key remove --resource <resource>
row add <table_path> Append a new row.
row update <table_path> Update a row by primary key when allowed.
row delete <table_path> Delete a row by primary key when allowed.
<table_path> may omit the .csv suffix (e.g. "accounts" for accounts.csv).
Global flags:
-h, --help Show help and exit.
-V, --version Show version and exit.
-v, --verbose Increase verbosity (repeatable, e.g. -vv).
-q, --quiet Suppress non-error output.
-C, --chdir <dir> Use <dir> as the workspace root.
-o, --output <file> Write command output to <file>.
-f, --format <format> Output format: list tsv|json (default tsv), read csv|json (default csv),
resource validate tsv|json (default tsv), package validate tsv|json (default tsv).
--row <n> (read only) Emit only the nth data row (1-based). Use N:NN for a range.
--key <field=value> (read only) Emit only the row matching primary key fields; repeat for composites.
--filter <col=val> (read only) Keep rows where column equals value; repeat for AND.
--column <name> (read only) Emit only selected columns; repeat to keep multiple.
Read flags (--row, --key, --filter, --column) may appear before or after the table path.
--dry-run Show planned file and schema changes as stderr logs without writing.
--color <mode> auto|always|never for stderr messages (default: auto).
--no-color Alias for --color=never.
-- Stop parsing flags.
Write flags:
--schema <file> (schema init, resource add) Source schema JSON to write beside the table.
--sample <n> (schema infer) Limit inference to the first n data rows.
--field <name> (schema field commands) Field name to append or update.
--type <type> (schema field commands) Field type to append or apply.
--format <format> (schema field commands) Field format to apply.
--constraints <json> (schema field commands) JSON object for field constraints.
--missing-values <json> (schema field commands) JSON array for missingValues.
--required (schema field add) Mark the field as required.
--description <text> (schema field add) Field description text.
--rdf-type <uri> (schema field add) rdfType to apply.
--default <value> (schema field add) Default value written to existing rows.
--key <field=value> (row update/delete) Select a row by primary key; repeat for composites.
--set <col=val> (row add/update) Set a column value; repeatable.
--json <file> (row add/update) JSON object row input; use - for stdin.
--patch <file> (package patch, schema patch) JSON merge patch file.
--resource <name> (schema, resource commands) Resource name in datapackage.json.
--name <name> (resource add/rename) Resource name to add or set.
--path <path> (resource add) CSV path relative to workspace root.
--delete-files (resource remove) Delete CSV and schema artifacts.
--rename-files (resource rename) Rename CSV and schema to match resource name.
--primary <fields> (schema key set) Comma-separated primary key fields.
--reference <resource> (schema foreign-key add) Referenced resource name.
--reference-fields <fields>
(schema foreign-key add) Comma-separated referenced fields.
--force Allow destructive operations and overwrites.
Examples:
bus data init
bus data -vv table list
bus data --format json resource list
bus data table read people
bus data --format json --filter name=alice --row 1 table read people
bus data --key id=p-001 --column name --column age table read people
bus data package discover
bus data resource add --name people --path people.csv --schema people.schema.json
bus data schema init people --schema people.schema.json
bus data schema infer people
bus data schema field add --resource people --field nickname --type string
bus data schema field set-type --resource people --field age --type integer
bus data row add people --set id=p-001 --set name=Alice
    bus data row update people --key id=p-001 --set "name=Alice A."
bus data row delete people --key id=p-001
bus data -- table read --weird.csv (use -- when the table path starts with '-')
Data Design
The module operates on workspace datasets as CSV resources with beside-the-table Table Schema JSON files. The canonical schema for a table is the beside-the-table file with the .schema.json suffix. A workspace datapackage.json is stored at the workspace root and references resources by a deterministic name and a relative path to the CSV file. Resource names default to the CSV basename without the .csv suffix, and paths are stored as workspace-relative CSV paths. Each resource embeds its Table Schema descriptor inline, sourced from the beside-the-table schema file; when schema changes are applied, the schema file and the embedded resource schema are updated together to remain consistent.
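An illustrative datapackage.json following these conventions (resource name defaulted from the CSV basename, workspace-relative path, Table Schema embedded inline from the beside-the-table people.schema.json); the concrete field values are hypothetical:

```json
{
  "resources": [
    {
      "name": "people",
      "path": "people.csv",
      "schema": {
        "fields": [
          { "name": "id", "type": "string" },
          { "name": "name", "type": "string" }
        ],
        "primaryKey": ["id"]
      }
    }
  ]
}
```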
BusDK extends Table Schema metadata with a busdk object used by bus-data to determine whether in-place updates and deletions are permitted. The busdk.update_policy field may be forbid or in_place, and the busdk.delete_policy field may be forbid, soft, or hard. When busdk.delete_policy is soft, busdk.soft_delete_field and busdk.soft_delete_value must be set so the command can apply a deterministic soft deletion update, and the updated row must still satisfy schema constraints and key uniqueness. The default is that updates and deletions are forbidden unless the schema explicitly enables them, and the CLI provides explicit schema commands to set or change these policies. The busdk extension coexists with Frictionless descriptors and is preserved verbatim when schemas and data packages are rewritten.
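A Table Schema fragment enabling in-place updates and soft deletion might look like the following. The busdk key names follow the description above; the concrete field names and the string form of soft_delete_value are assumptions for illustration:

```json
{
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "deleted", "type": "boolean" }
  ],
  "primaryKey": ["id"],
  "busdk": {
    "update_policy": "in_place",
    "delete_policy": "soft",
    "soft_delete_field": "deleted",
    "soft_delete_value": "true"
  }
}
```

With this schema, bus data row delete would set deleted to the configured value instead of removing the row, and the updated row must still satisfy schema constraints and key uniqueness.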
BusDK also extends Table Schema field descriptors with busdk.formula metadata to declare BFL-enabled fields and their evaluation rules. The canonical representation uses the following keys under each field descriptor’s busdk.formula object: language set to bfl, mode set to inline or constant, expression as a string when mode is constant, result as an object describing the computed logical type, prefix as an optional string used only when the consumer enables a prefix stripping policy, on_error set to fail or null, and rounding as an optional object with scale and mode (half_up or half_even). Formula-enabled fields are stored physically as standard Frictionless types that can hold the UTF-8 formula source, while the computed result type is enforced only at read-time projection and validation.
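An illustrative field descriptor for an inline formula-enabled field follows. The key names match the canonical representation above; the exact shape of the result object is an assumption, and expression is omitted because it applies only when mode is constant:

```json
{
  "name": "total",
  "type": "string",
  "busdk": {
    "formula": {
      "language": "bfl",
      "mode": "inline",
      "result": { "type": "number" },
      "on_error": "fail",
      "rounding": { "scale": 2, "mode": "half_up" }
    }
  }
}
```

The stored CSV cell holds the UTF-8 formula source as a plain string; the number result type is enforced only at read-time projection and validation, as stated above.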
Bus Data owns mechanical concerns only: reading and writing CSV, reading and writing schema JSON, creating and patching datapackage.json, enforcing Table Schema constraints, and validating foreign key integrity. Domain modules own business rules, domain invariants, and any accounting classification decisions; bus-data does not infer or enforce those semantics.
Path ownership lies with domain modules. When a consumer needs to read or write a domain table (e.g. accounts, periods, journal), it MUST obtain the path from the owning module’s Go library (see Data path contract for read-only cross-module access). Bus-data accepts table paths as input and performs schema-validated I/O on them; it does not define or hardcode which path is “accounts” or “periods.” This keeps the data layer mechanical and allows future dynamic path configuration to be implemented in the owning modules without changing bus-data’s contract.
Workbook-style CSV is a read-only view over a CSV resource treated as a grid of cells. Addressing uses the same column-letter and row-number mapping as bus-bfl (1-based rows; column A=1 through Z=26, AA=27, etc.) so that address-based extraction and formula range resolution are consistent. Workbook read does not require a beside-the-table schema; when a schema is present and formula evaluation is enabled, formula metadata may be used, and when no schema is present, formula treatment is opt-in and implementation-defined (e.g. heuristic detection of leading = or explicit cell-set configuration). Locale-aware numeric normalization applies only in workbook read and does not change the stored CSV or schema-validated table read behavior.
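The column-letter mapping described above is bijective base-26 with A=1; a minimal sketch of the label-to-index conversion (the helper name is illustrative, and a real implementation would also reject empty or non-uppercase input):

```go
package main

import "fmt"

// colIndex converts a spreadsheet column label to its 1-based index
// using the mapping the text describes: A=1 .. Z=26, AA=27, and so on.
func colIndex(label string) int {
	n := 0
	for _, c := range label {
		// Each letter is a base-26 digit with value 1..26.
		n = n*26 + int(c-'A'+1)
	}
	return n
}

func main() {
	fmt.Println(colIndex("A"), colIndex("Z"), colIndex("AA")) // 1 26 27
}
```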
Import profiles are repository files that declare source resources, target resource intent, and deterministic mapping steps. A profile descriptor must include stable identifiers for the profile itself and source bindings, explicit transform ordering, and mapping-step metadata sufficient for audit artifacts. Bus-data validates this descriptor and executes only supported mechanical operations; domain modules own any domain-specific enrichments and final append semantics.
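Because the descriptor format is characterized here only by its required metadata, the following is a purely hypothetical sketch (all key names are invented for illustration): a stable profile identifier, stable source-binding identifiers, and an explicitly ordered step list carrying enough metadata for audit artifacts:

```json
{
  "id": "legacy-accounts-2019",
  "sources": [
    { "id": "src", "path": "import/legacy.csv" }
  ],
  "steps": [
    { "op": "filter", "source": "src", "where": { "field": "type", "equals": "account" } },
    { "op": "field-map", "map": { "Konto": "id", "Navn": "name" } },
    { "op": "enum-map", "field": "status", "map": { "aktiv": "active" } }
  ]
}
```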
Assumptions and Dependencies
Bus Data depends on the workspace layout conventions for CSV, beside-the-table schema files, and datapackage.json at the workspace root. If datasets or schemas are missing or invalid, the library and CLI return deterministic diagnostics and do not modify files.
Bus Data depends on the bus-bfl library for parsing, validation, and evaluation of BFL expressions. If the library surface or error contracts change, Bus Data must update its integration while preserving deterministic diagnostics and stable validation behavior. For range evaluation, Bus Data assumes that the current resource snapshot provides a stable ordering of rows and fields so range resolution remains deterministic for a given read operation.
Security Considerations
Bus Data does not perform network or Git operations. It preserves auditability by refusing invalid or destructive writes, requiring explicit force flags for destructive operations, and supporting --dry-run for all mutating commands.
Observability and Logging
Command results are written to standard output, and diagnostics are written to standard error with deterministic references to dataset paths, resource names, and identifiers. Verbose output is written to standard error so it does not interfere with command results. Diagnostics are stable and citeable so tests and automated tools can rely on them.
Error Handling and Resilience
Invalid usage exits with status code 2 and a concise usage error. Schema violations, foreign key integrity failures, disallowed mutation attempts, or filesystem errors exit non-zero without modifying datasets, schemas, or datapackage.json, and without emitting partial output.
Testing Strategy
Unit tests cover Table Schema and Data Package parsing, safe patching with preservation of unknown properties, deterministic JSON serialization, deterministic CSV write behavior, schema inference determinism, and foreign key validation logic. Command-level end-to-end tests validate outputs, exit codes, and on-disk changes using fixture workspaces, including at least one test that verifies cross-resource foreign key integrity and one test that proves resource deletion is refused when referenced.
Integration tests cover formula metadata round-tripping in schemas and data packages, formula validation failures with deterministic error context, and read-time projection of computed values without modifying stored CSV. Tests include at least one case for inline formulas, one case for constant formulas, one case that verifies the configured rounding policy is applied or rejected deterministically, and at least one case that evaluates a range expression using the bus-data range resolver against a fixed dataset snapshot.
Import-profile tests MUST cover descriptor validation failures, deterministic execution ordering, stable handling of filters and lookup joins, enum/status mapping behavior, and byte-identical mapped-row output for repeated runs with identical source snapshots. Tests MUST also verify that bus-data profile helpers do not write domain datasets directly and return deterministic diagnostics that identify profile ID, source binding, and failing step when execution fails.
Deployment and Operations
Not Applicable. The module ships as a BusDK CLI component and library and relies on the standard workspace layout.
Migration/Rollout
Not Applicable. Schema evolution is handled through the standard schema migration workflow for workspace datasets.
Risks
Not Applicable. Module-specific risks are not enumerated beyond the general need for deterministic data handling.
Implementation status (workbook read)
Workbook-style read is implemented at the command surface via bus data table workbook. The command name, flags, address forms, and output schema are specified in Table workbook (KD-DAT-005) in Component Design and Interfaces. A1 and range reads (e.g. A1, A1:B2), workbook-specific options for locale and formulas (--decimal-sep, --thousands-sep, --formula, --formula-source, --formula-dialect, --header, anchors), and machine-friendly output (tsv/json) are implemented and verified. FR-DAT-022 through FR-DAT-025 are covered at the command level, including source-specific formula token behavior and locale-aware formula evaluation with the supported function set (SUM, IF, ROUND). The delegation document Formula metadata and evaluation for workbook extraction is the normative integration guide for formula options, supported functions, and locale handling.
Verification (formula and locale parity). Implementations and reviewers can confirm parity as follows. (1) In a workspace with a CSV that has formula cells and a beside-the-table schema defining formula fields, run bus data table workbook <path> A1:C3 --formula -f tsv and assert the output contains evaluated numeric values for formula cells, not formula text. (2) With the same setup, assert that non-formula cells and formula results appear in the same machine-friendly columns. (3) For locale: run with --decimal-sep "," --thousands-sep " " against a cell whose raw value is 1 234,56 (space thousands, comma decimal). (4) Assert the output value for that cell is normalized to 1234.56 (canonical decimal form). These steps verify FR-DAT-024 and FR-DAT-025 acceptance criteria for workbook read.
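The locale normalization exercised in steps (3) and (4) can be sketched mechanically as follows (the helper name is illustrative; a real implementation would likely also validate digit grouping and reject ambiguous input rather than blindly substituting separators):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNumber strips the thousands separator and rewrites the
// decimal separator to "." so locale-formatted values reach the
// canonical decimal form the verification steps expect.
func normalizeNumber(raw, decimalSep, thousandsSep string) string {
	s := strings.ReplaceAll(raw, thousandsSep, "")
	return strings.ReplaceAll(s, decimalSep, ".")
}

func main() {
	// Space thousands separator, comma decimal separator,
	// matching --thousands-sep " " --decimal-sep ",".
	fmt.Println(normalizeNumber("1 234,56", ",", " ")) // 1234.56
}
```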
Glossary and Terminology
Workspace store interface: the persistence boundary for deterministic table and schema operations.
Mechanical data layer: functionality that handles storage and validation without domain rules.
Schema operation policy: the optional busdk metadata in a Table Schema that declares whether in-place update and delete operations are permitted.
Formula-enabled field: a field descriptor that declares busdk.formula metadata and is evaluated using BFL during read-time projection.
Package discover: the operation that scans the workspace for CSV files with beside-the-table schemas and adds or updates resource entries in datapackage.json; distinct from init, which only creates an empty descriptor.
Workbook-style CSV: a CSV resource treated as a grid of cells for address-based or range extraction, optionally with header or anchor-based lookup and formula evaluation, without requiring schema-validated table read.
Cell address: a BFL-compatible reference to a single cell (e.g. J510), using column letters and 1-based row number.
Open Questions
OQ-DAT-003 (resolved). The command name, flag set, and output schema for workbook-style read are specified in Table workbook (KD-DAT-005).
OQ-DAT-001 What is the exact CLI flag name and output shape for the opt-in mode that includes formula source alongside computed values, and which output formats must support it?
OQ-DAT-002 Should bus data table read fail the entire command on the first formula evaluation error, or should it report all formula errors and then fail without emitting partial output?