bus-data — shared tabular data layer and schema-validated I/O
bus data — inspect and maintain workspace datasets, schemas, and data packages
Overview
Command names follow CLI command naming.
Bus Data provides shared deterministic Frictionless Table Schema and Data Package handling for workspace datasets. Its primary surface is a Go library that other modules import for schema, data package, and CSV operations.
The canonical CLI entry is bus data.
The standalone bus-data binary remains available for direct script usage.
bus data reads tables, schemas, and data packages, validates records and foreign keys, and performs schema-governed changes.
It is a mechanical data layer and does not implement domain accounting logic.
Workbook-style read is available via bus data table workbook.
It supports cell/range addresses (for example A1, A1:B2), optional header or anchor lookup, locale options, and formula controls.
Output is deterministic and machine-friendly (tsv/json).
For workbook formula details and parity notes, see Formula metadata and evaluation for workbook extraction and Module SDD: bus-data.
For ERP migration workflows, bus-data provides a mechanical import-profile library contract used by domain modules such as bus-invoices and bus-bank.
Synopsis
bus data init [--chdir <dir>] [global flags]
bus data schema init <table> --schema <file> [--force] [--chdir <dir>] [global flags]
bus data schema show <table> | schema show --resource <name> [--chdir <dir>] [global flags]
bus data schema infer <table> [--sample <n>] [--chdir <dir>] [global flags]
bus data schema field add [--resource <name>] --field <name> --type <type> [--default <value>] [--required] [--description <text>] [--chdir <dir>] [global flags]
bus data schema field set-type [--resource <name>] --field <name> --type <type> [--chdir <dir>] [global flags]
bus data schema patch [--resource <name>] --patch <file> [--chdir <dir>] [global flags]
bus data package discover | package show | package patch --patch <file> | package validate [--chdir <dir>] [global flags]
bus data resource list | resource validate <resource> [--chdir <dir>] [global flags]
bus data resource add --name <name> --path <path> --schema <file> [--chdir <dir>] [global flags]
bus data resource remove <resource> [--delete-files] [--chdir <dir>] [global flags]
bus data row add <table> (--set <key>=<value> ... | --json <file>) [--chdir <dir>] [global flags]
bus data row update <table> --key <key>=<value> ... (--set <key>=<value> ... | --json <file>) [--chdir <dir>] [global flags]
bus data row delete <table> --key <key>=<value> ... [--chdir <dir>] [global flags]
bus data table read <table> [--row <index|start:end>] [--column <name>] ... [--filter <field>=<value>] ... [--key <key>=<value>] [--formula-source] [--chdir <dir>] [-o <file>] [-f <format>] [global flags]
bus data table list [--chdir <dir>] [-o <file>] [-f <format>] [global flags]
bus data table workbook <table_path> <address> [address ...] [workbook and global flags]
Getting started
To create an empty workspace data package descriptor, run bus data init. It creates datapackage.json at the workspace root with the standard profile and an empty resources array. Init does not scan the workspace for CSV files or add resources. When the file already exists, init is idempotent and leaves it unchanged. Adding resource entries is a separate operation: run bus data package discover to scan the workspace for CSV files that have a beside-the-table schema and add or update resource entries in the existing datapackage.json. Discover requires datapackage.json to exist (e.g. after bus data init); if the file is missing, the command fails with a clear diagnostic.
bus data init
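The idempotent behaviour described above can be sketched in a few lines. This is an illustrative model, not the Go implementation; the exact descriptor field set (`profile` value in particular) is an assumption:

```python
import json
import os


def init_datapackage(root: str) -> dict:
    """Create an empty data package descriptor at the workspace root.

    Idempotent: if datapackage.json already exists it is left unchanged,
    mirroring the behaviour described for `bus data init`. Init never
    scans the workspace for CSV files.
    """
    path = os.path.join(root, "datapackage.json")
    if os.path.exists(path):                      # idempotent: keep existing bytes
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    descriptor = {
        "profile": "data-package",                # assumed standard-profile id
        "resources": [],                          # empty until `package discover`
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(descriptor, f, indent=2)
    return descriptor
```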
Start by defining a schema and letting bus data create the table and its beside-the-table schema file. The table path may omit the .csv suffix, and schema init writes both the CSV and the .schema.json in one deterministic step. Schema metadata such as primaryKey, foreignKeys, and missingValues is preserved when you initialize the table. Use --force when you need to overwrite an existing table and schema with a new definition.
bus data schema init customers --schema customers.schema.json
bus data schema init customers --schema customers.schema.json --force
If you manage a workspace data package, run bus data package discover after tables exist so that each table with a beside-the-table schema is added as a resource. Discover requires datapackage.json to exist (e.g. after bus data init).
bus data package discover
Add rows with either repeated --set assignments or a JSON file. Each row is validated against the schema before it is written, and duplicate primary keys are rejected for both single-column and composite keys.
bus data row add customers --set id=1 --set name=Ada --set active=true
bus data row add customers --json row2.json
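The validate-then-append flow can be sketched as follows. This is a simplified model under assumed Table Schema type coercions; the real implementation covers the full type set and error reporting:

```python
# Each assignment is checked against the field types and the primary key
# before anything is written; duplicate keys are rejected for both
# single-column and composite primaryKey declarations.

CASTS = {
    "integer": int,
    "number": float,
    "boolean": lambda s: {"true": True, "false": False}[s],
    "string": str,
}


def add_row(table: list, schema: dict, assignments: dict) -> dict:
    """Validate one row (field -> raw string) and append it to `table`."""
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    row = {}
    for name, raw in assignments.items():
        if name not in fields:
            raise ValueError(f"unknown field: {name}")
        row[name] = CASTS[fields[name]](raw)      # reject ill-typed values
    pk = schema.get("primaryKey", [])
    pk = [pk] if isinstance(pk, str) else pk      # single field or ordered list
    key = tuple(row[k] for k in pk)
    if pk and any(tuple(r[k] for k in pk) == key for r in table):
        raise ValueError(f"duplicate primary key: {key}")
    table.append(row)
    return row
```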
List tables in the workspace to confirm what exists. The output is a deterministic, workspace-relative list of table and schema paths, and you can request JSON format when you need structured output.
bus data table list
Read a table to get its canonical CSV. bus data always validates against the schema before it emits rows, and JSON output is available for downstream tooling.
bus data table read customers
Read only what you need
When you want a narrower view, you can select rows and columns without changing validation. Filters are applied after validation, so the table must still pass its schema. Use --row with a single index or an inclusive start:end range, and repeat --column or --filter to refine the output.
bus data table read customers --row 1 --column id --column name
bus data table read customers --row 1:2 --column id
bus data table read customers --filter status=active --filter name=Ada
If your tables use a primary key, use --key field=value for a single-column key. Repeat --key for composite keys in the same order as the schema’s primaryKey. Primary key uniqueness is enforced on row add for both single-column and composite keys.
bus data table read customers --key id=1
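The selection semantics above can be sketched on an already-validated table. The relative order in which --filter and --row are applied here is an assumption; only "filters run after validation" is stated by the contract:

```python
def select(rows, row_spec=None, columns=None, filters=None):
    """Narrow a validated table the way --row/--column/--filter do.

    rows:     list of dicts (already schema-validated).
    row_spec: 1-based index "2" or inclusive range "1:2".
    columns:  column names to keep, in the requested order.
    filters:  field -> required value, all must match exactly.
    """
    out = rows
    if filters:
        out = [r for r in out if all(r.get(k) == v for k, v in filters.items())]
    if row_spec:
        if ":" in row_spec:
            start, end = (int(p) for p in row_spec.split(":"))
        else:
            start = end = int(row_spec)
        out = out[start - 1:end]                  # inclusive, 1-based
    if columns:
        out = [{c: r[c] for c in columns} for r in out]
    return out
```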
Table workbook
bus data table workbook reads cells or ranges by address from a CSV table, with optional header or anchor-based lookup and formula evaluation. This gives bus-api, bus-sheets, and scripts a stable contract for workbook-style extraction.
Usage:
bus data table workbook <table_path> <address> [address ...]
Positional arguments: <table_path> is the workspace-relative path to the CSV table (.csv suffix optional). <address> [address ...] are one or more cell or range addresses and are required.
Address forms: A1-style single cell (e.g. A1, J510), bounded range (e.g. A1:B2), or open-ended range (e.g. A1:A, A:A). Rows are 1-based; column letters A=1, B=2, …, Z=26, AA=27, etc. (BFL-compatible). With --header, addresses may use ColumnName:RowNumber (e.g. id:1, nimi:2). With --anchor-col, addresses may use ColumnNameOrLetter:RowLabel to resolve the row by the anchor column’s value (e.g. nimi:alice).
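The column-letter numbering above (A=1, …, Z=26, AA=27) is bijective base-26. A minimal sketch of address parsing, with hypothetical helper names:

```python
import re


def col_to_index(letters: str) -> int:
    """A=1, B=2, ..., Z=26, AA=27 (bijective base-26 numbering)."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_cell(address: str):
    """Parse an A1-style single-cell address into (row, col), both 1-based."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", address)
    if not m:
        raise ValueError(f"not an A1-style cell address: {address}")
    return int(m.group(2)), col_to_index(m.group(1))
```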
Flags:
| Flag | Description |
|---|---|
| --decimal-sep <char> | Decimal separator for locale-aware numeric normalization (e.g. ,). |
| --thousands-sep <char> | Thousands separator for locale-aware numeric normalization (e.g. space). |
| --formula | Evaluate formula-enabled fields when a beside-the-table schema exists. |
| --formula-source | Include formula source in output when --formula is set. |
| --formula-dialect <name> | Source-specific formula dialect profile: spreadsheet, excel_like, or sheets_like. |
| --header | Resolve column by header name; addresses may use ColumnName:RowNumber. |
| --anchor-row <n> | Use row n (1-based) as the column header row; data rows follow. Default 1. |
| --anchor-col <col> | Column (letter or 1-based index) as row labels; addresses may use ColumnNameOrLetter:RowLabel. |
Global flags that apply: --format (tsv|json), --output, --quiet, --verbose, --color, --chdir.
Output schema (cell/range results). TSV (default): a header row address\trow\tcol\tvalue, then one data row per cell with tab-separated columns, ordered by address, then row, then column (deterministic). JSON (--format json): a JSON array of objects; each object has exactly four keys — "address" (string), "row" (number, 1-based), "col" (number, 1-based), and "value" (string) — in the same order as TSV.
Workbook formula evaluation uses deterministic BFL delegation with the minimal supported function set SUM, IF, and ROUND. Locale options (--decimal-sep, --thousands-sep) are applied both to value normalization and to formula parsing/evaluation for workbook extraction. When --formula-source is enabled, formula source columns remain raw source text (not locale-normalized). With --formula, output contains evaluated values (e.g. numeric) for formula cells, not formula text; locale-formatted cell values (e.g. 1 234,56) are normalized to canonical form (e.g. 1234.56) when locale flags are set.
Example with formula evaluation and locale (decimal comma, space thousands):
bus data table workbook source.csv A1:C10 \
--formula \
--decimal-sep "," \
--thousands-sep " " \
-f tsv
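The locale normalization described above (e.g. 1 234,56 → 1234.56) amounts to stripping the thousands separator and canonicalizing the decimal separator. A minimal sketch, assuming plain single-character separators:

```python
def normalize_number(raw: str, decimal_sep: str = ",", thousands_sep: str = " ") -> str:
    """Normalize a locale-formatted numeric string to canonical form.

    With a decimal comma and space thousands separator, "1 234,56"
    becomes "1234.56", matching the --decimal-sep/--thousands-sep
    behaviour described above.
    """
    s = raw.replace(thousands_sep, "").replace(decimal_sep, ".")
    float(s)                                      # raise if still not numeric
    return s
```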
Verification steps for formula output and locale normalization are in Formula metadata and evaluation for workbook extraction — Verification.
Exit codes: 0 on success; 2 on invalid usage (e.g. missing table path or addresses, unknown --format); non-zero on missing file or validation error.
Update and delete rows
Row updates succeed only when the schema allows in-place edits: busdk.update_policy must be set to in_place. Updates accept either repeated --set assignments or a JSON file payload, and use the primary key to identify a single row; composite keys are provided by repeating --key field=value.
bus data row update customers --key id=1 --set balance=15.00
bus data row update customers --key id=1 --json row_update.json
Row deletes follow the schema’s delete policy. When the policy is soft, bus data writes the configured soft-delete field and value rather than removing the row. The schema’s busdk.delete_policy, busdk.soft_delete_field, and busdk.soft_delete_value control that behavior.
bus data row delete customers --key id=2
Manage data packages and resources
Use bus data init to create an empty datapackage.json at the workspace root.
Use package discover to find tables with beside-the-table schemas and add or update resources.
Use package show to inspect the descriptor.
Use package patch for JSON merge patches.
Use package validate to validate the full package, including foreign keys.
Use resource list for deterministic resource order.
Use resource validate to validate one resource without modifying files.
bus data init
bus data package discover
bus data package show
bus data package patch --patch package.patch.json
bus data package validate
bus data resource list
bus data resource validate customers
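Package validation includes foreign-key integrity across resources. The check can be sketched as follows; the in-memory resource shape and the error message format here are illustrative, while the foreignKeys structure follows Table Schema:

```python
def check_foreign_keys(resources: dict) -> list:
    """Report foreign-key values that do not resolve in the referenced resource.

    resources: name -> {"rows": [...], "schema": {...}} where each schema may
    carry a Table Schema foreignKeys list. Single-field keys only, for brevity.
    """
    errors = []
    for name, res in resources.items():
        for fk in res["schema"].get("foreignKeys", []):
            field = fk["fields"] if isinstance(fk["fields"], str) else fk["fields"][0]
            ref = fk["reference"]
            ref_field = ref["fields"] if isinstance(ref["fields"], str) else ref["fields"][0]
            known = {row[ref_field] for row in resources[ref["resource"]]["rows"]}
            for row in res["rows"]:
                if row[field] not in known:
                    errors.append(f"{name}.{field}={row[field]!r} has no match "
                                  f"in {ref['resource']}.{ref_field}")
    return errors
```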
Add a resource explicitly by giving a name and a CSV path, and provide --schema to supply the Table Schema that is written beside the table. This creates both the CSV and the beside-the-table schema artifacts. Remove a resource with --delete-files to also delete its CSV and schema; the command refuses removal when any foreign key references the resource.
bus data resource add \
--name customers \
--path customers.csv \
--schema customers.schema.json
bus data resource remove customers --delete-files
Inspect and evolve schemas
Use schema show when you need the exact schema JSON. It prints the schema file as-is, either by table path or by resource name, so the output matches the bytes on disk. When datapackage.json is present, --resource resolves the schema path from the package.
bus data schema show customers
bus data schema show --resource customers
If you already have a CSV, you can infer a schema from existing data and write it beside the table.
bus data schema infer products --sample 2
Adding a field extends both the schema and the CSV. A default value is written to existing rows, and you can mark the field as required and add a description. When you need formula metadata inline, use schema field add with formula flags so the schema and table stay in sync.
bus data schema field add \
--resource products \
--field category \
--type string \
--default general \
--required \
--description "category"
bus data schema field add \
--resource products \
--field total \
--type string \
--formula-mode inline \
--formula-prefix "=" \
--formula-result-type number \
--default "=a + b"
Changing a field type updates the schema only when existing values are compatible with the new type.
bus data schema field set-type --resource products --field price --type number
Schema metadata can include primaryKey, foreignKeys, and missingValues. These are preserved by schema init, and primaryKey can be either a single field name or an ordered list for composite keys. Foreign key definitions follow the Table Schema format and are enforced during resource and package validation.
When you need full control over schema metadata, apply a JSON merge patch that preserves unknown properties. Use --resource to target a schema by resource name when datapackage.json is present.
bus data schema patch --resource products --patch schema.patch.json
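JSON merge patch semantics (RFC 7386) are what make this safe for schemas with unknown properties: keys absent from the patch are preserved untouched, null deletes a key, and nested objects merge recursively. A minimal sketch:

```python
def merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch.

    null deletes a key, objects merge recursively, any non-object patch
    value replaces the target wholesale, and keys absent from the patch
    are preserved — which is why unknown schema properties survive.
    """
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result
```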
Formula-driven columns
Schemas can declare formula columns by adding a busdk.formula block under a field. Inline formulas are stored in the CSV cell values and evaluated at read time. Constant formulas ignore cell values and use the schema expression as the source of truth. Formula results are typed and may include rounding rules for numeric results.
Formulas use the BFL language; evaluation is delegated to the BFL library, and locale and function set affect parity with source workbooks. Inline formulas typically set a prefix such as = and accept per-cell expressions. Constant formulas set mode to constant and provide an expression string. When on_error is set to null, formula errors yield empty output values instead of failing the read. See Formula metadata and evaluation for workbook extraction for the integration contract and recommended function set.
To get started with inline formulas, define a column with a busdk.formula block, add rows with the formula expression in the cell, and read the table to see computed values. The formula source stays in the CSV, while table read emits the computed value.
cat > laskelmat.schema.json <<'JSON'
{
"fields": [
{"name": "a", "type": "integer"},
{"name": "b", "type": "integer"},
{
"name": "total",
"type": "string",
"busdk": {
"formula": {
"language": "bfl",
"mode": "inline",
"prefix": "=",
"result": {"type": "integer"}
}
}
}
]
}
JSON
bus data schema init laskelmat --schema laskelmat.schema.json
bus data row add laskelmat --set a=2 --set b=3 --set total="=a + b"
bus data table read laskelmat
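The read-time flow for an inline formula can be sketched as below. This is illustrative only: real evaluation is delegated to the BFL library, whereas this toy evaluator handles just identifier and integer terms joined by + and -, enough to show how a prefix "=" cell becomes a computed value:

```python
def read_with_formulas(rows, formula_field, prefix="="):
    """Return rows with the formula column replaced by computed values.

    Cells that do not start with the prefix pass through unchanged;
    formula cells are evaluated against the other values in the same row.
    """
    out = []
    for row in rows:
        src = row[formula_field]
        if not src.startswith(prefix):
            out.append(dict(row))                 # not a formula cell
            continue
        tokens = src[len(prefix):].replace("-", " - ").replace("+", " + ").split()
        total, sign = 0, 1
        for tok in tokens:
            if tok == "+":
                sign = 1
            elif tok == "-":
                sign = -1
            else:                                 # identifier -> row value, else literal
                total += sign * int(row.get(tok, tok))
        out.append({**row, formula_field: str(total)})
    return out
```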
If you need to inspect the raw formula sources, --formula-source adds an extra column that captures the original formula expression for each formula field without colliding with existing column names. The source column name is the formula field name with the __formula_source suffix.
bus data table read laskelmat --formula-source
Constant formulas are driven entirely by the schema. This is useful when you want a computed column that does not depend on per-row input, such as a fixed ratio with controlled rounding.
cat > laskelmat_const.schema.json <<'JSON'
{
"fields": [
{"name": "label", "type": "string"},
{
"name": "total",
"type": "string",
"busdk": {
"formula": {
"language": "bfl",
"mode": "constant",
"expression": "1 / 8",
"result": {"type": "number"},
"rounding": {"scale": 2, "mode": "half_even"}
}
}
}
]
}
JSON
bus data schema init laskelmat_const --schema laskelmat_const.schema.json
bus data row add laskelmat_const --set label=ok
bus data table read laskelmat_const
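The rounding block above ({"scale": 2, "mode": "half_even"}) maps onto banker's rounding: the constant 1 / 8 = 0.125 lands on a tie and rounds toward the even digit. A sketch using Python's decimal module:

```python
from decimal import Decimal, ROUND_HALF_EVEN


def apply_rounding(value: str, scale: int) -> str:
    """Round a numeric result to `scale` decimals with half-even rounding,
    matching a rounding block of {"scale": 2, "mode": "half_even"}."""
    quantum = Decimal(1).scaleb(-scale)           # scale 2 -> Decimal("0.01")
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN))
```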
Formula evaluation uses a table snapshot, so range expressions resolve against the same read and do not depend on row-by-row mutation. Invalid formula metadata is rejected at read time and reports a formula error to standard error. To treat formula errors as empty values, set on_error to null in the formula block.
Import mapping profiles (planned)
The import-profile contract is a library surface, not an end-user accounting command. Domain modules provide target semantics and profiles; bus-data provides deterministic profile validation and mapping primitives.
End-user imports are run via domain modules such as bus invoices import and bus bank import.
Detailed contract and roadmap details are maintained in Module SDD: bus-data.
Output formats and files
bus data can emit JSON where a command supports structured output. Table and resource listings default to TSV, while table reads default to CSV. Package and resource validation report one row per resource and use TSV by default, or JSON when --format json is set. JSON outputs preserve the CSV string values and keep ordering deterministic.
bus data --format json table list
bus data --format json resource list
bus data --format json table read customers
bus data --format json package validate
bus data --format json resource validate customers
To capture output in a file, use --output. If you also use --quiet, output is suppressed and the file is not written.
bus data --output out_list.tsv table list
bus data --quiet --output out_list.tsv table list
Workspace and safety flags
Global flags (including --chdir) are defined in Standard global flags. Use --chdir to set the workspace root before resolving any paths. This is useful when you run bus data from another directory.
bus data --chdir /path/to/workspace table list
Use --dry-run on mutating commands to see what would change without writing files. This keeps existing CSV and schema files unchanged.
bus data --dry-run row add products --set id=P-4 --set name="Product D"
For replay performance diagnostics, set BUSDK_PERF=1 when running row add. The command emits deterministic stderr diagnostics:
BUSDK_PERF=1 bus data row add customers --set id=3 --set name=Celia
# bus-data: perf op=row.add table=customers rows=1 elapsed_ms=... rows_per_sec=...
If a table name starts with -, place -- before the command arguments to stop flag parsing.
bus data -- table read -taulu
Help and version output are printed to standard output. Diagnostics and validation failures are printed to standard error. You can disable or force colored output for diagnostics with --color auto|always|never or --no-color, and the flags are accepted even when you only need help or version output.
Files
The module operates on workspace datasets as CSV resources with beside-the-table Table Schema JSON files (same directory, .csv replaced by .schema.json). A workspace datapackage.json is stored at the workspace root and references resources by name and workspace-relative CSV path. Path ownership lies with domain modules: when a consumer needs to read or write a domain table (e.g. accounts, periods, journal), it obtains the path from the owning module’s Go library. Bus-data accepts table paths as input and performs schema-validated I/O on them; it does not define or hardcode which path is “accounts” or “periods” (see Data path contract).
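The beside-the-table convention is a pure path transformation, sketched here with a hypothetical helper:

```python
def schema_path(table_path: str) -> str:
    """Beside-the-table schema path: same directory, .csv -> .schema.json.

    The .csv suffix may be omitted on input, as the CLI accepts either form.
    """
    base = table_path[:-4] if table_path.endswith(".csv") else table_path
    return base + ".schema.json"
```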
Examples
bus data init
bus data resource list
bus data schema init customers --schema ./schemas/customers.schema.json
bus data row add customers --set id=C-100 --set name="Northwind Oy" --set status=active
bus data table read customers --filter status=active --column id --column name
bus data package validate --format json --output ./out/package-validate.json
Exit status
0 on success. 2 on invalid usage. Non-zero on missing files, schema validation failure, or foreign key integrity failure.
Using from .bus files
Inside a .bus file, write this module target without the bus prefix.
# same as: bus data schema show --resource customers
data schema show --resource customers
# same as: bus data table workbook reports/sales.csv A1:C8 --header --format json
data table workbook reports/sales.csv A1:C8 --header --format json
# same as: bus data row update customers --key id=C-100 --set status=archived
data row update customers --key id=C-100 --set status=archived
Development state
Value promise: Inspect and maintain workspace datasets, schemas, and data packages with schema-governed row and schema operations so tables stay valid and reviewable without running domain CLIs.
Use cases: Workbook and validated tabular editing.
Completeness: 80% — Package/resource/schema/table/workbook/row and formula surface verified by e2e and unit tests; table workbook documented in SDD and CLI reference (KD-DAT-005).
Use case readiness: Workbook and validated tabular editing: 80% — User can complete package/resource lifecycle, schema evolution, table and workbook-style read (cell/range, --header, --anchor-col/--anchor-row, --decimal-sep, --formula), and row mutate.
Current: Package/resource/schema/table/workbook and row operations are verified by e2e and unit tests, including formula and foreign-key behavior. For detailed test matrix and implementation notes, see Module SDD: bus-data.
Planned next: Add and document the import-profile library contract (descriptor validation and deterministic mapping primitives) for bus-invoices and bus-bank. Advances Workbook and validated tabular editing and ERP migration workflows.
Blockers: None known.
Depends on: None.
Used by: bus-api and bus-sheets for workspace endpoints and the embedded UI backend.
See Development status.