FAQ: Workspaces, datasets, and compliance boundaries
What exactly is stored in a BusDK workspace?
A BusDK workspace stores repository data: canonical datasets, schema definitions, and deterministic script inputs. The structure is designed so that the same inputs produce the same outputs over time.
Who owns each dataset file?
Each dataset has an owning module that defines write behavior and business rules for that dataset. Other modules may read the data, but cross-module access should follow module contracts instead of hardcoded assumptions.
Why are schemas so central in BusDK?
Schemas are the contract layer that keeps data predictable across modules and over time. As defined in the table schema contract, they support deterministic validation and make drift visible before it turns into hidden business inconsistencies.
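As a rough illustration of schema-driven validation, the Python sketch below checks CSV rows against a small field list and reports column drift before any row is processed. The schema layout, the field names, and the `validate_rows` helper are hypothetical and chosen only for this example; they are not BusDK's actual schema format or API.

```python
import csv
import io

# Hypothetical schema: the layout and field names are assumptions for
# illustration, not BusDK's actual table schema format.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "string", "required": True},
        {"name": "amount", "type": "number", "required": True},
        {"name": "note", "type": "string", "required": False},
    ]
}


def _is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def validate_rows(csv_text, schema):
    """Return a list of (line_number, message) problems; empty means valid."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = [field["name"] for field in schema["fields"]]
    header = reader.fieldnames or []
    if header != expected:
        # Column drift is reported up front instead of surfacing downstream.
        problems.append((1, f"header {header} does not match schema {expected}"))
        return problems
    for line_no, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            # Short rows yield None for missing cells; treat them as empty.
            value = row.get(field["name"]) or ""
            if value == "":
                if field["required"]:
                    problems.append((line_no, f"{field['name']}: required value missing"))
                continue
            if field["type"] == "number" and not _is_number(value):
                problems.append((line_no, f"{field['name']}: {value!r} is not a number"))
    return problems
```

Because the check depends only on the file content and the schema, running it twice on the same inputs yields the same problem list, which is the property that makes drift reviewable.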
Does append-only behavior mean data can never be corrected?
No. Corrections are supported, but they should remain explicit and reviewable, made through controlled workflow operations rather than silent destructive edits. The goal is traceability, not immutability for its own sake, which is consistent with append-only updates and soft deletion.
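One way to picture append-only corrections and soft deletion is a log in which every change is a new row and the current state is derived by reading the log in order. The sketch below is a generic illustration under that assumption; the `Entry` fields and the `append` and `current_state` helpers are hypothetical names, not BusDK's storage layout or commands.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    """One appended row. All field names are hypothetical."""
    record_id: str
    amount: Optional[float]
    deleted: bool = False
    reason: str = ""


def append(log: List[Entry], entry: Entry) -> List[Entry]:
    """Return a new log with the entry added; existing rows are never modified."""
    return log + [entry]


def current_state(log: List[Entry]) -> Dict[str, Optional[float]]:
    """Fold the log: the latest entry per record wins, soft-deleted records are hidden."""
    latest: Dict[str, Entry] = {}
    for entry in log:
        latest[entry.record_id] = entry
    return {rid: e.amount for rid, e in latest.items() if not e.deleted}


# Usage: a correction and a soft deletion are both ordinary appends.
log: List[Entry] = []
log = append(log, Entry("inv-1", 100.0))
log = append(log, Entry("inv-2", 40.0))
log = append(log, Entry("inv-1", 120.0, reason="corrected amount"))
log = append(log, Entry("inv-2", None, deleted=True, reason="entered in error"))
state = current_state(log)
```

The original rows for both records remain in the log, so a reviewer can see what was changed and why even though the derived state shows only the corrected value.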
Can we keep sensitive data outside the repository?
Yes. Handling of sensitive material should follow module and environment policy boundaries. BusDK includes patterns for reference-based handling and secret management so that operational workflows do not require unsafe inline secrets.
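Reference-based handling can be sketched as storing a pointer to a secret in the repository and resolving it only at run time. The `env:` prefix and the `resolve_reference` helper below are assumptions made for this example, not BusDK's actual reference syntax.

```python
import os

# Hypothetical convention: a stored value of the form "env:NAME" is a reference
# resolved at run time. This prefix is an assumption, not BusDK syntax.
ENV_PREFIX = "env:"


def resolve_reference(value, environ=None):
    """Return the secret a reference points to, or the value itself if it is not a reference."""
    environ = os.environ if environ is None else environ
    if not value.startswith(ENV_PREFIX):
        return value
    name = value[len(ENV_PREFIX):]
    if name not in environ:
        # Fail loudly so a missing secret never falls back to an inline default.
        raise KeyError(f"secret reference {value!r} is not set in the environment")
    return environ[name]
```

With this shape, the committed configuration would hold only a string such as `env:API_TOKEN`, while the secret value lives in the environment of the process that runs the workflow.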
How does BusDK help with compliance and audit readiness?
BusDK emphasizes explicit workflow steps, reproducible outputs, and evidence mapping. Compliance pages define jurisdiction-specific expectations, while module flows and validation steps from bus-validate and bus-reports make controls executable rather than just descriptive.
Can one repository contain multiple workspaces?
Yes, provided that commands run with clear workspace boundaries. BusDK supports explicit workspace-scoped operation patterns, which keep data ownership and replay behavior deterministic when several workspaces share a repository.
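One simple way to keep a command inside a single workspace is to resolve every path against that workspace's root and reject anything that falls outside it. The sketch below shows that generic containment check; the `resolve_in_workspace` helper and the directory names are hypothetical and do not describe how BusDK itself scopes commands.

```python
from pathlib import Path


def resolve_in_workspace(workspace_root, relative_path):
    """Return the absolute path for relative_path, refusing anything outside the root."""
    root = Path(workspace_root).resolve()
    target = (root / relative_path).resolve()
    # The target must be the root itself or sit somewhere beneath it.
    if target != root and root not in target.parents:
        raise ValueError(f"{relative_path!r} escapes workspace {str(root)!r}")
    return target
```

For example, `resolve_in_workspace("/tmp/ws-a", "datasets/accounts.csv")` returns a path under `/tmp/ws-a`, while `"../ws-b/accounts.csv"` raises `ValueError` instead of silently reading another workspace's data.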
What is the practical difference between “workspace data” and generated outputs?
Workspace data is the authoritative operational source. Generated outputs are derived artifacts produced from that source through deterministic command workflows. If output generation logic changes, bus replay should regenerate outputs from the same source data.
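The relationship between source data and derived outputs can be sketched as a pure function plus a content digest: if the function and the source rows are unchanged, the regenerated output has the same digest. The row shape and the `build_report` and `digest` helpers are hypothetical and only illustrate the idea; they are not the output of any BusDK command.

```python
import hashlib
import json


def build_report(rows):
    """Derive a totals-per-account report from source rows (hypothetical row shape)."""
    totals = {}
    for row in rows:
        # Integer cents avoid floating-point differences between runs.
        totals[row["account"]] = totals.get(row["account"], 0) + row["amount_cents"]
    # Sorted keys and fixed separators keep the serialized bytes stable.
    return json.dumps(totals, sort_keys=True, separators=(",", ":"))


def digest(text):
    """Return a SHA-256 hex digest that identifies the exact output content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


rows = [
    {"account": "1000", "amount_cents": 1000},
    {"account": "2000", "amount_cents": -1500},
    {"account": "1000", "amount_cents": 500},
]
report = build_report(rows)
```

Comparing `digest(report)` across two runs is a cheap way to confirm that an output was regenerated from the same source data rather than edited by hand.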