dot-iu-cutter v0.4 — SQL/NoSQL Hybrid Information-Unit Strategy (2026-05-17)
dot-iu-cutter v0.4 — SQL/NoSQL Hybrid Information-Unit Strategy
Date: 2026-05-17 · DESIGN ONLY (no code/commit/dry-run/provision/secret/deploy). Companion to the scale/automation/non-hardcode review; defines the authoritative SQL/NoSQL split for the information-unit (IU) lifecycle so the LedgerWriter binding is forward-compatible with scale + automation.
1. Principle: SQL authority, NoSQL acceleration
PostgreSQL cutter_governance is the system of record (SSOT) for everything that must be correct, auditable, and replayable. Any document store (Agent Data / Directus) and any future vector store are derived, rebuildable projections for retrieval/UX latency — never authority.
| Concern | Authority | Acceleration / projection |
|---|---|---|
| IU identity (uuids) | SQL | — |
| IU lifecycle / state machine | SQL (status + CAS, OD-SM-1/2) |
— |
| Governance / audit ledger | SQL (append-only decision_backlog_history, DOT signatures) |
— |
| Review / cut / verify decisions | SQL (FK lineage) | — |
| Idempotency / dedup | SQL (UNIQUE cut_change_set.idempotency_key; MARK key indexed) |
— |
| IU/canonical text & manifests | SQL (jsonb payloads) | doc store mirror (mirror_path) |
| Semantic search over IUs | — | vector store (future), rebuildable from SQL |
2. Field classification taxonomy (binding-forward)
Every column the LedgerWriter binding touches is one of:
- SQL core identity — PK uuids. Generated in-process (uuid4) or DB default
gen_random_uuid(). Never in JSONB, never derived from external systems. - SQL relationship — FK columns + self-FK chains. Enforce lineage; must be real columns (never JSONB) so FK integrity + indexed joins hold at scale.
- SQL governance event — discrete state/verdict/timestamp columns (
status, change_kind, governance_event_kind, verdict, validation_state, state, *_at, cross_signed_by_dot_verifier). Drive control flow + audit; must be columnar (queried, constrained, partition keys). - SQL query projection — keys used for filter/join/sweep/lineage (
status, source_doc_ref, idempotency_key, rollback_key, manifest_id, change_set_id, entry_id, emitted_at, changed_at). Indexed before scale (scale-review A.3). - JSONB payload — flexible, non-queried-yet bodies (
payload, change_diff, source_span, findings, reviewer_identity, payload_envelope, payload_summary, candidate_edges, report_summary, *_state_snapshot). Schemaless by intent; normalize per §4. - Vector payload — none in v0.4. Future: embeddings of canonical IU text. Stored outside the SSOT; derived; rebuildable; never FK'd from SQL.
- Blob / object pointer — text pointers to external content (
mirror_path;source_doc_refas IU/doc lineage pointer). The pointer is SQL-authoritative; the pointee lives in the doc/object store.
3. Why identity/lifecycle/governance stay in SQL (non-negotiable)
- Append-only + CAS give a tamper-evident, race-safe ledger that a document/vector store cannot guarantee (no atomic compare-and-set across the lifecycle, no FK, weak audit).
- DOT-pair signature chain (
prior_signature_id) is a per-lane hash chain whose integrity depends on relational ordering + FK. - Idempotency requires a UNIQUE constraint (server-enforced) — only SQL provides it transactionally with the write.
- Replay/automation under at-least-once delivery requires deterministic keys + transactional dedupe — SQL only.
4. JSONB → normalize: the rule and the queue
Keep a field JSONB until a concrete need arises; then normalize. Normalize when ANY of:
- (a) filtered/joined on the hot path or at scale (index opacity),
- (b) needs FK integrity to another entity,
- (c) drives a governance/control-flow decision,
- (d) routinely aggregated/reported.
Normalization queue (priority order, design-time):
payload.idempotency_key→ indexed scalar (expression UNIQUE index, or generated stored column) — required before scale; MARK currently O(N) (scale-review A.3). First to graduate.manifest_unit_block.source_span→ columns iff span-range queried (canonical-address resolution at scale).verify_result.findings/review_decision.findings→ child table iff drift/governance analytics become routine.- Others stay JSONB until a query need is proven (avoid premature schema churn / migration noise).
Each normalization is its own additive, GPT-gated DDL+code step — never bundled silently into the binding cycle.
5. Information-unit lifecycle ↔ storage map
source IU (iu_ref)
└─ decision_backlog_entry [SQL identity+lifecycle] payload(JSONB: signal, idempotency_key*)
└─ manifest_envelope [SQL] source_doc_ref→IU lineage (SB-DEC-1)
└─ manifest_unit_block(s) [SQL] proposed_canonical_address (result-unit address)
source_span(JSONB)
└─ review_decision [SQL governance] findings/reviewer_identity(JSONB)
└─ cut_change_set [SQL] idempotency_key(UNIQUE), payload_summary(JSONB: content_hash)
└─ cut_change_set_affected_row(s) [SQL] before/after_state_snapshot(JSONB)
└─ dot_pair_signature (DOT-991) [SQL hash-chain] payload_envelope(JSONB)
└─ verify_result [SQL] + dot_pair_signature (DOT-992) → status=verified_complete
(decision_backlog_history append-only at every transition — SQL audit spine)
(canonical_address_alias — DEFERRED OD-2, 0 rows; future alias/supersession workstream)
(* idempotency_key graduates JSONB→indexed scalar before scale)
- canonical_address relationship: the source IU address is implicit in
iu_ref/source_doc_ref; the result unit address ismanifest_unit_block.proposed_canonical_address(SB-DEC-2). Aliasing/rename across addresses = deferred (OD-2). - Supersession: write-once SQL stamps (
review_decision.superseded_by_*,manifest_envelope.superseded_by_envelope_id) + forward-compensatingcut_change_set— no physical delete, append-only preserved.
6. Automation/hybrid invariants the binding must not break
- SQL transaction boundary owns the phase; no write to doc/vector store inside the governance txn (those are async, post-commit, derived projections — out of v0.4 scope, named only).
- Deterministic keys (idempotency/rollback) ⇒ safe under queue retries.
- Vector/doc projections are eventually-consistent caches reconciled from SQL; SQL never reads them for a governance decision.
- No hardcoded collection/endpoint/DSN — all config (non-hardcode review B).
7. Verdict (this doc)
- SQL is and remains SSOT for identity, lifecycle, governance, audit, review/cut/verify, idempotency. Confirmed.
- JSONB stays JSONB for v0.4 + dry-run; a documented normalization queue exists, led by the idempotency key (required before scale, not before dry-run).
- Vector store = search acceleration only, derived & rebuildable, never authority; introduced later, config-driven.
- The schema binding is forward-compatible with scale + automation provided the code cycle adopts the centralized vocabulary module, server-side idempotency lookup, and config-driven knobs (scale-review F). No change to the SQL SSOT principle; no structural migration for the dry-run.