KB-3C5F

dot-iu-cutter v0.4 — SQL/NoSQL Hybrid Information-Unit Strategy (2026-05-17)

8 min read Revision 1
dot-iu-cutterv0.4schema-bindingsql-nosql-hybridinformation-unitdieu44design

dot-iu-cutter v0.4 — SQL/NoSQL Hybrid Information-Unit Strategy

Date: 2026-05-17 · DESIGN ONLY (no code/commit/dry-run/provision/secret/deploy). Companion to the scale/automation/non-hardcode review; defines the authoritative SQL/NoSQL split for the information-unit (IU) lifecycle so the LedgerWriter binding is forward-compatible with scale + automation.

1. Principle: SQL authority, NoSQL acceleration

PostgreSQL cutter_governance is the system of record (SSOT) for everything that must be correct, auditable, and replayable. Any document store (Agent Data / Directus) and any future vector store are derived, rebuildable projections for retrieval/UX latency — never authority.

Concern Authority Acceleration / projection
IU identity (uuids) SQL
IU lifecycle / state machine SQL (status + CAS, OD-SM-1/2)
Governance / audit ledger SQL (append-only decision_backlog_history, DOT signatures)
Review / cut / verify decisions SQL (FK lineage)
Idempotency / dedup SQL (UNIQUE cut_change_set.idempotency_key; MARK key indexed)
IU/canonical text & manifests SQL (jsonb payloads) doc store mirror (mirror_path)
Semantic search over IUs vector store (future), rebuildable from SQL

2. Field classification taxonomy (binding-forward)

Every column the LedgerWriter binding touches is one of:

  1. SQL core identity — PK uuids. Generated in-process (uuid4) or DB default gen_random_uuid(). Never in JSONB, never derived from external systems.
  2. SQL relationship — FK columns + self-FK chains. Enforce lineage; must be real columns (never JSONB) so FK integrity + indexed joins hold at scale.
  3. SQL governance event — discrete state/verdict/timestamp columns (status, change_kind, governance_event_kind, verdict, validation_state, state, *_at, cross_signed_by_dot_verifier). Drive control flow + audit; must be columnar (queried, constrained, partition keys).
  4. SQL query projection — keys used for filter/join/sweep/lineage (status, source_doc_ref, idempotency_key, rollback_key, manifest_id, change_set_id, entry_id, emitted_at, changed_at). Indexed before scale (scale-review A.3).
  5. JSONB payload — flexible, non-queried-yet bodies (payload, change_diff, source_span, findings, reviewer_identity, payload_envelope, payload_summary, candidate_edges, report_summary, *_state_snapshot). Schemaless by intent; normalize per §4.
  6. Vector payloadnone in v0.4. Future: embeddings of canonical IU text. Stored outside the SSOT; derived; rebuildable; never FK'd from SQL.
  7. Blob / object pointer — text pointers to external content (mirror_path; source_doc_ref as IU/doc lineage pointer). The pointer is SQL-authoritative; the pointee lives in the doc/object store.

3. Why identity/lifecycle/governance stay in SQL (non-negotiable)

  • Append-only + CAS give a tamper-evident, race-safe ledger that a document/vector store cannot guarantee (no atomic compare-and-set across the lifecycle, no FK, weak audit).
  • DOT-pair signature chain (prior_signature_id) is a per-lane hash chain whose integrity depends on relational ordering + FK.
  • Idempotency requires a UNIQUE constraint (server-enforced) — only SQL provides it transactionally with the write.
  • Replay/automation under at-least-once delivery requires deterministic keys + transactional dedupe — SQL only.

4. JSONB → normalize: the rule and the queue

Keep a field JSONB until a concrete need arises; then normalize. Normalize when ANY of:

  • (a) filtered/joined on the hot path or at scale (index opacity),
  • (b) needs FK integrity to another entity,
  • (c) drives a governance/control-flow decision,
  • (d) routinely aggregated/reported.

Normalization queue (priority order, design-time):

  1. payload.idempotency_key → indexed scalar (expression UNIQUE index, or generated stored column) — required before scale; MARK currently O(N) (scale-review A.3). First to graduate.
  2. manifest_unit_block.source_span → columns iff span-range queried (canonical-address resolution at scale).
  3. verify_result.findings / review_decision.findings → child table iff drift/governance analytics become routine.
  4. Others stay JSONB until a query need is proven (avoid premature schema churn / migration noise).

Each normalization is its own additive, GPT-gated DDL+code step — never bundled silently into the binding cycle.

5. Information-unit lifecycle ↔ storage map

source IU (iu_ref)
  └─ decision_backlog_entry        [SQL identity+lifecycle]   payload(JSONB: signal, idempotency_key*)
     └─ manifest_envelope          [SQL]  source_doc_ref→IU lineage (SB-DEC-1)
        └─ manifest_unit_block(s)  [SQL]  proposed_canonical_address (result-unit address)
                                          source_span(JSONB)
        └─ review_decision         [SQL governance]  findings/reviewer_identity(JSONB)
           └─ cut_change_set       [SQL]  idempotency_key(UNIQUE), payload_summary(JSONB: content_hash)
              └─ cut_change_set_affected_row(s)  [SQL]  before/after_state_snapshot(JSONB)
              └─ dot_pair_signature (DOT-991)    [SQL hash-chain]  payload_envelope(JSONB)
        └─ verify_result           [SQL]  + dot_pair_signature (DOT-992)   → status=verified_complete
  (decision_backlog_history append-only at every transition — SQL audit spine)
  (canonical_address_alias — DEFERRED OD-2, 0 rows; future alias/supersession workstream)
  (* idempotency_key graduates JSONB→indexed scalar before scale)
  • canonical_address relationship: the source IU address is implicit in iu_ref/source_doc_ref; the result unit address is manifest_unit_block.proposed_canonical_address (SB-DEC-2). Aliasing/rename across addresses = deferred (OD-2).
  • Supersession: write-once SQL stamps (review_decision.superseded_by_*, manifest_envelope.superseded_by_envelope_id) + forward-compensating cut_change_set — no physical delete, append-only preserved.

6. Automation/hybrid invariants the binding must not break

  • SQL transaction boundary owns the phase; no write to doc/vector store inside the governance txn (those are async, post-commit, derived projections — out of v0.4 scope, named only).
  • Deterministic keys (idempotency/rollback) ⇒ safe under queue retries.
  • Vector/doc projections are eventually-consistent caches reconciled from SQL; SQL never reads them for a governance decision.
  • No hardcoded collection/endpoint/DSN — all config (non-hardcode review B).

7. Verdict (this doc)

  • SQL is and remains SSOT for identity, lifecycle, governance, audit, review/cut/verify, idempotency. Confirmed.
  • JSONB stays JSONB for v0.4 + dry-run; a documented normalization queue exists, led by the idempotency key (required before scale, not before dry-run).
  • Vector store = search acceleration only, derived & rebuildable, never authority; introduced later, config-driven.
  • The schema binding is forward-compatible with scale + automation provided the code cycle adopts the centralized vocabulary module, server-side idempotency lookup, and config-driven knobs (scale-review F). No change to the SQL SSOT principle; no structural migration for the dry-run.
Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/v0.4-schema-binding/dot-iu-cutter-v0.4-sql-nosql-hybrid-information-unit-strategy-2026-05-17.md