dot-iu-cutter v0.4 — LedgerWriter Schema-Binding DESIGN Report

Date: 2026-05-17 · Phase: v0.4 LedgerWriter Schema-Binding DESIGN ONLY. Nothing executed, no code, no commit, no provisioning, no dry-run, no production connection beyond read-only catalog/schema inspection, no secret/.env read.

Revision 2 (2026-05-17). GPT review of the r1 package: the narrow schema-binding blocker is a PASS-candidate, but code authoring is NOT allowed yet. An r2 addendum was required because r1 solved schema binding but did not cover the User's scale / automation / non-hardcode / SQL-NoSQL-hybrid mandate. r2 adds two docs (…-scale-automation-nonhardcode-review-…, …-sql-nosql-hybrid-information-unit-strategy-…) and revises the final verdict (§3/§5 below). No r1 design decision is reversed: the binding is unchanged and is now shown to be forward-compatible with scale and automation, given an expanded (still column-level) code scope and additive, index-only scale provisioning.

1. Deliverables (9 docs total, `knowledge/dev/laws/dieu44-trien-khai/v0.4-schema-binding/`)

| Doc | Rev | Content |
|---|---|---|
| `…-ledgerwriter-schema-gap-analysis-…` | r1 | Root cause; 12-writer MATCH/MISMATCH (3/9); representability ⇒ no migration; lineage gap |
| `…-ledgerwriter-per-writer-mapping-design-…` | r1 | Per-writer A/B reconciliation; constants + SB-DEC-1..6 |
| `…-state-history-and-sweep-mapping-design-…` | r1 | `append_history`/`append_sweep_log`/CAS deep-dive; count invariance |
| `…-mark-review-cut-verify-schema-binding-plan-…` | r1 | Per-phase FK order; matrix == r3 |
| `…-pg-backed-test-revision-plan-…` | r1 | Schema-contract tests; r3 preserved (no r4); optional G-26 |
| `…-schema-binding-risk-and-code-change-plan-…` | r1 | Bounded code surface; no migration; risk STANDARD |
| `…-scale-automation-nonhardcode-review-…` | r1 (new) | Scale A, non-hardcode B, hybrid C, IU-centric D, automation E |
| `…-sql-nosql-hybrid-information-unit-strategy-…` | r1 (new) | SQL SSOT, JSONB normalization queue, vector=acceleration, IU lifecycle map |
| `…-ledgerwriter-schema-binding-report-…` | r2 (this) | Verdict incl. scale/automation/non-hardcode/hybrid |

2. Coverage of GPT's required points (r1 + r2)

A/B/C (r1): per-writer reconciliation (12 writers), code-change flags, SQL ops, principal, txn/rollback, invariants, no migration.
D (r1): schema-contract tests; r3 row-count baseline preserved.
Scale A · Non-hardcode B · Hybrid C · IU-centric D · Automation E (r2), all in the two new docs:
Scale A: write amplification (~10+U+A rows/IU; 1M IUs ⇒ 15M–1B+), fastest growers (manifest_unit_block, cut_change_set_affected_row, decision_backlog_history), pre-scale index list, deterministic keyset sweep cursor, bounded retry/idempotency, archival boundary.
Non-hardcode B: every mapping constant classified (protocol / config / schema-contract value / derived) with reject-list cleared (no IP/DSN/password/batch/collection literals).
Hybrid C: SQL=SSOT for identity/lifecycle/governance/audit/review/cut/verify/idempotency; JSONB normalization rule + queue (idempotency key first); vector store = rebuildable acceleration only.
IU-centric D: IU-centric writer map.
Automation E: automation readiness (resumable status, CAS concurrency guard, no manual runtime SQL, redacted logs, deferred queue contract).
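The write-amplification range can be sanity-checked with a few lines of arithmetic. This is only a sketch: it assumes U and A are per-IU counts of unit-block rows and affected rows (this report does not define the symbols), that 10 is the fixed per-IU overhead, and that U and A are uniform across IUs. The function names are illustrative.

```python
def rows_per_iu(u: int, a: int, fixed: int = 10) -> int:
    """Rows written per IU under the ~10+U+A estimate (U, A meanings assumed)."""
    return fixed + u + a


def projected_rows(iu_count: int, u: int, a: int) -> int:
    """Total ledger rows for iu_count IUs, assuming uniform U and A."""
    return iu_count * rows_per_iu(u, a)


if __name__ == "__main__":
    # Low end of the quoted range: U + A = 5 gives 15 rows per IU.
    print(projected_rows(1_000_000, 3, 2))      # 15000000
    # High end: U + A = 990 gives 1000 rows per IU.
    print(projected_rows(1_000_000, 500, 490))  # 1000000000
```

Under these assumptions the quoted 15M–1B+ span corresponds to U + A ranging from about 5 to about 990 or more per IU.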

3. Final verdict & answers (revised by r2)

Is the code patch still sufficient? Yes for correctness, but its scope EXPANDS. It remains column-level binding: no flow/principal/isolation/state-machine change, and db_adapter.py stays untouched. The next code-authoring cycle must additionally: (1) centralize all binding vocabulary/sentinels in one module (non-hardcode); (2) make the mark() idempotency lookup server-side filtered (today it is an O(N) full scan, which is a scale blocker); (3) make the sweep cursor a config-driven deterministic keyset scan; (4) add schema-contract tests covering columns + vocabulary.
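Points (2) and (3) can be illustrated against an in-memory SQLite stand-in. Everything here is an assumption made for illustration: the `backlog` table, its columns, and the function names are hypothetical, SQLite replaces PostgreSQL, and the real code would presumably go through the existing adapter rather than a raw connection.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, str]  # (id, idempotency_key); hypothetical shape


def find_by_key(conn: sqlite3.Connection, key: str) -> Optional[Row]:
    """Server-side filtered lookup: the database evaluates the predicate,
    so at most one row is returned instead of the whole table."""
    return conn.execute(
        "SELECT id, idempotency_key FROM backlog WHERE idempotency_key = ?",
        (key,),
    ).fetchone()


def keyset_sweep(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Row]]:
    """Deterministic keyset scan: each batch resumes strictly after the last
    id seen, so ordering is stable and the cursor can be persisted and resumed."""
    last_id = 0  # assumes ids are positive integers
    while True:
        batch = conn.execute(
            "SELECT id, idempotency_key FROM backlog "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        yield batch
        last_id = batch[-1][0]


def demo_connection(n: int) -> sqlite3.Connection:
    """Build a throwaway in-memory table with n rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE backlog (id INTEGER PRIMARY KEY, idempotency_key TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO backlog VALUES (?, ?)",
        [(i, f"key-{i}") for i in range(1, n + 1)],
    )
    return conn
```

In this sketch `batch_size` is the value that would come from configuration rather than a literal in the code, and `last_id` is the state a resumable sweep would need to persist.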
Does schema migration remain unnecessary? For the PG-backed dry-run: YES, no structural migration. For production scale: an additive, index-only DDL cycle is required (FK/lookup indexes per scale-review A.3, plus one expression/generated index for the MARK idempotency key). That cycle is index-only, uses CREATE INDEX CONCURRENTLY, and makes no column/constraint/structure change, so existing cutter_governance semantics are unaffected. It is not a prerequisite for the single-IU dry-run.
Are indexes required before scale? YES (before scale, not before dry-run): decision_backlog_history(entry_id[,changed_at]), cut_change_set(decision_backlog_entry_id), cut_change_set_affected_row(change_set_id), verify_result(change_set_id,prior_verify_result_id), review_decision(manifest_id,prior_review_decision_id), manifest_envelope(source_doc_ref), dot_pair_signature(prior_signature_id), a partial index on decision_backlog_entry(status), and the idempotency-key index.
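As a rough illustration, the plain-column entries in the list above could be turned into DDL text as follows. Several things are assumptions: the `idx_<table>__<columns>` naming scheme is hypothetical, each parenthesised column group is read as one composite index (if separate per-column indexes are intended, split the tuples), and the optional `changed_at` column is included. The partial index and the idempotency-key index are left out because their predicate and expression are not stated in this report.

```python
from typing import List, Sequence, Tuple

# (table, columns) pairs copied from the pre-scale list; see assumptions above.
PRE_SCALE_INDEXES: List[Tuple[str, Sequence[str]]] = [
    ("decision_backlog_history", ("entry_id", "changed_at")),
    ("cut_change_set", ("decision_backlog_entry_id",)),
    ("cut_change_set_affected_row", ("change_set_id",)),
    ("verify_result", ("change_set_id", "prior_verify_result_id")),
    ("review_decision", ("manifest_id", "prior_review_decision_id")),
    ("manifest_envelope", ("source_doc_ref",)),
    ("dot_pair_signature", ("prior_signature_id",)),
]


def index_name(table: str, columns: Sequence[str]) -> str:
    """Hypothetical naming scheme: idx_<table>__<col>_<col>."""
    return "idx_" + table + "__" + "_".join(columns)


def index_ddl(table: str, columns: Sequence[str]) -> str:
    """Render one additive, index-only statement as text (nothing is executed)."""
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name(table, columns)} "
        f"ON {table} ({', '.join(columns)});"
    )


if __name__ == "__main__":
    for table, columns in PRE_SCALE_INDEXES:
        print(index_ddl(table, columns))
```

PostgreSQL does not allow CREATE INDEX CONCURRENTLY inside a transaction block, so statements like these would have to be issued one at a time outside any wrapping transaction.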
Should JSONB stay JSONB or be normalized? Stay JSONB for v0.4/dry-run. Normalize on the documented rule (queried-at-scale / FK-needed / decision-driving / aggregated). Priority-1 graduation: payload.idempotency_key → indexed scalar before scale. Others remain JSONB until a query need is proven.
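The normalization rule can be sketched as a simple predicate. This assumes the four criteria are alternatives (any one of them triggers graduation), which this report does not spell out. The class and field names are illustrative, `payload.notes` is an invented example path, and flagging the idempotency key as queried-at-scale is an inference from the mark() lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JsonbFieldProfile:
    """Observed usage of one JSONB path (hypothetical structure)."""
    path: str
    queried_at_scale: bool = False
    fk_needed: bool = False
    decision_driving: bool = False
    aggregated: bool = False


def should_normalize(profile: JsonbFieldProfile) -> bool:
    """Graduate a JSONB path to a scalar column if any criterion holds
    (the any-of reading is an assumption)."""
    return any(
        (
            profile.queried_at_scale,
            profile.fk_needed,
            profile.decision_driving,
            profile.aggregated,
        )
    )


if __name__ == "__main__":
    key = JsonbFieldProfile("payload.idempotency_key", queried_at_scale=True)
    notes = JsonbFieldProfile("payload.notes")  # invented example path
    print(should_normalize(key))    # True
    print(should_normalize(notes))  # False
```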
Can the PG-backed dry-run resume after the code patch? YES — after the (expanded) code cycle PASSes, resume with command-review r1 + verification-plan r3 unchanged (single-IU canonical, count-invariant; no index/migration needed for the dry-run itself).
Exact next code-authoring scope:
cutter_agent/ledger.py: rebind the 9 MISMATCH row-builders.
cutter_agent/schema_binding.py (new, or equivalent): centralized vocabulary/sentinel constants + lane/kind maps + deterministic key builders, all config-/version-derived.
cutter_agent/phases.py: thread SB-DEC-5/6 args, replace InMemory _source_entry with the SB-DEC-1 real-schema lineage join, make mark() idempotency lookup server-side filtered, config-driven keyset sweep cursor.
tests/: static schema-contract fixture + per-writer contract tests + vocabulary-registry test + targeted InMemory-fixture updates.
Out of scope: no change to db_adapter.py, state-machine, idempotency, signing, signal, cli. No cutter_governance structural migration. Indexes = separate later GPT-gated index-only DDL cycle (pre-scale, not pre-dry-run).
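The centralized vocabulary registry and the schema-contract check named in this scope might look roughly like the following. It is a sketch under stated assumptions, not the planned module: every class, function, and constant name is hypothetical, the reject-list covers only three of the five literal kinds mentioned in this report (IP, DSN, password), and the regular expressions are deliberately crude.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, List, Set


class Origin(Enum):
    """The four constant classifications used in the non-hardcode review."""
    PROTOCOL = "protocol"
    CONFIG = "config"
    SCHEMA_CONTRACT = "schema-contract value"
    DERIVED = "derived"


@dataclass(frozen=True)
class BindingConstant:
    name: str
    value: str
    origin: Origin


# Crude, illustrative reject-list patterns (assumptions, not the real list).
_REJECT = {
    "ip": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    "dsn": re.compile(r"\b\w+://"),
    "password": re.compile(r"password", re.IGNORECASE),
}


def violations(registry: Dict[str, BindingConstant]) -> List[str]:
    """Return 'NAME: kind' for each constant whose value matches a reject pattern."""
    found = []
    for const in registry.values():
        for kind, pattern in _REJECT.items():
            if pattern.search(const.value):
                found.append(f"{const.name}: {kind}")
    return found


def unknown_columns(row: Dict[str, object], contract: FrozenSet[str]) -> Set[str]:
    """Columns a row-builder emits that the static schema fixture does not list."""
    return set(row) - contract
```

A vocabulary-registry test would then assert that `violations(...)` is empty for the shipped registry, and a per-writer contract test would assert that `unknown_columns(...)` is empty for each row-builder's output against the static fixture.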
Git SSOT proof: Branch main; /opt/incomex/dot HEAD = 56d3732cb74d07546c938242180a434ed1067a9a (accepted, unchanged); git status --short -- iu-cutter = empty. No code change this phase ⇒ no commit needed. Tests 92/92 at 56d3732.

4. Boundaries honoured

No code change · no commit · no dry-run · no env provision · no production connection except read-only catalog/information_schema/PK-FK inspection · no production row read beyond schema metadata · no secret/.env read · no deploy · no self-advance. Read-only grounding: full 12-table DDL + NOT-NULL/defaults + PK/FK/UNIQUE; accepted code at 56d3732. PROD system_identifier 7611578671664259111 and prod container untouched; 3 protected prior dry-run envs untouched.

5. Next gate

GPT review of this 9-doc package (r1 + r2 addendum). On PASS, open the LedgerWriter schema-binding code-authoring cycle (separate, GPT-gated, scope = §3 "exact next code-authoring scope"). The pre-scale index-only DDL cycle and any JSONB normalization are further separate GPT-gated cycles, not prerequisites for the dry-run. PG-backed dry-run remains BLOCKED until the code cycle PASSes; then resume with command-review r1 + verification-plan r3 unchanged. No self-advance.