KB-7F01

23-P2 — IU Edit/Merge Runtime Inspection Report

16 min read Revision 1
pack-23p2inspectionread-onlyreporteditmergemulti-axis

23-P2 — IU Edit/Merge Runtime Inspection Report

inspection_status: COMPLETE queries_run: 33 / queries_failed: 0 Date: 2026-05-06 Source log: VPS /tmp/23-p2.20260506-113044.log (PG container postgres, db directus) Scope: READ-ONLY. No DDL/DML/function/trigger/vector/cleanup performed.


§1 Schema [IU + UV]

1.1 information_unit (15 cols)

PK id uuid (default gen_random_uuid()), canonical_address text NOT NULL UNIQUE, unit_kind text NOT NULL, lifecycle_status text NOT NULL DEFAULT 'draft', content_anchor_ref text NULLABLE, version_anchor_ref uuid NULLABLE, owner_ref text NOT NULL, parent_or_container_ref uuid NULLABLE, conformance_status text NOT NULL DEFAULT 'open', identity_profile jsonb NOT NULL DEFAULT '{}', audit cols (created_at/updated_at/created_by/updated_by), deleted_at nullable.

Constraints: fk_iu_version_anchor FK → unit_version(id) DEFERRABLE INITIALLY DEFERRED; PK; trg_iu_birth_gate_layer2 (trigger constraint, deferrable); UNIQUE(canonical_address).

Indexes: btree on canonical/lifecycle/unit_kind; partial btree on parent_or_container_ref WHERE NOT NULL; GIN on identity_profile; PK + UNIQUE(canonical_address).

1.4 unit_version (9 cols)

PK id uuid, unit_id uuid NOT NULL (FK → IU, NOT deferrable), body text NOT NULL, content_hash text NOT NULL, version_seq integer NOT NULL, lifecycle_status text NOT NULL DEFAULT 'draft', content_profile jsonb NOT NULL DEFAULT '{}', created_at, created_by text NOT NULL.

Constraints: PK; FK(unit_id); UNIQUE(unit_id, version_seq) as uq_unit_version_seq.

Indexes: btree(content_hash), btree(unit_id), PK, UNIQUE(unit_id, version_seq). No GIN on content_profile.

1.7 version_seq properties

NOT NULL = t, has_unique = t (composite), has_pk = f. → version_seq is mandatory and uniqueness is enforced per unit.

1.8 IU anchor columns

content_anchor_ref text nullable; version_anchor_ref uuid nullable. Both populated post-INSERT (deferred FK pattern).

1.9 Lifecycle/proposal columns on UV

Only lifecycle_status exists. Missing: review_state, base_version_ref, status, review_actor, merged_at. → No proposal/review/merge state schema exists yet.


§2 Functions

Function Present Purpose
fn_iu_create Canonical creator: validates input → preflight → vocab resolve → advisory lock by canonical_address → classify_existing → INSERT IU+UV(seq=1) → set anchors → invariants verify. Sets app.canonical_writer='fn_iu_create'.
fn_iu_create_plan Read-only plan/dry-run companion. Returns status ∈ {plan_ok, invalid_input, unresolved_vocab, invalid_publication_type, preflight_failed, exists_*}.
fn_iu_verify_invariants Checks i1..i5 (iu_exists, uv_linked, anchors_exact, birth_exists, uv_birth_ok). Detects missing_version, duplicate_version, uv_strategy_changed.
fn_content_hash encode(digest(p_body,'sha256'),'hex') — deterministic SHA-256.
fn_iu_gateway_write_guard Trigger guard. Reads iu_create.gateway.marker_key/marker_value from dot_config. Compares current_setting(marker_key,true) to marker_value via EXACT MATCH (v_cur=v_mv). Allows when match, else RAISE EXCEPTION.

2.6 Edit/merge/proposal/version functions

Only gen_code_table_proposals exists (unrelated, code-table generator). No edit/merge/proposal/versioning functions for IU/UV exist. Greenfield.


§3 Data Shape

  • IU count = 5; UV count = 5. 1:1 ratio.
  • All IUs have exactly one UV with version_seq=1 (min=max=1 across all). No IU has been edited.
  • All UVs were created by fn_iu_create (births only), agents agent:p2b-p1, agent:p3-p2-test, agent:p2-test, agent:p3-p1-test.
  • Two UVs share content hash 55d3b3126c5ae3... (same body "Content by P3-P2." inserted under different canonical addresses) — deduplication is not enforced by hash; cross-IU collisions are allowed.

§4 Parent/Child + Order

  • All 5 IUs have parent_or_container_ref = NULL (flat — no hierarchy yet).
  • unit_kind uniformly design_doc_section.
  • identity_profile keys present: title, primary_section_type_ref, owner_lookup_ref (5 of 5, all string typed).
  • No sort_order / display_order column on IU. Containment ordering (for document assembly) currently has no native primitive on IU — must come from edge metadata or future column.

§5 Metadata Evolution / Schema-Evolution Capacity

  • JSONB columns: information_unit.identity_profile (default {}, NOT NULL), unit_version.content_profile (default {}, NOT NULL).
  • GIN indexes on JSONB: only idx_iu_identity_profile on IU. unit_version.content_profile has no GIN — query 5.5 returned 0 rows for IU+UV combined; explicit cross-check on 1.6 confirms.
  • dot_config iu_* keys are gateway-only (9 keys, all iu_create.gateway.*). No iu_* config keys for vocab governance, profile schemas, allowed metadata keys, or migration policy.
  • identity_profile value-type distribution: 100% strings across the 3 keys observed (n=5). No nested/array values yet — schema is currently homogeneous and shallow.

Capacity: JSONB on both IU & UV provides unbounded forward extensibility without DDL. GIN on identity_profile allows containment-based filters at scale; content_profile lacks GIN — fine while UV count is small but a future bottleneck if content_profile becomes a search axis. No governance/vocab table exists for evolving metadata keys.


§6 Multi-Axis Composition / Traceability

6.1 universal_edges (25 cols)

Rich edge model. Notable cols: source_collection/source_id/source_code/source_composition_level, mirror for target, edge_type/edge_subtype, weight numeric, source_info (NOT NULL, provenance), is_auto_managed, symmetry_group_id uuid, metadata jsonb, valid_from/valid_to/valid_time tstzrange, status, _dot_origin, confidence numeric, provenance jsonb.

6.2 Edge-type usage

edge_type count
USES 1,486
BELONGS_TO 431
CONTAINS 282

Naming convention is UPPERCASE. Query 6.5 filtered on lowercase tokens ('contains','belongs_to',...) per the prompt and therefore returned 0 rows — this is a filter-vs-data convention mismatch, not a missing-data finding. Same for 6.6.

6.3 Order/sort/priority columns on edges

None. No *_order, *_sort, *_priority, *_seq columns. Ordering for document assembly must live in metadata JSONB or be added later.

6.4 JSONB on edges

Two: metadata, provenance. Available for storing order, role, axis labels, etc. without DDL.

6.5/6.6 Governance/CONTAINS-BELONGS_TO collection-pair distribution

Returned 0 rows due to lowercase filter. Re-running with uppercase would yield distributions for CONTAINS/BELONGS_TO (282 + 431 rows) — not re-executed in this read-only run; flagged for follow-up if needed.

Composition primitives present:

  • Structural containment: information_unit.parent_or_container_ref (currently unused, all NULL); universal_edges CONTAINS/BELONGS_TO types in active use across other collections.
  • Cross-domain traceability: universal_edges with arbitrary edge_type/edge_subtype, weighted, time-bounded (valid_from/valid_to/valid_time), and JSONB metadata. Strongly capable.
  • Composition level on both source and target — supports multi-level assembly semantics.

§7 Gateway

7.1 dot_config keys (9, all iu_create.gateway.*)

  • mode = enforced
  • marker_key = app.canonical_writer, marker_value = fn_iu_create
  • canonical_function = public.fn_iu_create(text,text,text,text,text,text,text,text,uuid)
  • plan_function = public.fn_iu_create_plan(text,text,text,text,text,text,text,text,uuid)
  • direct_insert_policy = block_after_guard
  • exempt_policy = none_active
  • policy_doc_path / readme_path set.

7.2 Marker check type

EXACT_MATCH (v_cur=v_mv). The guard accepts exactly one canonical writer string. Allow-list (multi-writer) not implemented.


§8 Proposal Model Feasibility (ADVISORY)

Model Feasible now? Blockers Required changes Scale-safe?
A. UV stores proposals, consumes version_seq ⚠ Partial UNIQUE(unit_id, version_seq) means proposals burn seq numbers; gaps on rejection. Gateway exact-match + block_after_guard blocks any non-fn_iu_create UV writer; need new canonical fn. Add lifecycle_status value proposed; add canonical writer (e.g. fn_iu_propose) and either widen guard to allow-list or rewrite as single multi-mode fn. Index on (unit_id, lifecycle_status) for proposal lookup. ⚠ Seq gaps + per-IU contention as proposals scale. Many proposals/IU = many indexed rows but bounded.
B. UV stores proposals, seq assigned at merge ✅ Feasible Current UNIQUE(unit_id, version_seq) is NOT NULL — proposals can't sit with NULL seq. Make version_seq nullable OR move to merge-time assignment via partial unique WHERE merged. Add merged_at, base_version_ref, proposal_status. Allow-list or expand canonical fn. ✅ Best for many-proposal/IU. Indexed lookups by status easy.
C. Separate unit_proposal table ✅ Feasible Greenfield (no proposal infra) — clean room. Gateway is single-target so no spillover. New table mirroring UV cols + base_version_ref, review_state, FK to IU. New canonical fns. UV stays merged-truth-only. ✅ Strongest separation. Independent indexing/retention.
D. Hybrid (proposal table for review, UV on merge) ✅ Feasible Same as C plus merge orchestration. C plus a merge fn that copies proposal → UV, sets version_seq via MAX+1, updates anchors. ✅ Same as C; explicit merge boundary is operationally clearest.

rev3 scale-safety answers:

  • Many proposals/IU: Models B/C/D scale; Model A burns seq numbers. With current dataset (5 UVs, all seq=1), no measurable load — choice is architectural, not performance-driven.
  • JSONB metadata long-term: identity_profile GIN-indexed, governance-free. Stable for Phase 1; will need a vocab/governance layer (none exists in dot_config) once keys diversify.
  • Future indexes (advisory only, do NOT create): (unit_id, lifecycle_status) on UV for any model storing proposals on UV; GIN on unit_version.content_profile if it becomes a query axis; partial unique on (unit_id, version_seq) WHERE lifecycle_status='merged' for Model B.

§9 Composition Axis Assessment (ADVISORY)

Axis Example Current primitive Feasible now? Missing
Document assembly doc → sections, gộp lại = văn bản gốc parent_or_container_ref (all NULL today); no sort_order col on IU ⚠ Schema present but unused; ordering missing Sort/order column or stored as identity_profile.sort_order / edge metadata; convention not yet established
Workflow assembly process → steps Same parent_or_container_ref ⚠ Same as document axis Same — and a unit_kind value to differentiate workflow vs doc
Domain traceability law → design → procedure → code → test → report universal_edges (25 cols, USES/BELONGS_TO/CONTAINS in active use; metadata/provenance JSONB; valid_time range; weight; symmetry_group_id) Strong — rich edge model already operating at scale (1486+431+282 edges) Edge types for domain relations not yet seeded for IU sources/targets; convention/vocab decisions
Release/render bundle selected slices → artifact TBD — no unit_bundle/render_set table observed ❌ Not present Bundle table or edge-typed grouping (e.g. BUNDLED_IN) + manifest semantics

User directive confirmed:

  • Document/workflow axis = parent_or_container_ref (structural containment) — schema present, unused, ordering primitive missing.
  • Domain axis = universal_edges typed relations — fully capable, in active use elsewhere, ready for IU adoption.
  • Two axes are structurally distinct in the schema; no need to gom thành parent/child.

§10 Gateway Allow-list + Metadata Evolution (ADVISORY)

Gateway:

  • Current guard is EXACT_MATCH on a single marker_value. Allow-list requires either: (a) widen fn_iu_gateway_write_guard to read a CSV/array config and use ANY(string_to_array(...)), or (b) keep exact-match and route every new writer through one umbrella fn that re-sets the marker. Option (a) is cleaner for diverse writers (create/edit/merge); option (b) keeps guard simpler but couples writers.
  • Adding a second canonical fn (e.g. fn_iu_propose, fn_iu_merge) under exact-match means each must set_config('app.canonical_writer','fn_iu_create',true) — semantically wrong (lying about identity). Allow-list is the right move before edit/merge land.

Metadata evolution:

  • JSONB on IU+UV is safe for Phase 1: shallow, all-string values today, GIN on IU side.
  • Governance gaps to plan for (not blocking Phase 1):
    • No dot_config namespace for allowed identity_profile / content_profile keys.
    • No vocab table for axis labels in universal_edges.metadata.
    • No GIN on content_profile — add when first JSONB filter on UV appears.

  1. Choose proposal model: lean Model C (separate unit_proposal table) or Model D (hybrid w/ explicit merge). Rationale: greenfield (no proposal infra to migrate), cleanest invariants, no version_seq ambiguity, easy retention. Model B is a valid alternative if keeping a single audit chain on UV is preferred.
  2. Gateway allow-list first: rewrite fn_iu_gateway_write_guard to read a config-driven allow-list (iu_create.gateway.marker_valueiu_create.gateway.marker_values CSV; or new key iu_gateway.allowed_writers). Keep EXACT_MATCH semantics per-writer; just permit set membership. Necessary before any second canonical fn.
  3. Schema additions (Phase 1, advisory only): unit_version.lifecycle_status already exists — extend allowed values; if Model C, define unit_proposal mirroring UV + base_version_ref, review_state, proposed_by, proposed_at, decided_by, decided_at, decision_reason. Add edge type SUPERSEDES (domain axis) for cross-version traceability.
  4. Ordering primitive for document axis: decide convention — column on IU vs identity_profile.sort_order vs universal_edges.metadata->'order' on a CONTAINS edge. Pick before parent/child gets populated to avoid migration.
  5. Index plan (deferred, do not create yet): GIN on unit_version.content_profile, btree (unit_id, lifecycle_status) on UV, partial unique on (unit_id, version_seq) WHERE lifecycle_status='merged' if Model B chosen.
  6. Metadata governance (defer to Phase 2): add dot_config keys defining allowed profile keys + types; introduce a vocab table for edge axis labels.
  7. Verify-invariants extension: when proposal/merge land, extend fn_iu_verify_invariants with i6 (proposal integrity) and i7 (no orphan proposals).

Notes / Caveats

  • Query 6.5 and 6.6 returned 0 rows because the prompt's IN (...) filter used lowercase tokens while live data uses UPPERCASE edge types (USES, BELONGS_TO, CONTAINS). Aggregate counts were captured by 6.2. Distribution-by-collection-pair was not re-queried (read-only run, prompt followed verbatim) — recommend a follow-up SELECT with upper()/uppercase filter if collection-pair detail is needed for 23-P3 design.
  • All 5 IUs are pilot/test rows. No production IU edits/merges have happened — model choice is unconstrained by historical data.
  • VPS log retained at /tmp/23-p2.20260506-113044.log (host 38.242.240.89).

23-P2 rev3 inspection — read-only — 2026-05-06.

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/reports/23-p2-iu-edit-merge-runtime-inspection-report.md