KB-4007

03 fn_iu_cut_preflight_validate — Design

Revision 1 · 2026-05-27
Tags: dieu44, preflight, design, function

Signature

public.fn_iu_cut_preflight_validate(
    p_staging_record_id uuid,
    p_source_hash       text DEFAULT NULL,
    p_actor             text DEFAULT 'fn_iu_cut_preflight_validate'
) RETURNS jsonb
LANGUAGE plpgsql
STABLE

The function is declared STABLE because its body performs only SELECTs, against iu_core.iu_staging_record, iu_core.iu_staging_payload, public.dot_config, and public.information_unit. It contains no INSERT, UPDATE, or DELETE, so it is safe to call from any context, including views, other STABLE functions, and read-only sessions.

md5(pg_get_functiondef) = 914e26d61de0de914408af5cdc679c07.

Invariants (proven by mig 055 + Phase F)

  1. No IU creation. Body does not reference public.information_unit for writes.
  2. No staging mutation. Body does not reference iu_core.iu_staging_* for writes.
  3. No event emission. Body does not reference event_outbox / job_queue / cut_request_signal.
  4. No Qdrant. No mention of iu_vector_sync_point or vector tables.
  5. No production_documents. Table is absent in the schema; not referenced.
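  6. Idempotent. Returns the same verdict for the same inputs within a single statement (the guarantee STABLE provides); a later statement can return a different verdict only if the underlying rows have changed.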
  7. Lifecycle-agnostic. Returns a verdict for pending_review, approved, consumed, rejected, expired, cleaned — caller decides what to do with it. Reports the lifecycle as lifecycle_status_at_check for forensic clarity.

Return shape

{
  "ok": <boolean>,                   // overall verdict
  "verdict": "approved" | "rejected" | "not_found",
  "axis_a_ok": <boolean>,
  "axis_b_ok": <boolean>,
  "axis_c_ok": <boolean>,
  "axis_d_ok": <boolean>,
  "cut_readiness_ok": <boolean>,    // = axes A..D AND gates E1..E7 AND optional E0
  "problems": [<indexed-string>...],
  "counts": {
    "pieces": <int>,
    "unit_kind_in_vocab": <int>,
    "unit_kind_missing": <int>,
    "section_type_in_vocab": <int>,
    "address_collisions": <int>,
    "publication_type_bad": <int>,
    "title_bad": <int>,
    "local_piece_id_unique": <int>,
    "local_piece_id_total": <int>
  },
  "gates": {
    "e1_section_ok":   <boolean>,
    "e2_addr_ok":      <boolean>,
    "e3_pubtype_ok":   <boolean>,
    "e4_title_ok":     <boolean>,
    "e5_local_uniq_ok":<boolean>,
    "e6_digest_ok":    <boolean>,
    "e7_coverage_ok":  <boolean>
  },
  "staging_record_id":          <uuid>,
  "lifecycle_status_at_check":  <text>,
  "checked_at":                 <timestamptz>,
  "checked_by":                 <text>
}

Check semantics

Axis A — dense source_position from 1

positions   := array_agg((p->>'source_position')::int)
v_axis_a_ok := (min(positions) = 1)
           AND (max(positions) - min(positions) + 1 - len(positions) = 0)

Refusal regex: any piece whose source_position does not match ^[0-9]+$ is short-circuited with an "Axis A" problem before the cast, so the ::int cast is never attempted on a non-numeric value.
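The Axis A rule can be modelled outside the database. The Python sketch below is an illustration, not the plpgsql body: the function name axis_a_check, the list-of-dicts manifest shape, the problem message texts, and the treatment of an empty manifest are all assumptions. It applies the digit guard first and then evaluates the density formula as written above.

```python
import re

def axis_a_check(pieces):
    """Model of Axis A: source_position values must be dense, starting at 1."""
    problems = []
    positions = []
    for i, piece in enumerate(pieces):
        raw = piece.get("source_position")
        # Refuse anything that is not all digits before converting,
        # mirroring the ^[0-9]+$ guard that runs before the ::int cast.
        if raw is None or re.fullmatch(r"[0-9]+", str(raw)) is None:
            problems.append(f"Axis A: piece {i} has a non-numeric source_position")
            continue
        positions.append(int(raw))
    if problems:
        return False, problems
    if not positions:
        # Assumption: an empty manifest fails; the source does not say.
        return False, ["Axis A: no pieces"]
    lo, hi = min(positions), max(positions)
    # Density formula from the text: min = 1 and max - min + 1 - len = 0.
    ok = (lo == 1) and (hi - lo + 1 - len(positions) == 0)
    if not ok:
        problems.append("Axis A: source_position is not dense from 1")
    return ok, problems
```

One property of the formula is worth knowing: taken on its own, it accepts the positions 1, 1, 3 (minimum 1, maximum 3, length 3). Whether the real body also rejects duplicate positions is not stated here.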

Axis B — piece_role + section_type non-empty

A piece fails when piece_role or section_type is NULL or satisfies btrim(value) = ''. Whether section_type exists in the vocabulary is a separate check, the E1 gate.
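A minimal Python model of the Axis B emptiness test follows. The names is_blank and axis_b_failures are hypothetical, and pieces are assumed to be dicts with string values. str.strip(" ") is used because btrim with no second argument removes spaces only.

```python
def is_blank(value):
    """True for NULL (None) or a string that is empty once spaces are trimmed."""
    return value is None or value.strip(" ") == ""

def axis_b_failures(pieces):
    """Return the indexes of pieces with a blank piece_role or section_type."""
    return [
        i for i, piece in enumerate(pieces)
        if is_blank(piece.get("piece_role")) or is_blank(piece.get("section_type"))
    ]
```

An empty result list corresponds to axis_b_ok being true.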

Axis C — parent_local_id resolves within manifest

Anti-join check: for every piece with a non-null parent_local_id, the value must equal some sibling's local_piece_id.
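The anti-join can be pictured as a set lookup. In this Python sketch the function name unresolved_parents is hypothetical, values are compared without trimming, and a piece that names itself as parent is treated as resolved; none of these details is confirmed by the text above.

```python
def unresolved_parents(pieces):
    """Return indexes of pieces whose parent_local_id matches no local_piece_id."""
    known_ids = {
        piece["local_piece_id"]
        for piece in pieces
        if piece.get("local_piece_id") is not None
    }
    return [
        i for i, piece in enumerate(pieces)
        # Pieces without a parent are skipped, as in the rule above.
        if piece.get("parent_local_id") is not None
        and piece["parent_local_id"] not in known_ids
    ]
```

Axis C passes when the returned list is empty.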

Axis D — unit_kind ∈ vocab.unit_kind.*

EXISTS (SELECT 1 FROM dot_config WHERE key = 'vocab.unit_kind.'||piece.unit_kind)

Two counters: unit_kind_missing (NULL/empty) and unit_kind_in_vocab (live key).

E1 — section_type ∈ vocab.section_type.*

Same pattern as Axis D, using the key prefix 'vocab.section_type.' instead.
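Because Axis D and E1 share one lookup pattern, a single helper can model both. The sketch below is a Python illustration under assumptions: vocab_counts is a hypothetical name, config_keys stands in for the set of keys present in dot_config, and a value that is blank after trimming spaces is counted as missing.

```python
def vocab_counts(pieces, field, config_keys):
    """Count pieces whose `field` is missing or resolves to a vocab.<field>.* key."""
    missing = 0
    in_vocab = 0
    for piece in pieces:
        value = piece.get(field)
        if value is None or value.strip(" ") == "":
            missing += 1
        elif f"vocab.{field}.{value}" in config_keys:
            # Same key construction as 'vocab.unit_kind.' || piece.unit_kind.
            in_vocab += 1
    return {"pieces": len(pieces), "missing": missing, "in_vocab": in_vocab}

# Hypothetical sample data, for illustration only.
keys = {"vocab.unit_kind.article", "vocab.section_type.intro"}
sample = [
    {"unit_kind": "article", "section_type": "intro"},
    {"unit_kind": "", "section_type": "outro"},
]
unit_kind_result = vocab_counts(sample, "unit_kind", keys)
section_type_result = vocab_counts(sample, "section_type", keys)
```

A natural reading is that the check passes when in_vocab equals pieces, but how the real body turns the counters into axis_d_ok and e1_section_ok is not spelled out here.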

E2 — canonical_address collision detection

SELECT count(*) FROM jsonb_array_elements(pieces) p
JOIN information_unit iu ON iu.canonical_address = btrim(p->>'canonical_address')

If the count is greater than 0, the CUT would fail later: fn_iu_create's fn_iu_classify_existing would return a non-created status at CUT time, and fn_iu_cut_from_manifest would RAISE EXCEPTION mid-loop.
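The collision count can be modelled as follows. This Python sketch assumes existing_addresses is the set of canonical_address values already present in information_unit, with each address appearing at most once there; address_collisions is a hypothetical name.

```python
def address_collisions(pieces, existing_addresses):
    """Count pieces whose trimmed canonical_address already exists."""
    return sum(
        1 for piece in pieces
        if piece.get("canonical_address") is not None
        # strip(" ") mirrors btrim() on the piece side of the join.
        and piece["canonical_address"].strip(" ") in existing_addresses
    )
```

Under these assumptions, a result of 0 corresponds to e2_addr_ok being true.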

E3 — publication_type valid if present

publication_type is an optional field. The check applies only to pieces that supply a non-empty publication_type; pieces that omit it are not counted.

E4 — title derivable

Mirrors fn_iu_cut_from_manifest L123:

COALESCE(NULLIF(split_part(content_text, E'\n', 1), ''),
         NULLIF(btrim(canonical_address), ''))

If both are empty or NULL, the title would be NULL at CUT time and fn_iu_create would raise "title required".
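The COALESCE / NULLIF chain above translates directly into a small Python model. The function name derive_title is hypothetical; None plays the role of SQL NULL.

```python
def derive_title(content_text, canonical_address):
    """Model of COALESCE(NULLIF(first line, ''), NULLIF(btrim(address), ''))."""
    # split_part(content_text, E'\n', 1): the text before the first newline.
    first_line = None if content_text is None else content_text.split("\n", 1)[0]
    if first_line:  # not NULL and not ''
        return first_line
    address = None if canonical_address is None else canonical_address.strip(" ")
    return address or None  # '' collapses to NULL, as NULLIF does
```

Note that the first line is not trimmed in the expression above, so a first line consisting only of spaces still counts as a title in this model.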

E5 — local_piece_id unique

Compares count(*) with count(DISTINCT btrim(local_piece_id)). fn_cut_mark_staged_file already rejects duplicates; the check is restated here so that preflight remains the verdict authority.
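A Python model of the two counts is sketched below. The name local_id_counts is hypothetical, and mapping the results onto the local_piece_id_total and local_piece_id_unique counters is an assumption based on the return shape.

```python
def local_id_counts(pieces):
    """Compare the row count with the number of distinct trimmed local_piece_id values."""
    total = len(pieces)  # count(*)
    distinct = {
        piece["local_piece_id"].strip(" ")
        for piece in pieces
        # count(DISTINCT ...) ignores NULLs, so missing ids are not collected.
        if piece.get("local_piece_id") is not None
    }
    return {
        "local_piece_id_total": total,
        "local_piece_id_unique": len(distinct),
        "e5_local_uniq_ok": total == len(distinct),
    }
```

Because count(*) includes rows whose local_piece_id is NULL while count(DISTINCT ...) does not, a piece with no local_piece_id also makes the two numbers differ.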

E6 — manifest_digest regex

^[0-9a-f]{32}$ (same as fn_iu_cut_from_manifest G5).

E7 — coverage_proof balance

coverage_proof.covered_bytes = manifest.source_bytes (bigint compare).

E0 — source_hash match (optional)

Only invoked when p_source_hash IS NOT NULL. Compares against manifest.source_hash.
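The three manifest-level gates (E6, E7, and the optional E0) can be modelled together. This Python sketch makes several assumptions that the text does not confirm: the manifest is a dict with manifest_digest, source_bytes, and source_hash at the top level and covered_bytes nested under coverage_proof, and the result key e0_source_hash_ok is hypothetical (the documented gates object lists only E1 to E7).

```python
import re

def manifest_gates(manifest, p_source_hash=None):
    """Evaluate E6, E7, and (only when a hash is supplied) E0."""
    digest = manifest.get("manifest_digest")
    # E6: exactly 32 lowercase hex characters.
    e6 = isinstance(digest, str) and re.fullmatch(r"[0-9a-f]{32}", digest) is not None

    covered = (manifest.get("coverage_proof") or {}).get("covered_bytes")
    source = manifest.get("source_bytes")
    # E7: integer comparison, standing in for the bigint compare.
    e7 = covered is not None and source is not None and int(covered) == int(source)

    gates = {"e6_digest_ok": e6, "e7_coverage_ok": e7}
    if p_source_hash is not None:
        # E0 runs only when the caller passes a hash.
        gates["e0_source_hash_ok"] = manifest.get("source_hash") == p_source_hash
    return gates
```

Leaving p_source_hash at its default skips E0 entirely, matching the DEFAULT NULL in the signature.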

Composition

cut_readiness_ok := axis_a_ok AND axis_b_ok AND axis_c_ok AND axis_d_ok
                AND e1_section_ok AND e2_addr_ok AND e3_pubtype_ok
                AND e4_title_ok AND e5_local_uniq_ok
                AND e6_digest_ok AND e7_coverage_ok
ok := cut_readiness_ok AND (problems array empty)
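The composition formula above is a plain conjunction, which the following Python sketch reproduces literally. The function name compose and the dict-based inputs are assumptions; the key names are taken from the return shape. E0 does not appear in the formula as written, so it is not included here.

```python
AXES = ("axis_a_ok", "axis_b_ok", "axis_c_ok", "axis_d_ok")
GATES = (
    "e1_section_ok", "e2_addr_ok", "e3_pubtype_ok", "e4_title_ok",
    "e5_local_uniq_ok", "e6_digest_ok", "e7_coverage_ok",
)

def compose(axes, gates, problems):
    """Combine per-check booleans into cut_readiness_ok and ok."""
    cut_readiness_ok = all(axes[k] for k in AXES) and all(gates[k] for k in GATES)
    # ok additionally requires that no problem string was recorded.
    ok = cut_readiness_ok and len(problems) == 0
    return {"cut_readiness_ok": cut_readiness_ok, "ok": ok}
```

A single false axis or gate is enough to make both values false, while a non-empty problems array makes ok false even when every axis and gate passed.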

Error-strings are indexed

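Every problems[i] begins with "Axis X:" (X is the axis letter) or "EN:" (N is the gate number), so callers, dashboards, and KB searches can group problems by gate from the prefix alone, without parsing the rest of the message.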
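On the consuming side, grouping by prefix might look like the Python sketch below. The function name group_problems and the "unknown" bucket are assumptions, and any message text after the prefix is treated as opaque.

```python
import re
from collections import defaultdict

# Matches the indexed prefixes described above: "Axis A:" .. "Axis D:" or "E<number>:".
_PREFIX = re.compile(r"^(Axis [A-D]|E[0-9]+):")

def group_problems(problems):
    """Bucket problem strings by their leading gate prefix."""
    groups = defaultdict(list)
    for message in problems:
        match = _PREFIX.match(message)
        groups[match.group(1) if match else "unknown"].append(message)
    return dict(groups)
```

A dashboard could then count the entries per key to show which gate fails most often.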

Concurrency

The function takes no explicit locks. The reads are best-effort snapshots; race conditions are tolerated because callers re-invoke at apply-time inside the same transaction as the mutation (fn_iu_verify_mark apply path uses a fresh preflight result and then commits).

Not in scope

  • Composer gate (runtime).
  • Approval completeness (set when verify_mark applies).
  • Body invariants (post-insert; require IU rows).
  • piece_role vocab (vocabulary is empty today; reserved for future E8).

knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-cut-verify-mark-cut-readiness-gate/03-cut-preflight-validator-design.md