03 fn_iu_cut_preflight_validate — Design
Signature
public.fn_iu_cut_preflight_validate(
    p_staging_record_id uuid,
    p_source_hash       text DEFAULT NULL,
    p_actor             text DEFAULT 'fn_iu_cut_preflight_validate'
) RETURNS jsonb
LANGUAGE plpgsql
STABLE
STABLE because the function performs only SELECTs on iu_core.iu_staging_record, iu_core.iu_staging_payload, public.dot_config, and public.information_unit. There is no INSERT/UPDATE/DELETE anywhere in the body, so it is safe to call from any caller, including STABLE functions, views, and read-only sessions.
md5(pg_get_functiondef) = 914e26d61de0de914408af5cdc679c07.
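A minimal sketch of how a caller might invoke the function and re-check the recorded body hash, assuming the function is deployed in the connected database. The all-zero UUID is a placeholder, not a real staging record id.

```sql
-- Placeholder UUID: substitute a real iu_core.iu_staging_record id.
SELECT public.fn_iu_cut_preflight_validate(
         p_staging_record_id => '00000000-0000-0000-0000-000000000000'::uuid
       ) AS preflight;

-- Recompute the body hash to compare against the value recorded above.
SELECT md5(pg_get_functiondef(
         'public.fn_iu_cut_preflight_validate(uuid, text, text)'::regprocedure
       )) AS body_md5;
```

If the second query returns a different hash, the deployed body has drifted from the one this design describes.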
Invariants (proven by mig 055 + Phase F)
- No IU creation. Body does not reference public.information_unit for writes.
- No staging mutation. Body does not reference iu_core.iu_staging_* for writes.
- No event emission. Body does not reference event_outbox / job_queue / cut_request_signal.
- No Qdrant. No mention of iu_vector_sync_point or vector tables.
- No production_documents. Table is absent in the schema; not referenced.
- Idempotent. Returns the same verdict for the same inputs as long as the rows it reads are unchanged; STABLE guarantees this within a single statement.
- Lifecycle-agnostic. Returns a verdict for pending_review, approved, consumed, rejected, expired, cleaned; the caller decides what to do with it. Reports the lifecycle as lifecycle_status_at_check for forensic clarity.
Return shape
{
  "ok": <boolean>,                    // overall verdict
  "verdict": "approved" | "rejected" | "not_found",
  "axis_a_ok": <boolean>,
  "axis_b_ok": <boolean>,
  "axis_c_ok": <boolean>,
  "axis_d_ok": <boolean>,
  "cut_readiness_ok": <boolean>,      // = axes A..D AND gates E1..E7 AND optional E0
  "problems": [<indexed-string>...],
  "counts": {
    "pieces": <int>,
    "unit_kind_in_vocab": <int>,
    "unit_kind_missing": <int>,
    "section_type_in_vocab": <int>,
    "address_collisions": <int>,
    "publication_type_bad": <int>,
    "title_bad": <int>,
    "local_piece_id_unique": <int>,
    "local_piece_id_total": <int>
  },
  "gates": {
    "e1_section_ok": <boolean>,
    "e2_addr_ok": <boolean>,
    "e3_pubtype_ok": <boolean>,
    "e4_title_ok": <boolean>,
    "e5_local_uniq_ok": <boolean>,
    "e6_digest_ok": <boolean>,
    "e7_coverage_ok": <boolean>
  },
  "staging_record_id": <uuid>,
  "lifecycle_status_at_check": <text>,
  "checked_at": <timestamptz>,
  "checked_by": <text>
}
Check semantics
Axis A — dense source_position from 1
positions := array_agg((p->>'source_position')::int)
v_axis_a_ok := (min(positions) = 1)
           AND (max(positions) - min(positions) + 1 - len(positions) = 0)
Refusal regex: a piece whose source_position does not match ^[0-9]+$ is short-circuited with an "Axis A" problem before the cast, so the ::int conversion never sees non-numeric input.
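The guard-then-cast order and the density test can be sketched in a self-contained query. This is an illustration under assumptions, not the function body: the manifest literal is made up, and the real function reads its pieces from staging.

```sql
-- Hypothetical manifest literal standing in for the staged payload.
WITH manifest(pieces) AS (
  VALUES ('[{"source_position":"1"},
            {"source_position":"2"},
            {"source_position":"3"}]'::jsonb)
),
guarded AS (
  -- Regex guard first: only digit-only strings reach the ::int cast.
  SELECT CASE WHEN e.p->>'source_position' ~ '^[0-9]+$'
              THEN (e.p->>'source_position')::int
         END AS pos
  FROM manifest
  CROSS JOIN LATERAL jsonb_array_elements(manifest.pieces) AS e(p)
)
SELECT count(*) FILTER (WHERE pos IS NULL) AS refused_pieces,
       (min(pos) = 1)
         AND (max(pos) - min(pos) + 1 - count(pos) = 0) AS axis_a_ok
FROM guarded;
-- For the literal above: refused_pieces = 0, axis_a_ok = true
```

Note that the range-versus-count test on its own would also accept a set such as 1, 1, 3; whether the real body guards against duplicate positions separately is not stated here.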
Axis B — piece_role + section_type non-empty
A NULL value or btrim(value) = '' in either field fails the axis. Vocab existence is the E1 gate (separate).
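As a rough per-piece illustration of the non-empty rule, with made-up sample values:

```sql
-- Sample rows are hypothetical; the real function reads manifest pieces.
SELECT piece_role,
       section_type,
       COALESCE(btrim(piece_role), '') <> ''
         AND COALESCE(btrim(section_type), '') <> '' AS axis_b_piece_ok
FROM (VALUES ('body', 'clause'),
             ('  ',   'clause'),
             ('body', NULL))
     AS piece(piece_role, section_type);
-- Rows evaluate to true, false, false respectively.
```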
Axis C — parent_local_id resolves within manifest
Anti-join check: for every piece with a non-null parent_local_id, the value must equal some sibling's local_piece_id.
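One way to express the anti-join, sketched against a hypothetical three-piece manifest. The sketch treats any piece in the manifest (including the piece itself) as a valid parent target, which is an assumption; the real body may be stricter.

```sql
-- Hypothetical manifest: p3 points at a parent that does not exist.
WITH manifest(pieces) AS (
  VALUES ('[{"local_piece_id":"p1"},
            {"local_piece_id":"p2","parent_local_id":"p1"},
            {"local_piece_id":"p3","parent_local_id":"p9"}]'::jsonb)
),
piece AS (
  SELECT e.p->>'local_piece_id'  AS local_piece_id,
         e.p->>'parent_local_id' AS parent_local_id
  FROM manifest
  CROSS JOIN LATERAL jsonb_array_elements(manifest.pieces) AS e(p)
)
-- Anti-join: keep only children whose parent id matches no piece.
SELECT child.local_piece_id, child.parent_local_id
FROM piece AS child
WHERE child.parent_local_id IS NOT NULL
  AND NOT EXISTS (
        SELECT 1
        FROM piece AS parent
        WHERE parent.local_piece_id = child.parent_local_id
      );
-- Returns one row: p3 / p9
```

An empty result corresponds to axis_c_ok = true.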
Axis D — unit_kind ∈ vocab.unit_kind.*
EXISTS (SELECT 1 FROM dot_config WHERE key = 'vocab.unit_kind.'||piece.unit_kind)
Two counters: unit_kind_missing (NULL/empty) and unit_kind_in_vocab (live key).
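The two counters can be sketched as follows. The vocab_stub CTE stands in for public.dot_config, and the vocabulary keys and unit_kind values are invented for the example.

```sql
-- vocab_stub is a hypothetical stand-in for public.dot_config.
WITH vocab_stub(key) AS (
  VALUES ('vocab.unit_kind.article'), ('vocab.unit_kind.clause')
),
piece(unit_kind) AS (
  VALUES ('article'), ('clause'), ('bogus'), (NULL), ('  ')
),
flagged AS (
  SELECT (unit_kind IS NULL OR btrim(unit_kind) = '') AS is_missing,
         EXISTS (SELECT 1
                 FROM vocab_stub v
                 WHERE v.key = 'vocab.unit_kind.' || piece.unit_kind) AS in_vocab
  FROM piece
)
SELECT count(*)                           AS pieces,
       count(*) FILTER (WHERE is_missing) AS unit_kind_missing,
       count(*) FILTER (WHERE in_vocab)   AS unit_kind_in_vocab
FROM flagged;
-- For the sample rows: pieces = 5, unit_kind_missing = 2, unit_kind_in_vocab = 2
```

The same shape would apply to the E1 gate by swapping the prefix for vocab.section_type.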
E1 — section_type ∈ vocab.section_type.*
Same pattern as Axis D, with vocab.section_type. prefix.
E2 — canonical_address collision detection
SELECT count(*) FROM jsonb_array_elements(pieces) p
JOIN information_unit iu ON iu.canonical_address = btrim(p->>'canonical_address')
If >0, fn_iu_create's fn_iu_classify_existing would return a non-created status at CUT time, and fn_iu_cut_from_manifest would RAISE EXCEPTION mid-loop.
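A self-contained sketch of the collision count, using a hypothetical iu_stub CTE in place of public.information_unit and invented addresses:

```sql
-- iu_stub is a hypothetical stand-in for public.information_unit.
WITH iu_stub(canonical_address) AS (
  VALUES ('doc/1/a'), ('doc/1/b')
),
manifest(pieces) AS (
  VALUES ('[{"canonical_address":" doc/1/a "},
            {"canonical_address":"doc/2/a"}]'::jsonb)
)
SELECT count(*)     AS address_collisions,
       count(*) = 0 AS e2_addr_ok
FROM manifest
CROSS JOIN LATERAL jsonb_array_elements(manifest.pieces) AS e(p)
JOIN iu_stub iu
  ON iu.canonical_address = btrim(e.p->>'canonical_address');
-- For the sample data: address_collisions = 1, e2_addr_ok = false
```

The btrim on the manifest side means that an address differing only by surrounding whitespace still counts as a collision.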
E3 — publication_type valid if present
Counted only when piece supplies a non-empty publication_type. Optional field.
E4 — title derivable
Mirrors fn_iu_cut_from_manifest L123:
COALESCE(NULLIF(split_part(content_text, E'\n', 1), ''),
NULLIF(btrim(canonical_address), ''))
If both are empty/null, title would be NULL at CUT and fn_iu_create would RAISE title required.
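The fallback chain can be exercised directly on sample rows (values are invented): the first line of content_text wins, then the trimmed canonical_address, and otherwise the result is NULL.

```sql
SELECT COALESCE(NULLIF(split_part(content_text, E'\n', 1), ''),
                NULLIF(btrim(canonical_address), '')) AS derived_title
FROM (VALUES
        (E'Heading line\nbody text', 'doc/1/a'),  -- first line is used
        (E'\nbody only',             'doc/1/b'),  -- empty first line, address is used
        (NULL,                       '   ')       -- nothing usable
     ) AS piece(content_text, canonical_address);
-- Rows evaluate to 'Heading line', 'doc/1/b', NULL respectively.
```

A NULL in the last position is the case this gate is meant to catch before CUT.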
E5 — local_piece_id unique
Compares count(*) vs count(DISTINCT btrim(local_piece_id)). Catches duplicates that fn_cut_mark_staged_file already enforces; restated here so preflight is the verdict authority.
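A small illustration of the count comparison, with made-up ids where two differ only by trailing whitespace:

```sql
WITH piece(local_piece_id) AS (
  VALUES ('p1'), ('p2 '), ('p2')
)
SELECT count(*)                               AS local_piece_id_total,
       count(DISTINCT btrim(local_piece_id)) AS local_piece_id_unique,
       count(*) = count(DISTINCT btrim(local_piece_id)) AS e5_local_uniq_ok
FROM piece;
-- For the sample rows: total = 3, unique = 2, e5_local_uniq_ok = false
```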
E6 — manifest_digest regex
^[0-9a-f]{32}$ (same as fn_iu_cut_from_manifest G5).
E7 — coverage_proof balance
coverage_proof.covered_bytes = manifest.source_bytes (bigint compare).
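The two manifest-level gates, E6 and E7, might look like this on a sample document. The JSON layout (manifest_digest and source_bytes at the top level, coverage_proof nested beside them) is an assumption made for the sketch; the actual payload layout is not described here.

```sql
-- Hypothetical manifest layout; key placement is an assumption.
WITH manifest(doc) AS (
  VALUES ('{"manifest_digest": "0123456789abcdef0123456789abcdef",
            "source_bytes": 2048,
            "coverage_proof": {"covered_bytes": 2048}}'::jsonb)
)
SELECT (doc->>'manifest_digest') ~ '^[0-9a-f]{32}$' AS e6_digest_ok,
       (doc->'coverage_proof'->>'covered_bytes')::bigint
         = (doc->>'source_bytes')::bigint           AS e7_coverage_ok
FROM manifest;
-- For the sample document: e6_digest_ok = true, e7_coverage_ok = true
```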
E0 — source_hash match (optional)
Only invoked when p_source_hash IS NOT NULL. Compares against manifest.source_hash.
Composition
cut_readiness_ok := axis_a_ok AND axis_b_ok AND axis_c_ok AND axis_d_ok
AND e1_section_ok AND e2_addr_ok AND e3_pubtype_ok
AND e4_title_ok AND e5_local_uniq_ok
AND e6_digest_ok AND e7_coverage_ok
ok := cut_readiness_ok AND (problems array empty)
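A caller that wants to double-check the composition could recompute it from a returned result. The sketch below uses a hand-written jsonb literal in the documented shape (the problem message is a placeholder) rather than a real function call.

```sql
-- Hand-written result in the documented shape; values are illustrative.
WITH r(v) AS (
  VALUES ('{"axis_a_ok": true, "axis_b_ok": true,
            "axis_c_ok": true, "axis_d_ok": true,
            "gates": {"e1_section_ok": true, "e2_addr_ok": false,
                      "e3_pubtype_ok": true, "e4_title_ok": true,
                      "e5_local_uniq_ok": true, "e6_digest_ok": true,
                      "e7_coverage_ok": true},
            "problems": ["E2: (placeholder message)"]}'::jsonb)
)
SELECT (v->>'axis_a_ok')::boolean
         AND (v->>'axis_b_ok')::boolean
         AND (v->>'axis_c_ok')::boolean
         AND (v->>'axis_d_ok')::boolean
         AND (SELECT bool_and(g.value::boolean)
              FROM jsonb_each_text(v->'gates') AS g)
                                          AS cut_readiness_recomputed,
       jsonb_array_length(v->'problems') = 0 AS problems_empty
FROM r;
-- For the sample result: cut_readiness_recomputed = false, problems_empty = false
```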
Error-strings are indexed
Every problems[i] begins with a gate prefix, either "Axis X:" (X = A..D) or "EN:" (N = the gate number, e.g. "E2:"), so callers, dashboards, and KB searches can group by gate without parsing the message body.
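Grouping by prefix could be as simple as taking the text before the first colon. The problem strings below are placeholders, not real messages emitted by the function.

```sql
-- Placeholder problem strings; only the prefixes follow the documented convention.
SELECT split_part(problem, ':', 1) AS gate,
       count(*)                    AS n
FROM jsonb_array_elements_text(
       '["Axis A: (placeholder)", "E2: (placeholder)", "E2: (placeholder)"]'::jsonb
     ) AS t(problem)
GROUP BY 1
ORDER BY 1;
-- Returns two rows: ('Axis A', 1) and ('E2', 2)
```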
Concurrency
The function takes no explicit locks. The reads are best-effort snapshots; race conditions are tolerated because callers re-invoke at apply-time inside the same transaction as the mutation (fn_iu_verify_mark apply path uses a fresh preflight result and then commits).
Not in scope
- Composer gate (runtime).
- Approval completeness (set when verify_mark applies).
- Body invariants (post-insert; require IU rows).
- piece_role vocab (vocabulary is empty today; reserved for future E8).