KB-1370 rev 5

P3D Pack 1 Phase 4C — Read-Only/Dry-Run Species Mapping + QT-001 Mission Prompt (rev3)

15 min read Revision 5
p3dpack1phase4cspecies-mappingdry-runmission-promptrev3field-adaptivedeterministic

P3D Pack 1 Phase 4C — Read-Only/Dry-Run Species Mapping + QT-001 Mission Prompt (rev3)

Date: 2026-05-11 | Mode: READ-ONLY / DRY-RUN — no writes, no seeds, no backfill Style: MISSION PROMPT — scope + preflight + goals + deterministic criteria + scale guards Legal alignment: Điều 0-B, Điều 0-G, Điều 29, QT-001, QT-005 Reference snapshots: Phase 4B discovery report (labels only, re-verify live before use)


0. SCOPE BLOCK (declared once — Pack 1 only)

These are the ONLY literal identifiers in this prompt. They are scope definitions from approved Pack 1 design artifacts:

target_collection_primary       = 'information_unit'
target_collection_subordinate   = 'unit_version'
target_unit_kind_committed      = 'law_unit'
discriminator_candidate_column  = 'unit_kind'
target_vocab_key                = 'vocab.unit_kind.law_unit'

No other literal decision values appear as selectors or decision criteria in this prompt. Semantic field labels such as species_code, composition_level, management_mode, and governance_role may appear only as candidate field names to verify via introspection; if absent, report FIELD_ABSENT and do not substitute from memory. All actual values MUST be derived from live PG at execution time.


1. GATE-0 — Scope preflight (STOP if any fails)

Before any goal, verify that scope constants are valid against live PG. If any check fails → STOP, report which check failed, do not proceed.

Check What to verify Source
P1 Table target_collection_primary exists in PG pg_class WHERE relname = <target> AND relkind = 'r'
P2 Table target_collection_subordinate exists in PG same
P3 collection_registry has a row for target_collection_primary SELECT count(*) FROM collection_registry WHERE collection_name = <target>
P4 collection_registry has a row for target_collection_subordinate same
P5 Column discriminator_candidate_column exists on target_collection_primary information_schema.columns WHERE table_name = <target> AND column_name = <disc_col>
P6 dot_config has key matching target_vocab_key SELECT count(*) FROM dot_config WHERE key = <vocab_key>

All 6 must pass. Report each result. If P5 fails → discriminator strategy collapses (non-fatal for other goals, but mark discriminator goals as BLOCKED).


2. AUTHORITATIVE SOURCES

Class of value Live source
Species properties (any column) public.entity_species — introspect columns first, use SELECT * for data
Collection→species mappings public.species_collection_map — introspect columns first
Birth function logic pg_get_functiondef for fn_birth_registry_auto
Birth registry state public.birth_registry — introspect columns first
Collection governance properties public.collection_registry — introspect columns first
Composition allowed values SELECT DISTINCT composition_level FROM entity_species (live, not listed)
Management mode values SELECT DISTINCT management_mode FROM entity_species (live)
Governance role values SELECT DISTINCT governance_role FROM collection_registry (live)
Birth code strategy values SELECT DISTINCT birth_code_strategy FROM collection_registry (live)

Column name rule: Before querying any table with specific column names, introspect that table's columns via information_schema.columns. If a column is absent → skip any query depending on it + report FIELD_ABSENT:<table>.<column>.


3. GOALS (G1–G10)

G1 — Discriminator usage in production

  • Source: species_collection_map introspected columns.
  • Selector: Introspect column names first. Then for each of the discriminator-candidate columns (discriminator_field, discriminator_value, discriminator_config — verify existence before querying), count rows where that column IS NOT NULL.
  • Skip/STOP: If discriminator columns don't exist in the table → report FIELD_ABSENT for each, conclude discriminator_schema_support=false.
  • Output: Per discriminator column: exists (yes/no), non-NULL count, first 5 sample rows with non-NULL values. If all 0 → discriminator_production_active=false.

G2 — fn_birth_registry_auto discriminator handling

  • Source: pg_get_functiondef for function fn_birth_registry_auto in public schema.
  • Selector: Search function source text for any of the discriminator column names found in G1 (or the candidate names if G1 found FIELD_ABSENT). Also search for references to NEW.<discriminator_candidate_column> or similar record-field access.
  • Skip/STOP: If function not found → STOP, report blocker.
  • Output: Function source excerpt (relevant lines) + conclusion: uses_discriminators=true|false + reads_record_fields_beyond_code=true|false.

G3 — Complete species landscape

  • Source: entity_species introspected columns.
  • Selector: SELECT * FROM entity_species ORDER BY 1 (all rows, all columns — NO assumed column names).
  • Output:
    • Full entity_species dump (all columns as they exist).
    • Semantic field presence map: For each column present, classify as IDENTITY (id/code/name), CLASSIFICATION (species_code/composition/management), HIERARCHY (parent/depth), OTHER. Agent derives this from column names + data types — does NOT assume a fixed field list.
    • Total count (snapshot label).
    • Aggregation: for each column that appears to be a categorical classification field (non-unique, non-numeric), show distinct values + counts. Agent determines which columns are categorical from the data, not from prompt instruction.

G4 — Collection-registry context for target collections

  • Source: collection_registry introspected columns.
  • Selector: WHERE collection_name IN (target_collection_primary, target_collection_subordinate).
  • Output: Full rows verbatim (SELECT *). Agent highlights any column that is NULL or absent.

G5 — Birth-registry state for target collections

  • Source: birth_registry introspected columns.
  • Selector: WHERE collection_name IN (target_collection_primary, target_collection_subordinate).
  • Output: Count per target. Count of rows where species_code column (if exists) IS NULL. First 20 rows verbatim.

G6 — Discriminator column candidate verification on target

  • Source: information_schema.columns + live data from target_collection_primary.
  • Selector: Confirm column discriminator_candidate_column exists on target_collection_primary (already checked in GATE-0 P5). Then query distinct values of that column with counts.
  • Skip/STOP: If P5 failed → skip G6, mark discriminator strategy as BLOCKED.
  • Output: Distinct values + counts from the discriminator candidate column. These are the population of possible discriminator_value entries for future species_collection_map rows.

G7 — Collections sharing exact birth strategy with subordinate target

  • Source: collection_registry.
  • Selector:
    1. Query the exact birth_code_strategy value of target_collection_subordinate from collection_registry. Call this v_sub_strategy.
    2. Find ALL other collections with the same exact birth_code_strategy = v_sub_strategy.
    3. For each of those collections, check if they have a species_collection_map row.
  • No fuzzy matching. No "subordinate-like." Only exact value match.
  • Skip/STOP: If birth_code_strategy column doesn't exist on collection_registry → FIELD_ABSENT, skip.
  • Output: v_sub_strategy value + count of collections with that exact strategy + count of those with species mapping + first 10 sample rows.

G8 — Dry-run: candidate species options with deterministic labels

For EACH species from G3, compute compatibility with target_collection_primary:

  • Source: G3 species data + G4 collection_registry data.
  • Compatibility criteria (deterministic):
Label Condition
PLAUSIBLE Species has a management_mode column value that matches the governance_role value of target_collection_primary (both from live queries). Both values exist and are equal or compatible per Điều 29 (observed↔observed, governed↔governed).
CONFLICT Both values exist but DON'T match (e.g., species management_mode=governed, target governance_role=observed).
UNKNOWN Either the management_mode column or the governance_role value is NULL, absent, or not found.
N/A Species or collection_registry row is missing required field for comparison — report FIELD_ABSENT.
  • Output: Per species: all columns from G3 + proposed_composition (from the species's composition_level field, if field exists) + compatibility_label (PLAUSIBLE/CONFLICT/UNKNOWN/N/A) + compatibility_reason.

Label EVERY row: candidate_not_approved.

G9 — QT-001 backfill dry-run per PLAUSIBLE candidate

For each species labelled PLAUSIBLE in G8:

  • Source: birth_registry rows for target_collection_primary where species_code IS NULL (from G5).
  • Computation: For each such row, project: entity_code, current_species_code (NULL), proposed_species_code, current_composition_level, proposed_composition_level.
  • Proposed values come from: the candidate species's species_code + entity_species.composition_level for that species.
  • If no PLAUSIBLE candidates exist: Report "0 PLAUSIBLE candidates. QT-001 dry-run skipped." Also dry-run for the first 3 CONFLICT candidates with explicit note: conflict_dryrun_for_information_only.
  • Output: Per PLAUSIBLE candidate: count of rows that would be updated + first 5 rows showing current→proposed.

Label EVERY result: candidate_not_approved.

G10 — TAC unit_kind distribution snapshot

  • Source: tac_logical_unit introspected columns.
  • Selector: Introspect whether tac_logical_unit has a column matching discriminator_candidate_column (i.e., unit_kind). If not, try common alternatives (introspect all columns, look for categorical text columns with few distinct values).
  • Skip/STOP: If no suitable column found → report FIELD_ABSENT, skip. This means TAC data cannot inform discriminator strategy.
  • Output: Distinct values + counts of the discovered column. Flag which value matches target_unit_kind_committed.

4. ACCEPTANCE CRITERIA (PASS/FAIL)

# Criterion PASS condition
1 GATE-0 all 6 preflight checks reported Each check shows PASS or FAIL with evidence
2 Goals G1-G10 all attempted Each reports evidence, FIELD_ABSENT skip, or explicit BLOCKED reason
3 Zero writes Agent confirms no INSERT/UPDATE/DELETE/DDL/function-patch
4 No species/field names from memory All names in output come from live PG queries
5 Column introspection before schema-sensitive queries Every table introspected; FIELD_ABSENT reported if column missing
6 Phase 4B snapshots re-verified Agent re-queries any Phase 4B finding before using as fact
7 All candidate outputs labelled candidate_not_approved on every dry-run row
8 G8 uses deterministic labels Every species gets exactly one of: PLAUSIBLE / CONFLICT / UNKNOWN / N/A
9 G7 uses exact strategy match No "subordinate-like" — only exact birth_code_strategy value
10 Scale brittleness self-audit passed §6 below

5. CONSTRAINTS

  • No writes. Read-only. Dry-run only.
  • No literal decision values beyond §0 scope block. Semantic column labels may be used only as candidate fields after introspection. All actual values derive live.
  • No assumed column existence. Introspect first. SELECT * for data dumps. FIELD_ABSENT for missing columns.
  • No fuzzy selectors. No "-like", "similar to", "equivalent" language for matching. Exact values only.
  • No subjective plausibility. Use deterministic labels (§G8 criteria table).
  • Re-verify snapshots. Phase 4B labels are starting points only.
  • Mission-style. Agent writes its own queries to satisfy goals and constraints.

6. SCALE BRITTLENESS SELF-AUDIT

Before submitting report, agent confirms:

Check Required answer
Did I list any species/composition/role value from memory? No — all from live queries.
Did I list any column name as "required" without verifying it exists? No — introspected or SELECT *.
Did I use any "-like" or semantic fuzzy match for selectors? No — exact values only.
Did I assume fixed counts? No — all counts are live snapshots.
Did I assume fixed field lists for output shapes? No — used SELECT * or introspected.
Would my queries adapt if a column was renamed? Yes — I'd report FIELD_ABSENT and skip.
Would my queries adapt if 50 new species were added? Yes — G3 fetches all with SELECT *.
Did I report Phase 4B labels as facts without re-verification? No.
Did I apply subjective judgment in G8/G9 labels? No — used deterministic criteria only.

7. REPORT REQUIREMENTS

Save to: knowledge/dev/laws/dieu44-trien-khai/reports/p3d-pack1-phase4c-species-mapping-dryrun-report.md

Must include:

  1. RUN_MARKER.
  2. GATE-0 preflight results (6 checks).
  3. SCOPE BLOCK echoed from §0.
  4. G1-G10 results with evidence, labels, FIELD_ABSENT notes.
  5. Column introspection results per table (what columns exist, what was FIELD_ABSENT).
  6. Scale brittleness self-audit (§6 answers).
  7. Phase 4B re-verification: which findings still hold, which differ.
  8. Status flags:
phase4c_dryrun_status=PASS|PARTIAL|BLOCKED
no_mutation_performed=true
gate0_all_passed=true|false
discriminator_runtime_active=true|false
discriminator_column_exists_on_target=true|false
species_decision_locked=false
composition_decision_locked=false
plausible_candidate_count=<N>
conflict_candidate_count=<N>
unknown_candidate_count=<N>
scale_brittleness_self_audit_passed=true|false

8. OUT OF SCOPE

  • Do not propose function patches.
  • Do not seed species_collection_map.
  • Do not backfill birth_registry.
  • Do not dispatch nested agents.
  • Do not use fuzzy ILIKE for selector logic.
  • Do not list field names from memory — introspect live.
  • Do not apply subjective labels — use deterministic criteria.

Phase 4C Mission Prompt rev3 | GATE-0 preflight | Field-adaptive SELECT * | Deterministic labels | No fuzzy | No field hardcode | 2026-05-11

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-pack1-phase4c-readonly-dryrun-species-mapping-qt001-prompt.md