P3D Pack 1 Phase 4C — Read-Only/Dry-Run Species Mapping + QT-001 Mission Prompt (rev3)

Date: 2026-05-11 | Mode: READ-ONLY / DRY-RUN — no writes, no seeds, no backfill Style: MISSION PROMPT — scope + preflight + goals + deterministic criteria + scale guards Legal alignment: Điều 0-B, Điều 0-G, Điều 29, QT-001, QT-005 Reference snapshots: Phase 4B discovery report (labels only, re-verify live before use)

0. SCOPE BLOCK (declared once — Pack 1 only)

These are the ONLY literal identifiers in this prompt. They are scope definitions from approved Pack 1 design artifacts:

target_collection_primary       = 'information_unit'
target_collection_subordinate   = 'unit_version'
target_unit_kind_committed      = 'law_unit'
discriminator_candidate_column  = 'unit_kind'
target_vocab_key                = 'vocab.unit_kind.law_unit'

No other literal decision values appear as selectors or decision criteria in this prompt. Semantic field labels such as species_code, composition_level, management_mode, and governance_role may appear only as candidate field names to verify via introspection; if absent, report FIELD_ABSENT and do not substitute from memory. All actual values MUST be derived from live PG at execution time.

1. GATE-0 — Scope preflight (STOP if any fails)

Before any goal, verify that scope constants are valid against live PG. If any check fails → STOP, report which check failed, do not proceed.

Check	What to verify	Source
P1	Table `target_collection_primary` exists in PG	`pg_class WHERE relname = <target> AND relkind = 'r'`
P2	Table `target_collection_subordinate` exists in PG	same
P3	`collection_registry` has a row for `target_collection_primary`	`SELECT count(*) FROM collection_registry WHERE collection_name = <target>`
P4	`collection_registry` has a row for `target_collection_subordinate`	same
P5	Column `discriminator_candidate_column` exists on `target_collection_primary`	`information_schema.columns WHERE table_name = <target> AND column_name = <disc_col>`
P6	`dot_config` has key matching `target_vocab_key`	`SELECT count(*) FROM dot_config WHERE key = <vocab_key>`

All 6 must pass. Report each result. If P5 fails → discriminator strategy collapses (non-fatal for other goals, but mark discriminator goals as BLOCKED).

2. AUTHORITATIVE SOURCES

Class of value	Live source
Species properties (any column)	`public.entity_species` — introspect columns first, use `SELECT *` for data
Collection→species mappings	`public.species_collection_map` — introspect columns first
Birth function logic	`pg_get_functiondef` for `fn_birth_registry_auto`
Birth registry state	`public.birth_registry` — introspect columns first
Collection governance properties	`public.collection_registry` — introspect columns first
Composition allowed values	`SELECT DISTINCT composition_level FROM entity_species` (live, not listed)
Management mode values	`SELECT DISTINCT management_mode FROM entity_species` (live)
Governance role values	`SELECT DISTINCT governance_role FROM collection_registry` (live)
Birth code strategy values	`SELECT DISTINCT birth_code_strategy FROM collection_registry` (live)

Column name rule: Before querying any table with specific column names, introspect that table's columns via information_schema.columns. If a column is absent → skip any query depending on it + report FIELD_ABSENT:<table>.<column>.

3. GOALS (G1–G10)

G1 — Discriminator usage in production

Source: species_collection_map introspected columns.
Selector: Introspect column names first. Then for each of the discriminator-candidate columns (discriminator_field, discriminator_value, discriminator_config — verify existence before querying), count rows where that column IS NOT NULL.
Skip/STOP: If discriminator columns don't exist in the table → report FIELD_ABSENT for each, conclude discriminator_schema_support=false.
Output: Per discriminator column: exists (yes/no), non-NULL count, first 5 sample rows with non-NULL values. If all 0 → discriminator_production_active=false.

G2 — fn_birth_registry_auto discriminator handling

Source: pg_get_functiondef for function fn_birth_registry_auto in public schema.
Selector: Search function source text for any of the discriminator column names found in G1 (or the candidate names if G1 found FIELD_ABSENT). Also search for references to NEW.<discriminator_candidate_column> or similar record-field access.
Skip/STOP: If function not found → STOP, report blocker.
Output: Function source excerpt (relevant lines) + conclusion: uses_discriminators=true|false + reads_record_fields_beyond_code=true|false.

G3 — Complete species landscape

Source: entity_species introspected columns.
Selector: SELECT * FROM entity_species ORDER BY 1 (all rows, all columns — NO assumed column names).
Output:
- Full entity_species dump (all columns as they exist).
- Semantic field presence map: For each column present, classify as IDENTITY (id/code/name), CLASSIFICATION (species_code/composition/management), HIERARCHY (parent/depth), OTHER. Agent derives this from column names + data types — does NOT assume a fixed field list.
- Total count (snapshot label).
- Aggregation: for each column that appears to be a categorical classification field (non-unique, non-numeric), show distinct values + counts. Agent determines which columns are categorical from the data, not from prompt instruction.

G4 — Collection-registry context for target collections

Source: collection_registry introspected columns.
Selector: WHERE collection_name IN (target_collection_primary, target_collection_subordinate).
Output: Full rows verbatim (SELECT *). Agent highlights any column that is NULL or absent.

G5 — Birth-registry state for target collections

Source: birth_registry introspected columns.
Selector: WHERE collection_name IN (target_collection_primary, target_collection_subordinate).
Output: Count per target. Count of rows where species_code column (if exists) IS NULL. First 20 rows verbatim.

G6 — Discriminator column candidate verification on target

Source: information_schema.columns + live data from target_collection_primary.
Selector: Confirm column discriminator_candidate_column exists on target_collection_primary (already checked in GATE-0 P5). Then query distinct values of that column with counts.
Skip/STOP: If P5 failed → skip G6, mark discriminator strategy as BLOCKED.
Output: Distinct values + counts from the discriminator candidate column. These are the population of possible discriminator_value entries for future species_collection_map rows.

Source: collection_registry.
Selector:
1. Query the exact birth_code_strategy value of target_collection_subordinate from collection_registry. Call this v_sub_strategy.
2. Find ALL other collections with the same exact birth_code_strategy = v_sub_strategy.
3. For each of those collections, check if they have a species_collection_map row.
No fuzzy matching. No "subordinate-like." Only exact value match.
Skip/STOP: If birth_code_strategy column doesn't exist on collection_registry → FIELD_ABSENT, skip.
Output: v_sub_strategy value + count of collections with that exact strategy + count of those with species mapping + first 10 sample rows.

G8 — Dry-run: candidate species options with deterministic labels

For EACH species from G3, compute compatibility with target_collection_primary:

Source: G3 species data + G4 collection_registry data.
Compatibility criteria (deterministic):

Label	Condition
`PLAUSIBLE`	Species has a management_mode column value that matches the governance_role value of target_collection_primary (both from live queries). Both values exist and are equal or compatible per Điều 29 (observed↔observed, governed↔governed).
`CONFLICT`	Both values exist but DON'T match (e.g., species management_mode=governed, target governance_role=observed).
`UNKNOWN`	Either the management_mode column or the governance_role value is NULL, absent, or not found.
`N/A`	Species or collection_registry row is missing required field for comparison — report `FIELD_ABSENT`.

Output: Per species: all columns from G3 + proposed_composition (from the species's composition_level field, if field exists) + compatibility_label (PLAUSIBLE/CONFLICT/UNKNOWN/N/A) + compatibility_reason.

Label EVERY row: candidate_not_approved.

G9 — QT-001 backfill dry-run per PLAUSIBLE candidate

For each species labelled PLAUSIBLE in G8:

Source: birth_registry rows for target_collection_primary where species_code IS NULL (from G5).
Computation: For each such row, project: entity_code, current_species_code (NULL), proposed_species_code, current_composition_level, proposed_composition_level.
Proposed values come from: the candidate species's species_code + entity_species.composition_level for that species.
If no PLAUSIBLE candidates exist: Report "0 PLAUSIBLE candidates. QT-001 dry-run skipped." Also dry-run for the first 3 CONFLICT candidates with explicit note: conflict_dryrun_for_information_only.
Output: Per PLAUSIBLE candidate: count of rows that would be updated + first 5 rows showing current→proposed.

Label EVERY result: candidate_not_approved.

G10 — TAC unit_kind distribution snapshot

Source: tac_logical_unit introspected columns.
Selector: Introspect whether tac_logical_unit has a column matching discriminator_candidate_column (i.e., unit_kind). If not, try common alternatives (introspect all columns, look for categorical text columns with few distinct values).
Skip/STOP: If no suitable column found → report FIELD_ABSENT, skip. This means TAC data cannot inform discriminator strategy.
Output: Distinct values + counts of the discovered column. Flag which value matches target_unit_kind_committed.

4. ACCEPTANCE CRITERIA (PASS/FAIL)

#	Criterion	PASS condition
1	GATE-0 all 6 preflight checks reported	Each check shows PASS or FAIL with evidence
2	Goals G1-G10 all attempted	Each reports evidence, FIELD_ABSENT skip, or explicit BLOCKED reason
3	Zero writes	Agent confirms no INSERT/UPDATE/DELETE/DDL/function-patch
4	No species/field names from memory	All names in output come from live PG queries
5	Column introspection before schema-sensitive queries	Every table introspected; FIELD_ABSENT reported if column missing
6	Phase 4B snapshots re-verified	Agent re-queries any Phase 4B finding before using as fact
7	All candidate outputs labelled	`candidate_not_approved` on every dry-run row
8	G8 uses deterministic labels	Every species gets exactly one of: PLAUSIBLE / CONFLICT / UNKNOWN / N/A
9	G7 uses exact strategy match	No "subordinate-like" — only exact `birth_code_strategy` value
10	Scale brittleness self-audit passed	§6 below

5. CONSTRAINTS

No writes. Read-only. Dry-run only.
No literal decision values beyond §0 scope block. Semantic column labels may be used only as candidate fields after introspection. All actual values derive live.
No assumed column existence. Introspect first. SELECT * for data dumps. FIELD_ABSENT for missing columns.
No fuzzy selectors. No "-like", "similar to", "equivalent" language for matching. Exact values only.
No subjective plausibility. Use deterministic labels (§G8 criteria table).
Re-verify snapshots. Phase 4B labels are starting points only.
Mission-style. Agent writes its own queries to satisfy goals and constraints.

6. SCALE BRITTLENESS SELF-AUDIT

Before submitting report, agent confirms:

Check	Required answer
Did I list any species/composition/role value from memory?	No — all from live queries.
Did I list any column name as "required" without verifying it exists?	No — introspected or SELECT *.
Did I use any "-like" or semantic fuzzy match for selectors?	No — exact values only.
Did I assume fixed counts?	No — all counts are live snapshots.
Did I assume fixed field lists for output shapes?	No — used SELECT * or introspected.
Would my queries adapt if a column was renamed?	Yes — I'd report FIELD_ABSENT and skip.
Would my queries adapt if 50 new species were added?	Yes — G3 fetches all with SELECT *.
Did I report Phase 4B labels as facts without re-verification?	No.
Did I apply subjective judgment in G8/G9 labels?	No — used deterministic criteria only.

7. REPORT REQUIREMENTS

Save to: knowledge/dev/laws/dieu44-trien-khai/reports/p3d-pack1-phase4c-species-mapping-dryrun-report.md

Must include:

RUN_MARKER.
GATE-0 preflight results (6 checks).
SCOPE BLOCK echoed from §0.
G1-G10 results with evidence, labels, FIELD_ABSENT notes.
Column introspection results per table (what columns exist, what was FIELD_ABSENT).
Scale brittleness self-audit (§6 answers).
Phase 4B re-verification: which findings still hold, which differ.
Status flags:

phase4c_dryrun_status=PASS|PARTIAL|BLOCKED
no_mutation_performed=true
gate0_all_passed=true|false
discriminator_runtime_active=true|false
discriminator_column_exists_on_target=true|false
species_decision_locked=false
composition_decision_locked=false
plausible_candidate_count=<N>
conflict_candidate_count=<N>
unknown_candidate_count=<N>
scale_brittleness_self_audit_passed=true|false

8. OUT OF SCOPE

Do not propose function patches.
Do not seed species_collection_map.
Do not backfill birth_registry.
Do not dispatch nested agents.
Do not use fuzzy ILIKE for selector logic.
Do not list field names from memory — introspect live.
Do not apply subjective labels — use deterministic criteria.