P3D Pack 1 Phase 4B — Read-Only Species/Composition/Registry Discovery Prompt (rev5)
P3D Pack 1 Phase 4B — Read-Only Species/Composition/Registry Discovery Prompt (rev6)
Date: 2026-05-11 | Status: READ-ONLY DISCOVERY — no INSERT, no UPDATE, no DDL Legal alignment: Serves compliance matrix in
p3d-pack1-phase4b-legal-alignment-addendum.md. Discoveries inform decisions under Điều 0-B (composition), Điều 0-G (birth), Điều 29 (one-system), QT-001/QT-005. Introspect-first: Every schema-sensitive query is gated by column detection. No assumed column names. Graceful degradation: Every sub-query runs independently. Failure → log + continue. No cascade abort. All counts are SNAPSHOTS. No fuzzy matching for production policy.
0. Preflight
set -euo pipefail
TS=$(date +%Y%m%d-%H%M%S)
RUN_MARKER="p3d-phase4b-discovery-${TS}"
LOG="/tmp/${RUN_MARKER}.log"
exec > >(tee -a "$LOG") 2>&1
PG_CONTAINER="${PG_CONTAINER:-postgres}"
PG_DB="${PG_DB:-directus}"
PG_USER="${PG_USER:-directus}"
PSQL_CMD=(docker exec -i "$PG_CONTAINER" psql -U "$PG_USER" -d "$PG_DB")
# Note: ON_ERROR_STOP is NOT set globally. Each sub-query handles its own errors.
# This is intentional: graceful degradation requires individual failure handling.
echo "=== PHASE 4B DISCOVERY rev6 — RUN=$RUN_MARKER ==="
echo "=== READ-ONLY: no mutations ==="
# Helper: run a single query, log result or failure
run_query() {
local label="$1"; shift
echo ""
echo "======== $label ========"
"${PSQL_CMD[@]}" "$@" 2>&1 || echo ">>> $label FAILED — see error above, continuing <<<"
}
# Helper: check if a column exists in a table (returns 0=exists, 1=not found)
col_exists() {
local tbl="$1" col="$2"
"${PSQL_CMD[@]}" -t -A -c "
SELECT 1 FROM information_schema.columns
WHERE table_schema='public' AND table_name='$tbl' AND column_name='$col';
" 2>/dev/null | grep -q 1
}
# Helper: detect which column from a candidate list exists.
# ALWAYS returns exit code 0 (even if no column found) because set -e is active.
# Empty stdout = no matching column found. Non-empty stdout = detected column name.
detect_col() {
local tbl="$1"; shift
for candidate in "$@"; do
if col_exists "$tbl" "$candidate"; then
echo "$candidate"
return 0
fi
done
echo ""
return 0 # NOT return 1 — set -e would abort on VAR=$(detect_col ...) if return 1
}
1. D0 — Schema introspection (CORE — if this fails entirely, STOP)
for TBL in entity_species species_collection_map collection_registry birth_registry entity_labels universal_edges; do
run_query "D0: $TBL columns" -c "
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_schema='public' AND table_name='$TBL'
ORDER BY ordinal_position;"
done
# D0 CHECK constraints
for TBL in entity_species species_collection_map birth_registry; do
run_query "D0: $TBL CHECK constraints" -c "
SELECT conname, pg_get_constraintdef(oid) AS def
FROM pg_constraint
WHERE conrelid = 'public.$TBL'::regclass AND contype = 'c'
ORDER BY conname;"
done
2. D1 — Full entity_species catalog
# D1.1: all species (SELECT * — no column assumption)
run_query "D1.1: entity_species full catalog" -c "SELECT * FROM public.entity_species ORDER BY 1;"
# D1.2: count (snapshot)
run_query "D1.2: entity_species count" -c "SELECT count(*) AS total_species FROM public.entity_species;"
# D1.3: composition_level distribution (gate column first)
if col_exists entity_species composition_level; then
run_query "D1.3: composition_level distribution" -c "
SELECT composition_level, count(*) AS species_count
FROM public.entity_species GROUP BY composition_level ORDER BY composition_level;"
else
echo "D1.3 SKIPPED: entity_species.composition_level not found"
fi
# D1.4: species tree (gate parent/depth columns)
PARENT_SP=$(detect_col entity_species parent_id parent_ref parent_species_id)
DEPTH_SP=$(detect_col entity_species depth tree_depth level)
if [ -n "$PARENT_SP" ] && [ -n "$DEPTH_SP" ]; then
run_query "D1.4: species tree (parent=$PARENT_SP, depth=$DEPTH_SP)" -c "
SELECT id, species_code, $PARENT_SP AS parent, $DEPTH_SP AS depth
FROM public.entity_species ORDER BY $DEPTH_SP, $PARENT_SP NULLS FIRST, species_code;"
elif [ -n "$PARENT_SP" ]; then
run_query "D1.4: species tree (parent=$PARENT_SP, depth=unknown)" -c "
SELECT id, species_code, $PARENT_SP AS parent
FROM public.entity_species ORDER BY $PARENT_SP NULLS FIRST, species_code;"
else
echo "D1.4 SKIPPED: no parent column found in entity_species (tried: parent_id, parent_ref, parent_species_id)"
fi
3. D2 — Species-collection mapping landscape
# D2.1: all mappings (SELECT *)
run_query "D2.1: species_collection_map full" -c "SELECT * FROM public.species_collection_map ORDER BY 1;"
# D2.2: count
run_query "D2.2: species_collection_map count" -c "SELECT count(*) AS total_mappings FROM public.species_collection_map;"
# D2.3: species distribution in mapping (gate species_code)
if col_exists species_collection_map species_code; then
run_query "D2.3: species distribution" -c "
SELECT species_code, count(*) AS collections_mapped
FROM public.species_collection_map GROUP BY species_code ORDER BY species_code;"
else
echo "D2.3 SKIPPED: species_collection_map.species_code not found"
fi
# D2.4: composition distribution in mapping (gate composition_level)
if col_exists species_collection_map composition_level; then
run_query "D2.4: composition distribution" -c "
SELECT composition_level, count(*) AS collections_at_level
FROM public.species_collection_map GROUP BY composition_level ORDER BY composition_level;"
else
echo "D2.4 SKIPPED: species_collection_map.composition_level not found"
fi
# D2.5: collections without species mapping (gate collection_name in both tables)
SCM_COLNAME=$(detect_col species_collection_map collection_name name table_name)
CR_COLNAME=$(detect_col collection_registry collection_name name table_name)
if [ -n "$SCM_COLNAME" ] && [ -n "$CR_COLNAME" ]; then
run_query "D2.5: collections WITHOUT species mapping" -c "
SELECT cr.$CR_COLNAME, cr.*
FROM public.collection_registry cr
WHERE NOT EXISTS (
SELECT 1 FROM public.species_collection_map scm
WHERE scm.$SCM_COLNAME = cr.$CR_COLNAME
)
ORDER BY cr.$CR_COLNAME;"
else
echo "D2.5 SKIPPED: could not detect collection_name column in species_collection_map or collection_registry"
fi
# D2.6: IU/UV/TAC in species_collection_map
if [ -n "$SCM_COLNAME" ]; then
run_query "D2.6: IU/UV/TAC species mapping" -c "
SELECT * FROM public.species_collection_map
WHERE $SCM_COLNAME IN ('information_unit','unit_version',
'tac_logical_unit','tac_unit_version','tac_publication')
ORDER BY $SCM_COLNAME;"
else
echo "D2.6 SKIPPED: collection_name column not found"
fi
4. D3 — Collection registry IU/UV/TAC state
# D3.1: IU/UV/TAC rows (gate collection_name)
if [ -n "$CR_COLNAME" ]; then
run_query "D3.1: collection_registry IU/UV/TAC" -c "
SELECT * FROM public.collection_registry
WHERE $CR_COLNAME IN ('information_unit','unit_version',
'tac_logical_unit','tac_unit_version','tac_publication')
ORDER BY $CR_COLNAME;"
else
# Fallback: dump all and let agent filter
run_query "D3.1 FALLBACK: collection_registry full dump" -c "SELECT * FROM public.collection_registry ORDER BY 1;"
fi
# D3.2: governance_role distribution (gate column)
GOV_ROLE=$(detect_col collection_registry governance_role role)
if [ -n "$GOV_ROLE" ]; then
run_query "D3.2: governance_role distribution" -c "
SELECT $GOV_ROLE, count(*) AS collections
FROM public.collection_registry GROUP BY $GOV_ROLE ORDER BY $GOV_ROLE;"
else
echo "D3.2 SKIPPED: governance_role column not found"
fi
# D3.3: governed collections with species (reference pattern)
SP_CODE_CR=$(detect_col collection_registry species_code)
if [ -n "$GOV_ROLE" ] && [ -n "$SP_CODE_CR" ]; then
run_query "D3.3: governed+species reference pattern (LIMIT 30)" -c "
SELECT * FROM public.collection_registry
WHERE $GOV_ROLE = 'governed' AND $SP_CODE_CR IS NOT NULL
ORDER BY $CR_COLNAME LIMIT 30;"
else
echo "D3.3 SKIPPED: governance_role or species_code column not found"
fi
5. D4 — Birth registry IU/UV state
# D4 gate: detect collection_name in birth_registry
BR_COLNAME=$(detect_col birth_registry collection_name)
BR_SPECIES=$(detect_col birth_registry species_code)
# D4.1: birth_registry for information_unit
if [ -n "$BR_COLNAME" ]; then
run_query "D4.1: birth_registry information_unit" -c "
SELECT * FROM public.birth_registry WHERE $BR_COLNAME = 'information_unit' ORDER BY 1;"
run_query "D4.2: birth_registry unit_version" -c "
SELECT * FROM public.birth_registry WHERE $BR_COLNAME = 'unit_version' ORDER BY 1;"
else
echo "D4.1/D4.2 SKIPPED: birth_registry.collection_name not found"
fi
# D4.3: species/composition distribution
if [ -n "$BR_SPECIES" ]; then
BR_COMP=$(detect_col birth_registry composition_level)
run_query "D4.3: birth species/composition distribution (LIMIT 50)" -c "
SELECT species_code, ${BR_COMP:-'NULL AS composition_level'}, count(*) AS births
FROM public.birth_registry WHERE species_code IS NOT NULL
GROUP BY 1, 2 ORDER BY 1, 2 LIMIT 50;"
run_query "D4.4: births with NULL species" -c "
SELECT count(*) AS null_species_births FROM public.birth_registry WHERE species_code IS NULL;"
if [ -n "$BR_COLNAME" ]; then
run_query "D4.5: NULL species by collection" -c "
SELECT $BR_COLNAME, count(*) AS null_births
FROM public.birth_registry WHERE species_code IS NULL
GROUP BY $BR_COLNAME ORDER BY null_births DESC;"
fi
else
echo "D4.3-D4.5 SKIPPED: birth_registry.species_code not found"
fi
6. D5 — Entity labels and universal edges
# D5.1: entity_labels for IU/UV (gate entity_code)
EL_CODE=$(detect_col entity_labels entity_code code entity_ref)
if [ -n "$EL_CODE" ]; then
run_query "D5.1: entity_labels IU/UV count" -c "
SELECT count(*) AS iu_labels FROM public.entity_labels
WHERE $EL_CODE LIKE 'information_unit::%' OR $EL_CODE LIKE 'unit_version::%';"
else
echo "D5.1 SKIPPED: entity_labels code column not found (tried: entity_code, code, entity_ref)"
fi
# D5.2: universal_edges for IU/UV (detect columns first)
UE_SRC=$(detect_col universal_edges source_collection source_table from_collection)
UE_TGT=$(detect_col universal_edges target_collection target_table to_collection)
if [ -n "$UE_SRC" ] && [ -n "$UE_TGT" ]; then
run_query "D5.2: universal_edges IU/UV (src=$UE_SRC, tgt=$UE_TGT)" -c "
SELECT count(*) AS iu_edges FROM public.universal_edges
WHERE $UE_SRC IN ('information_unit','unit_version')
OR $UE_TGT IN ('information_unit','unit_version');"
else
echo "D5.2: source/target columns not detected — using FALLBACK jsonb scan"
run_query "D5.2 FALLBACK: universal_edges IU mention scan" -c "
SELECT count(*) AS iu_edge_mentions FROM public.universal_edges
WHERE to_jsonb(universal_edges)::text ILIKE '%information_unit%';"
fi
# D5.3: label facets in use (gate facet_code)
EL_FACET=$(detect_col entity_labels facet_code facet label_facet)
if [ -n "$EL_FACET" ]; then
run_query "D5.3: label facets in use (top 20)" -c "
SELECT $EL_FACET, count(*) AS usage FROM public.entity_labels
GROUP BY $EL_FACET ORDER BY usage DESC LIMIT 20;"
else
echo "D5.3 SKIPPED: facet column not found"
fi
# D5.4: edge types in use (gate edge_type)
UE_TYPE=$(detect_col universal_edges edge_type relation_type type)
if [ -n "$UE_TYPE" ]; then
run_query "D5.4: edge types in use (top 20)" -c "
SELECT $UE_TYPE, count(*) AS usage FROM public.universal_edges
GROUP BY $UE_TYPE ORDER BY usage DESC LIMIT 20;"
else
echo "D5.4 SKIPPED: edge_type column not found"
fi
# D5.5: exemplar governed collection with full pipeline
if [ -n "$GOV_ROLE" ] && [ -n "$SP_CODE_CR" ] && [ -n "$SCM_COLNAME" ] && [ -n "$EL_CODE" ] && [ -n "$UE_SRC" ]; then
run_query "D5.5: exemplar governed+species+labels+edges collection" <<SQL
WITH candidates AS (
SELECT cr.$CR_COLNAME AS coll
FROM public.collection_registry cr
JOIN public.species_collection_map scm ON scm.$SCM_COLNAME = cr.$CR_COLNAME
WHERE cr.$GOV_ROLE = 'governed'
AND EXISTS (SELECT 1 FROM public.entity_labels el WHERE el.$EL_CODE LIKE cr.$CR_COLNAME || '::%')
AND EXISTS (SELECT 1 FROM public.universal_edges ue WHERE ue.$UE_SRC = cr.$CR_COLNAME)
LIMIT 1
)
SELECT 'exemplar_collection=' || coll FROM candidates;
SQL
else
echo "D5.5 SKIPPED: prerequisite columns not detected for exemplar query"
fi
7. D6 — fn_birth_registry_auto source inspection
run_query "D6.1: fn_birth_registry_auto source" -c "
SELECT pg_get_functiondef(oid) AS source FROM pg_proc
WHERE proname = 'fn_birth_registry_auto' AND pronamespace = 'public'::regnamespace;"
Agent: from the function source, answer:
- Does it read
species_collection_mapto fillspecies_code? (Expected YES per Điều 0-G) - What happens when no mapping exists? (Expected NULL passthrough)
- Does it fill
composition_levelfrom the same JOIN? - What JOIN condition does it use? (Expected
scm.collection_name = TG_TABLE_NAME)
8. D7 — Candidate analysis (read-only, NO INSERT)
# D7.1: valid composition_level values
if col_exists entity_species composition_level; then
run_query "D7.1a: composition_level from entity_species" -c "
SELECT DISTINCT composition_level FROM public.entity_species WHERE composition_level IS NOT NULL ORDER BY 1;"
fi
if col_exists species_collection_map composition_level; then
run_query "D7.1b: composition_level from species_collection_map" -c "
SELECT DISTINCT composition_level FROM public.species_collection_map WHERE composition_level IS NOT NULL ORDER BY 1;"
fi
# D7.2: species_code naming pattern
if col_exists entity_species species_code; then
run_query "D7.2: species_code prefixes" -c "
SELECT left(species_code, 4) AS prefix, count(*) FROM public.entity_species
GROUP BY 1 ORDER BY 1;"
fi
# D7.3: IU-adjacent collections (CANDIDATE ONLY — candidate_not_approved)
if [ -n "$SCM_COLNAME" ]; then
run_query "D7.3: IU-adjacent collections (candidate_not_approved)" -c "
SELECT * FROM public.species_collection_map
WHERE $SCM_COLNAME ILIKE '%unit%' OR $SCM_COLNAME ILIKE '%content%'
OR $SCM_COLNAME ILIKE '%document%' OR $SCM_COLNAME ILIKE '%law%'
ORDER BY $SCM_COLNAME;"
fi
9. D8 — Containment analysis for Điều 0-B composition decision
# D8.0: detect parent column in tac_logical_unit
echo "=== D8.0: Detect parent column in tac_logical_unit ==="
TAC_PARENT=$(detect_col tac_logical_unit parent_ref parent_id parent_or_container_ref container_ref)
echo "Detected TAC parent column: ${TAC_PARENT:-NONE}"
# D8.0b: detect parent column in information_unit
IU_PARENT=$(detect_col information_unit parent_or_container_ref parent_ref parent_id container_ref)
echo "Detected IU parent column: ${IU_PARENT:-NONE}"
# D8.1: TAC parent-child nesting
if [ -n "$TAC_PARENT" ]; then
run_query "D8.1: TAC parent-child counts (parent=$TAC_PARENT)" -c "
SELECT
count(*) FILTER (WHERE $TAC_PARENT IS NOT NULL) AS has_parent,
count(*) FILTER (WHERE $TAC_PARENT IS NULL) AS root_units,
count(*) AS total
FROM public.tac_logical_unit;"
else
echo "D8.1 SKIPPED: no parent column found in tac_logical_unit"
fi
# D8.2: IU parent state
if [ -n "$IU_PARENT" ]; then
run_query "D8.2: IU parent state (parent=$IU_PARENT)" -c "
SELECT
count(*) AS total_iu,
count(*) FILTER (WHERE $IU_PARENT IS NOT NULL) AS has_parent,
count(*) FILTER (WHERE $IU_PARENT IS NULL) AS no_parent
FROM public.information_unit;"
else
echo "D8.2 SKIPPED: no parent column found in information_unit"
fi
# D8.3: TAC parent→child sample (containment evidence for Điều 0-B)
if [ -n "$TAC_PARENT" ]; then
ST_COL=$(detect_col tac_logical_unit section_type type)
ST_SEL="${ST_COL:+, p.$ST_COL AS parent_type, c.$ST_COL AS child_type}"
run_query "D8.3: TAC parent→child sample (LIMIT 20)" -c "
SELECT p.id AS parent_id${ST_SEL:-}, c.id AS child_id
FROM public.tac_logical_unit p
JOIN public.tac_logical_unit c ON c.$TAC_PARENT = p.id
ORDER BY p.id, c.id LIMIT 20;"
# D8.3b: nesting depth
run_query "D8.3b: TAC nesting depth" <<SQL
WITH RECURSIVE tree AS (
SELECT id, $TAC_PARENT AS parent, 0 AS depth
FROM public.tac_logical_unit WHERE $TAC_PARENT IS NULL
UNION ALL
SELECT tlu.id, tlu.$TAC_PARENT, t.depth + 1
FROM public.tac_logical_unit tlu JOIN tree t ON tlu.$TAC_PARENT = t.id
WHERE t.depth < 10
)
SELECT max(depth) AS max_depth, count(*) AS total_nodes FROM tree;
SQL
else
echo "D8.3/D8.3b SKIPPED: no parent column in tac_logical_unit"
fi
# D8.4: planner parent handling
run_query "D8.4: fn_iu_create_plan source (for parent references)" -c "
SELECT pg_get_functiondef(oid) FROM pg_proc
WHERE proname = 'fn_iu_create_plan' AND pronamespace = 'public'::regnamespace;"
Agent: from D8 results, answer the Điều 0-B containment question for the report:
- If TAC units have children that ALSO become separate IU rows → IU CONTAINS other entities → NOT atom
- If TAC parent/child is ordering within a publication (children don't become separate IU) → atom
- If unclear → report ambiguity, let GPT/User decide
10. Report requirements
Save to: knowledge/dev/laws/dieu44-trien-khai/reports/p3d-pack1-phase4b-species-composition-discovery-report.md
Mandatory contents:
RUN_MARKERvalue.- D0 schema introspection output verbatim. Note any FAILED sub-blocks.
- D1 full entity_species catalog.
- D2 species_collection_map landscape + IU/UV gap.
- D3 collection_registry IU/UV/TAC state.
- D4 birth_registry species/composition state for IU/UV.
- D5 label/edge coverage for IU/UV + exemplar if found.
- D6 fn_birth_registry_auto source + analysis.
- D7 candidate analysis (labelled
candidate_not_approved). - D8 containment analysis for Điều 0-B.
- Law crosswalk summary: for each compliance matrix item (addendum §B #1-12), state what discovery found.
- Column detection results: for each detect_col call, what was detected vs what was tried.
- Failed sub-queries: which query, what error, what evidence is missing.
- Status flags:
phase4b_discovery_status=PASS|PARTIAL|BLOCKED
mode=READ_ONLY_DISCOVERY
no_mutation_performed=true
entity_species_columns_discovered=true|false
entity_species_row_count=<N>
species_collection_map_iu_rows=<N>
birth_registry_iu_null_species_count=<N>
entity_labels_iu_count=<N>
universal_edges_iu_count=<N>
fn_birth_registry_auto_logic_documented=true|false
tac_parent_column_detected=<column_name|NONE>
tac_max_nesting_depth=<N|UNKNOWN>
containment_assessment=<atom_likely|not_atom|ambiguous>
11. Out of scope
- Do not INSERT into any table.
- Do not UPDATE any table.
- Do not dispatch nested agents.
- Do not patch functions or triggers.
- Do not derive production species decisions from fuzzy D7.3 results.
- Do not assume column names exist without D0 or detect_col verification.
Phase 4B Discovery Prompt rev6 | detect_col return 0 fix | Truly introspect-first | Column detection + fallback | Independent sub-queries | No hardcode | 2026-05-11