KB-6858 rev 9

P3D Pack 1 Phase 4B — Read-Only Species/Composition/Registry Discovery Prompt (rev5)

20 min read Revision 9
p3dpack1phase4bdiscoveryread-onlyintrospect-firstrev5

P3D Pack 1 Phase 4B — Read-Only Species/Composition/Registry Discovery Prompt (rev6)

Date: 2026-05-11 | Status: READ-ONLY DISCOVERY — no INSERT, no UPDATE, no DDL Legal alignment: Serves compliance matrix in p3d-pack1-phase4b-legal-alignment-addendum.md. Discoveries inform decisions under Điều 0-B (composition), Điều 0-G (birth), Điều 29 (one-system), QT-001/QT-005. Introspect-first: Every schema-sensitive query is gated by column detection. No assumed column names. Graceful degradation: Every sub-query runs independently. Failure → log + continue. No cascade abort. All counts are SNAPSHOTS. No fuzzy matching for production policy.


0. Preflight

set -euo pipefail
TS=$(date +%Y%m%d-%H%M%S)
RUN_MARKER="p3d-phase4b-discovery-${TS}"
LOG="/tmp/${RUN_MARKER}.log"
exec > >(tee -a "$LOG") 2>&1
PG_CONTAINER="${PG_CONTAINER:-postgres}"
PG_DB="${PG_DB:-directus}"
PG_USER="${PG_USER:-directus}"
PSQL_CMD=(docker exec -i "$PG_CONTAINER" psql -U "$PG_USER" -d "$PG_DB")
# Note: ON_ERROR_STOP is NOT set globally. Each sub-query handles its own errors.
# This is intentional: graceful degradation requires individual failure handling.

echo "=== PHASE 4B DISCOVERY rev6 — RUN=$RUN_MARKER ==="
echo "=== READ-ONLY: no mutations ==="

# Helper: run a single query, log result or failure
run_query() {
  local label="$1"; shift
  echo ""
  echo "======== $label ========"
  "${PSQL_CMD[@]}" "$@" 2>&1 || echo ">>> $label FAILED — see error above, continuing <<<"
}

# Helper: check if a column exists in a table (returns 0=exists, 1=not found)
col_exists() {
  local tbl="$1" col="$2"
  "${PSQL_CMD[@]}" -t -A -c "
    SELECT 1 FROM information_schema.columns
    WHERE table_schema='public' AND table_name='$tbl' AND column_name='$col';
  " 2>/dev/null | grep -q 1
}

# Helper: detect which column from a candidate list exists.
# ALWAYS returns exit code 0 (even if no column found) because set -e is active.
# Empty stdout = no matching column found. Non-empty stdout = detected column name.
detect_col() {
  local tbl="$1"; shift
  for candidate in "$@"; do
    if col_exists "$tbl" "$candidate"; then
      echo "$candidate"
      return 0
    fi
  done
  echo ""
  return 0  # NOT return 1 — set -e would abort on VAR=$(detect_col ...) if return 1
}

1. D0 — Schema introspection (CORE — if this fails entirely, STOP)

for TBL in entity_species species_collection_map collection_registry birth_registry entity_labels universal_edges; do
  run_query "D0: $TBL columns" -c "
    SELECT column_name, data_type, is_nullable, column_default
    FROM information_schema.columns
    WHERE table_schema='public' AND table_name='$TBL'
    ORDER BY ordinal_position;"
done

# D0 CHECK constraints
for TBL in entity_species species_collection_map birth_registry; do
  run_query "D0: $TBL CHECK constraints" -c "
    SELECT conname, pg_get_constraintdef(oid) AS def
    FROM pg_constraint
    WHERE conrelid = 'public.$TBL'::regclass AND contype = 'c'
    ORDER BY conname;"
done

2. D1 — Full entity_species catalog

# D1.1: all species (SELECT * — no column assumption)
run_query "D1.1: entity_species full catalog" -c "SELECT * FROM public.entity_species ORDER BY 1;"

# D1.2: count (snapshot)
run_query "D1.2: entity_species count" -c "SELECT count(*) AS total_species FROM public.entity_species;"

# D1.3: composition_level distribution (gate column first)
if col_exists entity_species composition_level; then
  run_query "D1.3: composition_level distribution" -c "
    SELECT composition_level, count(*) AS species_count
    FROM public.entity_species GROUP BY composition_level ORDER BY composition_level;"
else
  echo "D1.3 SKIPPED: entity_species.composition_level not found"
fi

# D1.4: species tree (gate parent/depth columns)
PARENT_SP=$(detect_col entity_species parent_id parent_ref parent_species_id)
DEPTH_SP=$(detect_col entity_species depth tree_depth level)
if [ -n "$PARENT_SP" ] && [ -n "$DEPTH_SP" ]; then
  run_query "D1.4: species tree (parent=$PARENT_SP, depth=$DEPTH_SP)" -c "
    SELECT id, species_code, $PARENT_SP AS parent, $DEPTH_SP AS depth
    FROM public.entity_species ORDER BY $DEPTH_SP, $PARENT_SP NULLS FIRST, species_code;"
elif [ -n "$PARENT_SP" ]; then
  run_query "D1.4: species tree (parent=$PARENT_SP, depth=unknown)" -c "
    SELECT id, species_code, $PARENT_SP AS parent
    FROM public.entity_species ORDER BY $PARENT_SP NULLS FIRST, species_code;"
else
  echo "D1.4 SKIPPED: no parent column found in entity_species (tried: parent_id, parent_ref, parent_species_id)"
fi

3. D2 — Species-collection mapping landscape

# D2.1: all mappings (SELECT *)
run_query "D2.1: species_collection_map full" -c "SELECT * FROM public.species_collection_map ORDER BY 1;"

# D2.2: count
run_query "D2.2: species_collection_map count" -c "SELECT count(*) AS total_mappings FROM public.species_collection_map;"

# D2.3: species distribution in mapping (gate species_code)
if col_exists species_collection_map species_code; then
  run_query "D2.3: species distribution" -c "
    SELECT species_code, count(*) AS collections_mapped
    FROM public.species_collection_map GROUP BY species_code ORDER BY species_code;"
else
  echo "D2.3 SKIPPED: species_collection_map.species_code not found"
fi

# D2.4: composition distribution in mapping (gate composition_level)
if col_exists species_collection_map composition_level; then
  run_query "D2.4: composition distribution" -c "
    SELECT composition_level, count(*) AS collections_at_level
    FROM public.species_collection_map GROUP BY composition_level ORDER BY composition_level;"
else
  echo "D2.4 SKIPPED: species_collection_map.composition_level not found"
fi

# D2.5: collections without species mapping (gate collection_name in both tables)
SCM_COLNAME=$(detect_col species_collection_map collection_name name table_name)
CR_COLNAME=$(detect_col collection_registry collection_name name table_name)
if [ -n "$SCM_COLNAME" ] && [ -n "$CR_COLNAME" ]; then
  run_query "D2.5: collections WITHOUT species mapping" -c "
    SELECT cr.$CR_COLNAME, cr.*
    FROM public.collection_registry cr
    WHERE NOT EXISTS (
      SELECT 1 FROM public.species_collection_map scm
      WHERE scm.$SCM_COLNAME = cr.$CR_COLNAME
    )
    ORDER BY cr.$CR_COLNAME;"
else
  echo "D2.5 SKIPPED: could not detect collection_name column in species_collection_map or collection_registry"
fi

# D2.6: IU/UV/TAC in species_collection_map
if [ -n "$SCM_COLNAME" ]; then
  run_query "D2.6: IU/UV/TAC species mapping" -c "
    SELECT * FROM public.species_collection_map
    WHERE $SCM_COLNAME IN ('information_unit','unit_version',
      'tac_logical_unit','tac_unit_version','tac_publication')
    ORDER BY $SCM_COLNAME;"
else
  echo "D2.6 SKIPPED: collection_name column not found"
fi

4. D3 — Collection registry IU/UV/TAC state

# D3.1: IU/UV/TAC rows (gate collection_name)
if [ -n "$CR_COLNAME" ]; then
  run_query "D3.1: collection_registry IU/UV/TAC" -c "
    SELECT * FROM public.collection_registry
    WHERE $CR_COLNAME IN ('information_unit','unit_version',
      'tac_logical_unit','tac_unit_version','tac_publication')
    ORDER BY $CR_COLNAME;"
else
  # Fallback: dump all and let agent filter
  run_query "D3.1 FALLBACK: collection_registry full dump" -c "SELECT * FROM public.collection_registry ORDER BY 1;"
fi

# D3.2: governance_role distribution (gate column)
GOV_ROLE=$(detect_col collection_registry governance_role role)
if [ -n "$GOV_ROLE" ]; then
  run_query "D3.2: governance_role distribution" -c "
    SELECT $GOV_ROLE, count(*) AS collections
    FROM public.collection_registry GROUP BY $GOV_ROLE ORDER BY $GOV_ROLE;"
else
  echo "D3.2 SKIPPED: governance_role column not found"
fi

# D3.3: governed collections with species (reference pattern)
SP_CODE_CR=$(detect_col collection_registry species_code)
if [ -n "$GOV_ROLE" ] && [ -n "$SP_CODE_CR" ]; then
  run_query "D3.3: governed+species reference pattern (LIMIT 30)" -c "
    SELECT * FROM public.collection_registry
    WHERE $GOV_ROLE = 'governed' AND $SP_CODE_CR IS NOT NULL
    ORDER BY $CR_COLNAME LIMIT 30;"
else
  echo "D3.3 SKIPPED: governance_role or species_code column not found"
fi

5. D4 — Birth registry IU/UV state

# D4 gate: detect collection_name in birth_registry
BR_COLNAME=$(detect_col birth_registry collection_name)
BR_SPECIES=$(detect_col birth_registry species_code)

# D4.1: birth_registry for information_unit
if [ -n "$BR_COLNAME" ]; then
  run_query "D4.1: birth_registry information_unit" -c "
    SELECT * FROM public.birth_registry WHERE $BR_COLNAME = 'information_unit' ORDER BY 1;"

  run_query "D4.2: birth_registry unit_version" -c "
    SELECT * FROM public.birth_registry WHERE $BR_COLNAME = 'unit_version' ORDER BY 1;"
else
  echo "D4.1/D4.2 SKIPPED: birth_registry.collection_name not found"
fi

# D4.3: species/composition distribution
if [ -n "$BR_SPECIES" ]; then
  BR_COMP=$(detect_col birth_registry composition_level)
  run_query "D4.3: birth species/composition distribution (LIMIT 50)" -c "
    SELECT species_code, ${BR_COMP:-'NULL AS composition_level'}, count(*) AS births
    FROM public.birth_registry WHERE species_code IS NOT NULL
    GROUP BY 1, 2 ORDER BY 1, 2 LIMIT 50;"

  run_query "D4.4: births with NULL species" -c "
    SELECT count(*) AS null_species_births FROM public.birth_registry WHERE species_code IS NULL;"

  if [ -n "$BR_COLNAME" ]; then
    run_query "D4.5: NULL species by collection" -c "
      SELECT $BR_COLNAME, count(*) AS null_births
      FROM public.birth_registry WHERE species_code IS NULL
      GROUP BY $BR_COLNAME ORDER BY null_births DESC;"
  fi
else
  echo "D4.3-D4.5 SKIPPED: birth_registry.species_code not found"
fi

6. D5 — Entity labels and universal edges

# D5.1: entity_labels for IU/UV (gate entity_code)
EL_CODE=$(detect_col entity_labels entity_code code entity_ref)
if [ -n "$EL_CODE" ]; then
  run_query "D5.1: entity_labels IU/UV count" -c "
    SELECT count(*) AS iu_labels FROM public.entity_labels
    WHERE $EL_CODE LIKE 'information_unit::%' OR $EL_CODE LIKE 'unit_version::%';"
else
  echo "D5.1 SKIPPED: entity_labels code column not found (tried: entity_code, code, entity_ref)"
fi

# D5.2: universal_edges for IU/UV (detect columns first)
UE_SRC=$(detect_col universal_edges source_collection source_table from_collection)
UE_TGT=$(detect_col universal_edges target_collection target_table to_collection)
if [ -n "$UE_SRC" ] && [ -n "$UE_TGT" ]; then
  run_query "D5.2: universal_edges IU/UV (src=$UE_SRC, tgt=$UE_TGT)" -c "
    SELECT count(*) AS iu_edges FROM public.universal_edges
    WHERE $UE_SRC IN ('information_unit','unit_version')
       OR $UE_TGT IN ('information_unit','unit_version');"
else
  echo "D5.2: source/target columns not detected — using FALLBACK jsonb scan"
  run_query "D5.2 FALLBACK: universal_edges IU mention scan" -c "
    SELECT count(*) AS iu_edge_mentions FROM public.universal_edges
    WHERE to_jsonb(universal_edges)::text ILIKE '%information_unit%';"
fi

# D5.3: label facets in use (gate facet_code)
EL_FACET=$(detect_col entity_labels facet_code facet label_facet)
if [ -n "$EL_FACET" ]; then
  run_query "D5.3: label facets in use (top 20)" -c "
    SELECT $EL_FACET, count(*) AS usage FROM public.entity_labels
    GROUP BY $EL_FACET ORDER BY usage DESC LIMIT 20;"
else
  echo "D5.3 SKIPPED: facet column not found"
fi

# D5.4: edge types in use (gate edge_type)
UE_TYPE=$(detect_col universal_edges edge_type relation_type type)
if [ -n "$UE_TYPE" ]; then
  run_query "D5.4: edge types in use (top 20)" -c "
    SELECT $UE_TYPE, count(*) AS usage FROM public.universal_edges
    GROUP BY $UE_TYPE ORDER BY usage DESC LIMIT 20;"
else
  echo "D5.4 SKIPPED: edge_type column not found"
fi

# D5.5: exemplar governed collection with full pipeline
if [ -n "$GOV_ROLE" ] && [ -n "$SP_CODE_CR" ] && [ -n "$SCM_COLNAME" ] && [ -n "$EL_CODE" ] && [ -n "$UE_SRC" ]; then
  run_query "D5.5: exemplar governed+species+labels+edges collection" <<SQL
WITH candidates AS (
  SELECT cr.$CR_COLNAME AS coll
  FROM public.collection_registry cr
  JOIN public.species_collection_map scm ON scm.$SCM_COLNAME = cr.$CR_COLNAME
  WHERE cr.$GOV_ROLE = 'governed'
    AND EXISTS (SELECT 1 FROM public.entity_labels el WHERE el.$EL_CODE LIKE cr.$CR_COLNAME || '::%')
    AND EXISTS (SELECT 1 FROM public.universal_edges ue WHERE ue.$UE_SRC = cr.$CR_COLNAME)
  LIMIT 1
)
SELECT 'exemplar_collection=' || coll FROM candidates;
SQL
else
  echo "D5.5 SKIPPED: prerequisite columns not detected for exemplar query"
fi

7. D6 — fn_birth_registry_auto source inspection

run_query "D6.1: fn_birth_registry_auto source" -c "
  SELECT pg_get_functiondef(oid) AS source FROM pg_proc
  WHERE proname = 'fn_birth_registry_auto' AND pronamespace = 'public'::regnamespace;"

Agent: from the function source, answer:

  • Does it read species_collection_map to fill species_code? (Expected YES per Điều 0-G)
  • What happens when no mapping exists? (Expected NULL passthrough)
  • Does it fill composition_level from the same JOIN?
  • What JOIN condition does it use? (Expected scm.collection_name = TG_TABLE_NAME)

8. D7 — Candidate analysis (read-only, NO INSERT)

# D7.1: valid composition_level values
if col_exists entity_species composition_level; then
  run_query "D7.1a: composition_level from entity_species" -c "
    SELECT DISTINCT composition_level FROM public.entity_species WHERE composition_level IS NOT NULL ORDER BY 1;"
fi
if col_exists species_collection_map composition_level; then
  run_query "D7.1b: composition_level from species_collection_map" -c "
    SELECT DISTINCT composition_level FROM public.species_collection_map WHERE composition_level IS NOT NULL ORDER BY 1;"
fi

# D7.2: species_code naming pattern
if col_exists entity_species species_code; then
  run_query "D7.2: species_code prefixes" -c "
    SELECT left(species_code, 4) AS prefix, count(*) FROM public.entity_species
    GROUP BY 1 ORDER BY 1;"
fi

# D7.3: IU-adjacent collections (CANDIDATE ONLY — candidate_not_approved)
if [ -n "$SCM_COLNAME" ]; then
  run_query "D7.3: IU-adjacent collections (candidate_not_approved)" -c "
    SELECT * FROM public.species_collection_map
    WHERE $SCM_COLNAME ILIKE '%unit%' OR $SCM_COLNAME ILIKE '%content%'
       OR $SCM_COLNAME ILIKE '%document%' OR $SCM_COLNAME ILIKE '%law%'
    ORDER BY $SCM_COLNAME;"
fi

9. D8 — Containment analysis for Điều 0-B composition decision

# D8.0: detect parent column in tac_logical_unit
echo "=== D8.0: Detect parent column in tac_logical_unit ==="
TAC_PARENT=$(detect_col tac_logical_unit parent_ref parent_id parent_or_container_ref container_ref)
echo "Detected TAC parent column: ${TAC_PARENT:-NONE}"

# D8.0b: detect parent column in information_unit
IU_PARENT=$(detect_col information_unit parent_or_container_ref parent_ref parent_id container_ref)
echo "Detected IU parent column: ${IU_PARENT:-NONE}"

# D8.1: TAC parent-child nesting
if [ -n "$TAC_PARENT" ]; then
  run_query "D8.1: TAC parent-child counts (parent=$TAC_PARENT)" -c "
    SELECT
      count(*) FILTER (WHERE $TAC_PARENT IS NOT NULL) AS has_parent,
      count(*) FILTER (WHERE $TAC_PARENT IS NULL) AS root_units,
      count(*) AS total
    FROM public.tac_logical_unit;"
else
  echo "D8.1 SKIPPED: no parent column found in tac_logical_unit"
fi

# D8.2: IU parent state
if [ -n "$IU_PARENT" ]; then
  run_query "D8.2: IU parent state (parent=$IU_PARENT)" -c "
    SELECT
      count(*) AS total_iu,
      count(*) FILTER (WHERE $IU_PARENT IS NOT NULL) AS has_parent,
      count(*) FILTER (WHERE $IU_PARENT IS NULL) AS no_parent
    FROM public.information_unit;"
else
  echo "D8.2 SKIPPED: no parent column found in information_unit"
fi

# D8.3: TAC parent→child sample (containment evidence for Điều 0-B)
if [ -n "$TAC_PARENT" ]; then
  ST_COL=$(detect_col tac_logical_unit section_type type)
  ST_SEL="${ST_COL:+, p.$ST_COL AS parent_type, c.$ST_COL AS child_type}"
  run_query "D8.3: TAC parent→child sample (LIMIT 20)" -c "
    SELECT p.id AS parent_id${ST_SEL:-}, c.id AS child_id
    FROM public.tac_logical_unit p
    JOIN public.tac_logical_unit c ON c.$TAC_PARENT = p.id
    ORDER BY p.id, c.id LIMIT 20;"

  # D8.3b: nesting depth
  run_query "D8.3b: TAC nesting depth" <<SQL
WITH RECURSIVE tree AS (
  SELECT id, $TAC_PARENT AS parent, 0 AS depth
  FROM public.tac_logical_unit WHERE $TAC_PARENT IS NULL
  UNION ALL
  SELECT tlu.id, tlu.$TAC_PARENT, t.depth + 1
  FROM public.tac_logical_unit tlu JOIN tree t ON tlu.$TAC_PARENT = t.id
  WHERE t.depth < 10
)
SELECT max(depth) AS max_depth, count(*) AS total_nodes FROM tree;
SQL
else
  echo "D8.3/D8.3b SKIPPED: no parent column in tac_logical_unit"
fi

# D8.4: planner parent handling
run_query "D8.4: fn_iu_create_plan source (for parent references)" -c "
  SELECT pg_get_functiondef(oid) FROM pg_proc
  WHERE proname = 'fn_iu_create_plan' AND pronamespace = 'public'::regnamespace;"

Agent: from D8 results, answer the Điều 0-B containment question for the report:

  • If TAC units have children that ALSO become separate IU rows → IU CONTAINS other entities → NOT atom
  • If TAC parent/child is ordering within a publication (children don't become separate IU) → atom
  • If unclear → report ambiguity, let GPT/User decide

10. Report requirements

Save to: knowledge/dev/laws/dieu44-trien-khai/reports/p3d-pack1-phase4b-species-composition-discovery-report.md

Mandatory contents:

  1. RUN_MARKER value.
  2. D0 schema introspection output verbatim. Note any FAILED sub-blocks.
  3. D1 full entity_species catalog.
  4. D2 species_collection_map landscape + IU/UV gap.
  5. D3 collection_registry IU/UV/TAC state.
  6. D4 birth_registry species/composition state for IU/UV.
  7. D5 label/edge coverage for IU/UV + exemplar if found.
  8. D6 fn_birth_registry_auto source + analysis.
  9. D7 candidate analysis (labelled candidate_not_approved).
  10. D8 containment analysis for Điều 0-B.
  11. Law crosswalk summary: for each compliance matrix item (addendum §B #1-12), state what discovery found.
  12. Column detection results: for each detect_col call, what was detected vs what was tried.
  13. Failed sub-queries: which query, what error, what evidence is missing.
  14. Status flags:
phase4b_discovery_status=PASS|PARTIAL|BLOCKED
mode=READ_ONLY_DISCOVERY
no_mutation_performed=true
entity_species_columns_discovered=true|false
entity_species_row_count=<N>
species_collection_map_iu_rows=<N>
birth_registry_iu_null_species_count=<N>
entity_labels_iu_count=<N>
universal_edges_iu_count=<N>
fn_birth_registry_auto_logic_documented=true|false
tac_parent_column_detected=<column_name|NONE>
tac_max_nesting_depth=<N|UNKNOWN>
containment_assessment=<atom_likely|not_atom|ambiguous>

11. Out of scope

  • Do not INSERT into any table.
  • Do not UPDATE any table.
  • Do not dispatch nested agents.
  • Do not patch functions or triggers.
  • Do not derive production species decisions from fuzzy D7.3 results.
  • Do not assume column names exist without D0 or detect_col verification.

Phase 4B Discovery Prompt rev6 | detect_col return 0 fix | Truly introspect-first | Column detection + fallback | Independent sub-queries | No hardcode | 2026-05-11

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/prompts/p3d-pack1-phase4b-readonly-species-composition-registry-discovery-prompt.md