KB-721D

P3D Pack 1 Phase 4B — Species/Composition/Registry Discovery Report


Date: 2026-05-11
Mode: READ-ONLY DISCOVERY (prompt rev6 approved 2026-05-11)
Executor: Opus 4.7
RUN_MARKER: p3d-phase4b-discovery-20260511-092233
Log on VPS: /tmp/p3d-phase4b-discovery-20260511-092233.log

No mutations performed. No nested agent dispatch. No production species/composition decision.


0. Column detection summary

| Target | Detected | Tried |
| --- | --- | --- |
| entity_species.parent | parent_id | parent_id, parent_ref, parent_species_id |
| entity_species.depth | depth | depth, tree_depth, level |
| species_collection_map.collection_name | collection_name | collection_name, name, table_name |
| species_collection_map.composition_level | NOT FOUND | (single check) |
| collection_registry.collection_name | collection_name | collection_name, name, table_name |
| collection_registry.governance_role | governance_role | governance_role, role |
| collection_registry.species_code | species_code | species_code |
| birth_registry.collection_name | collection_name | collection_name |
| birth_registry.species_code | species_code | species_code |
| birth_registry.composition_level | composition_level | (single check) |
| entity_labels.code | entity_code | entity_code, code, entity_ref |
| entity_labels.facet | NOT FOUND | facet_code, facet, label_facet |
| universal_edges.source | source_collection | source_collection, source_table, from_collection |
| universal_edges.target | target_collection | target_collection, target_table, to_collection |
| universal_edges.edge_type | edge_type | edge_type, relation_type, type |
| tac_logical_unit.parent | parent_id | parent_ref, parent_id, parent_or_container_ref, container_ref |
| information_unit.parent | parent_or_container_ref | parent_or_container_ref, parent_ref, parent_id, container_ref |

Failed sub-queries: none. All gates evaluated. Skips were intentional (column not present).
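
The detection pattern summarized above (try candidate column names in order, record the first one that exists, otherwise skip the dependent sub-query) could be sketched as follows. This is an illustration only: `detect_column` is a hypothetical helper, not a function from the discovery script, and the `tac_cols` list is an assumed subset of the real table.

```python
from typing import Iterable, Optional


def detect_column(existing: Iterable[str], candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate name present in the table, or None if none match."""
    present = set(existing)
    for name in candidates:
        if name in present:
            return name
    return None


# entity_labels columns as listed in the D0 introspection.
entity_labels_cols = ["id", "entity_code", "label_code", "assigned_by", "rule_id", "assigned_at"]
# Hypothetical subset: the report only confirms that parent_id exists on tac_logical_unit.
tac_cols = ["id", "parent_id"]

facet = detect_column(entity_labels_cols, ["facet_code", "facet", "label_facet"])
tac_parent = detect_column(
    tac_cols, ["parent_ref", "parent_id", "parent_or_container_ref", "container_ref"]
)

print(facet)       # None: the dependent sub-query is skipped, not failed
print(tac_parent)  # parent_id
```

Returning `None` rather than raising keeps an optional missing column from aborting the run, which matches the `script_abort_due_to_optional_missing_column=false` flag.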


1. D0 — Schema introspection

entity_species (18 cols)

Key fields: id, code, species_code (NOT NULL), display_name, composition_level (NOT NULL), management_mode (default governed), prefix, status (default active), parent_id, depth (default 1), kg_metadata (jsonb). No CHECK constraints.

species_collection_map (13 cols)

Key fields: species_code (NOT NULL), collection_name (NOT NULL), is_primary (default true), discriminator_field, discriminator_value, discriminator_operator, discriminator_config (json). No composition_level column — composition comes from entity_species via JOIN. No CHECK constraints.

collection_registry (29 cols)

Key fields: code, name, collection_name, storage_role, governance_role, source_kind, migration_state (default unclassified), species_code, description_policy, birth_code_strategy, birth_code_column, birth_identity_source.

birth_registry (19 cols)

Key fields: entity_code (NOT NULL), collection_name (NOT NULL), species_code (nullable), composition_level (nullable), dot_origin, born_at, governance_role, inspect_pen, inspect_stamp, inspect_gate, certified (default false), status (default born). No CHECK constraints.

entity_labels (6 cols)

id, entity_code (NOT NULL), label_code (NOT NULL), assigned_by (default auto), rule_id, assigned_at. No facet_code column.

universal_edges (25 cols)

Key fields: source_collection, source_id, source_code, source_composition_level, target_collection, target_id, target_code, target_composition_level, edge_type, edge_subtype, weight, source_info, is_auto_managed, symmetry_group_id, metadata (jsonb), valid_from/valid_to, confidence, valid_time (tstzrange), provenance (jsonb).


2. D1 — entity_species catalog

  • Total species: 40
  • Composition distribution: atom=20, compound=8, meta=1, molecule=11
  • All 40 species have parent_id=NULL and depth=1 — the tree is flat in practice. No multi-level taxonomy is currently materialized.
  • Management modes observed: governed, observed, excluded (across catalog).
  • Notable species relevant to IU/Law domain:
    • SPE-LAW (law, compound, governed, prefix=LAW)
    • SPE-JUR (jurisdiction, atom, governed, prefix=JUR)
    • SPE-ENF (law_enforcement, atom, governed, prefix=ENF)
    • SPE-GRL (governance_relation, atom, governed, prefix=GRL)
    • SPE-GAG (governance_agency, compound, governed, prefix=GOV)
    • SPE-GOV (governance_infra, atom, observed): the current gom (grouping) species for governance plumbing
    • SPE-APP (approval_request, compound, governed, prefix=APR-)

3. D2 — species_collection_map landscape

  • Total mappings: 153 rows covering 39 species.
  • is_primary=true for all rows.
  • species_collection_map has no composition_level column (D2.4 skipped). Composition is derived through entity_species at JOIN time (confirmed in D6).
  • Heaviest gom species: cms_block (26), os_crm (24), governance_infra (15), website_content (14), business_support (11).
  • Collections WITHOUT species mapping (15):
    • information_unit ← target of Phase 4B
    • unit_version ← target of Phase 4B
    • admin_fallback_log
    • apr_action_types, apr_approvals, apr_request_types
    • binding_registry
    • dot_domain_rules
    • field_type_equivalences
    • governance_audit_log
    • kb_audit_log, kb_documents_history
    • law_version_verification_log
    • nrm_approval_rules, nrm_doc_type_config
  • D2.6 IU/UV/TAC mapping: 0 rows. Confirms Phase 3 finding.
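
A list of collections without a species mapping can be produced with an anti-join. The sketch below uses an in-memory SQLite database as a stand-in for the production PostgreSQL schema. It assumes the comparison runs against collection_registry (the report does not state which source the discovery script used), and the `COL-X01` code is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection_registry (code TEXT, collection_name TEXT);
    CREATE TABLE species_collection_map (
        species_code TEXT, collection_name TEXT, is_primary INTEGER
    );
    INSERT INTO collection_registry VALUES
        ('COL-176', 'information_unit'),
        ('COL-177', 'unit_version'),
        ('COL-X01', 'law_jurisdiction');  -- placeholder code, not from the report
    INSERT INTO species_collection_map VALUES
        ('jurisdiction', 'law_jurisdiction', 1);
""")

# Anti-join: keep registry rows that have no matching mapping row.
unmapped = [
    row[0]
    for row in conn.execute("""
        SELECT cr.collection_name
        FROM collection_registry cr
        LEFT JOIN species_collection_map scm
               ON scm.collection_name = cr.collection_name
        WHERE scm.collection_name IS NULL
        ORDER BY cr.collection_name
    """)
]
print(unmapped)  # ['information_unit', 'unit_version']
```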

4. D3 — collection_registry IU/UV/TAC state

  • COL-176 information_unit: governance_role=observed, migration_state=pilot, species_code=NULL, description_policy=structured_exempt, birth_code_strategy=synthetic_id, birth_identity_source=manual, storage_role=primary, source_kind=native, _dot_origin=DIRECTUS.
  • COL-177 unit_version: same shape; birth_code_strategy=subordinate.
  • No tac_logical_unit / tac_unit_version / tac_publication rows in collection_registry (these are unregistered TAC tables in current schema).
  • governance_role distribution: observed=67, excluded=60, governed=34, locked=3, law_artifact=2.
  • D3.3 governed+species reference pattern (LIMIT 30) returned 6 governed law-domain collections all with species_code filled and _dot_origin=DIRECTUS (governance_registry → SPE-GAG, governance_relations → SPE-GRL, law_dot_enforcement → SPE-ENF, law_jurisdiction → SPE-JUR, normative_registry → SPE-LAW, normative_relations → SPE-GOV).

5. D4 — birth_registry IU/UV state

  • information_unit: 12 birth rows, all with species_code=NULL, composition_level=NULL, governance_role=observed, certified=false, status=born. All dot_origin=PG:trg_birth_information_unit, born between 2026-05-05 and 2026-05-07.
  • unit_version: 0 birth rows.
  • Total birth_registry rows with NULL species: 12 — entirely from information_unit.
  • D4.3 confirms healthy birth ecosystem for the other 38 species (98 614+ rows distributed across the full composition spectrum, with composition_level consistently populated).
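
The NULL-species tally above amounts to a grouped count over birth_registry. A minimal sketch against toy data in SQLite follows; the entity codes are invented for illustration and the table is reduced to four columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE birth_registry (
        entity_code TEXT PRIMARY KEY,
        collection_name TEXT NOT NULL,
        species_code TEXT,
        composition_level TEXT
    )
""")

# 12 information_unit births without species, plus one fully classified row.
rows = [(f"IU-{i:03d}", "information_unit", None, None) for i in range(12)]
rows.append(("JUR-001", "law_jurisdiction", "jurisdiction", "atom"))
conn.executemany("INSERT INTO birth_registry VALUES (?, ?, ?, ?)", rows)

null_species = conn.execute("""
    SELECT collection_name, COUNT(*)
    FROM birth_registry
    WHERE species_code IS NULL
    GROUP BY collection_name
    ORDER BY COUNT(*) DESC
""").fetchall()
print(null_species)  # [('information_unit', 12)]
```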

6. D5 — Labels and edges

  • entity_labels for information_unit::% or unit_version::%: 0.
  • universal_edges for source/target in (information_unit, unit_version): 0.
  • entity_labels.facet_code column does not exist (D5.3 skipped — schema has only entity_code, label_code, assigned_by, rule_id, assigned_at).
  • universal_edges.edge_type usage: USES=1486, BELONGS_TO=431, CONTAINS=282.
  • D5.5 exemplar (governed + species_collection_map + entity_labels + universal_edges): 0 rows — no governed collection in the live registry simultaneously satisfies all four conditions (the entity_labels coverage is too narrow).

7. D6 — fn_birth_registry_auto analysis

Source captured. Key logic:

SELECT scm.species_code, es.composition_level
  INTO v_species_code, v_comp_level
  FROM species_collection_map scm
  LEFT JOIN entity_species es ON es.species_code = scm.species_code
  WHERE scm.collection_name = TG_TABLE_NAME
    AND scm.is_primary = true
  LIMIT 1;

SELECT governance_role INTO v_gov_role
  FROM collection_registry
  WHERE collection_name = TG_TABLE_NAME
  LIMIT 1;

INSERT INTO birth_registry (...)
VALUES (v_entity_code, TG_TABLE_NAME, v_species_code, v_comp_level, ...,
        COALESCE(v_gov_role,'excluded'), false)
ON CONFLICT (entity_code) DO NOTHING;

Answers to prompt §7 questions:

  • Does it read species_collection_map to fill species_code? YES (compliant with Điều 0-G, i.e. Article 0-G).
  • What happens when no mapping exists? Both v_species_code and v_comp_level stay NULL; INSERT still proceeds with NULL species and NULL composition. governance_role falls back to 'excluded' only if collection_registry row is missing — for IU/UV it resolves to observed.
  • Does it fill composition_level? YES, from entity_species.composition_level via JOIN — not from species_collection_map (which has no such column).
  • JOIN condition? scm.collection_name = TG_TABLE_NAME AND scm.is_primary = true.

This explains exactly why the 12 IU births have NULL species/composition: no mapping row exists; function is working as designed.

fn_birth_registry_auto_logic_documented = true.
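
The lookup behavior described above can be reproduced outside the trigger. The sketch below mirrors the two SELECTs from the captured source against an in-memory SQLite database. It is a simulation under assumptions, not the production function: `resolve_birth_fields` is a hypothetical name, the tables are reduced to the columns the lookups touch, and `unregistered_table` is an invented collection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_species (species_code TEXT, composition_level TEXT);
    CREATE TABLE species_collection_map (
        species_code TEXT, collection_name TEXT, is_primary INTEGER
    );
    CREATE TABLE collection_registry (collection_name TEXT, governance_role TEXT);
    INSERT INTO entity_species VALUES ('jurisdiction', 'atom');
    INSERT INTO species_collection_map VALUES ('jurisdiction', 'law_jurisdiction', 1);
    INSERT INTO collection_registry VALUES
        ('law_jurisdiction', 'governed'),
        ('information_unit', 'observed');
""")


def resolve_birth_fields(table_name):
    """Return (species_code, composition_level, governance_role) for a collection."""
    mapping = conn.execute(
        """SELECT scm.species_code, es.composition_level
           FROM species_collection_map scm
           LEFT JOIN entity_species es ON es.species_code = scm.species_code
           WHERE scm.collection_name = ? AND scm.is_primary = 1
           LIMIT 1""",
        (table_name,),
    ).fetchone()
    # No mapping row: both values stay NULL, as in the trigger function.
    species_code, comp_level = mapping if mapping else (None, None)

    role = conn.execute(
        "SELECT governance_role FROM collection_registry WHERE collection_name = ? LIMIT 1",
        (table_name,),
    ).fetchone()
    # Equivalent of COALESCE(v_gov_role, 'excluded').
    gov_role = role[0] if role and role[0] is not None else "excluded"
    return species_code, comp_level, gov_role


print(resolve_birth_fields("law_jurisdiction"))    # ('jurisdiction', 'atom', 'governed')
print(resolve_birth_fields("information_unit"))    # (None, None, 'observed')
print(resolve_birth_fields("unregistered_table"))  # (None, None, 'excluded')
```

The second call shows the IU case: the registry row supplies `observed`, while the missing mapping leaves species and composition empty.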


8. D7 — Candidate analysis (candidate_not_approved)

  • Valid composition_level enum values (distinct values in entity_species): atom, compound, meta, molecule. species_collection_map cannot serve as a second source because it has no composition_level column, so entity_species is the only authoritative source.

  • species_code naming: all snake_case identifiers. Prefix codes (SPE-XXX) are stored in entity_species.code, not in species_code itself.

  • D7.3 IU-adjacent (heuristic, candidate_not_approved):

    • content_requests → business_support (gom)
    • knowledge_documents → ai_support (gom)
    • law_catalog → entity_rule
    • law_dot_enforcement → law_enforcement
    • law_jurisdiction → jurisdiction
    • law_registry → law

    No existing dedicated species named information_unit or unit_version. No production species decision derived from this list.


9. D8 — Containment analysis for Điều 0-B

D8.1 TAC nesting state

  • tac_logical_unit: total 86, has_parent=83, root_units=3 → highly nested.
  • Max recursive depth: 2 (so the tree spans up to 3 levels: 0→1→2).
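
Nesting figures like these are typically derived with a recursive walk from the root rows. The following sketch runs a recursive CTE over a toy four-row tree in SQLite. The ids are invented (the real table uses UUIDs) and the actual discovery query is not reproduced in this report, so treat this as one possible formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tac_logical_unit (id TEXT PRIMARY KEY, parent_id TEXT);
    INSERT INTO tac_logical_unit VALUES
        ('root', NULL),
        ('child-a', 'root'),
        ('child-b', 'root'),
        ('grandchild', 'child-a');
""")

# Roots start at depth 0; each hop to a child adds 1.
total, has_parent, max_depth = conn.execute("""
    WITH RECURSIVE tree(id, depth) AS (
        SELECT id, 0 FROM tac_logical_unit WHERE parent_id IS NULL
        UNION ALL
        SELECT t.id, tree.depth + 1
        FROM tac_logical_unit t
        JOIN tree ON t.parent_id = tree.id
    )
    SELECT (SELECT COUNT(*) FROM tac_logical_unit),
           (SELECT COUNT(*) FROM tac_logical_unit WHERE parent_id IS NOT NULL),
           MAX(depth)
    FROM tree
""").fetchone()
print(total, has_parent, max_depth)  # 4 3 2
```

A maximum depth of 2 with roots at depth 0 corresponds to a three-level tree, the same reading the report applies to the live data.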

D8.2 IU parent state

  • information_unit: total 12, has_parent=0, no_parent=12. Column parent_or_container_ref exists but is universally NULL on current rows.

D8.3 TAC parent→child sample

  • Containment is structural across distinct section types: e.g.,
    • heading → principle, checklist, process, paragraph, technical_spec, heading (nested)
    • governance_process → governance_process (recursive)
  • Each child is its own tac_logical_unit row with its own UUID id. This is structural containment, not ordering metadata.

D8.4 fn_iu_create_plan

  • Signature accepts p_parent_ref uuid DEFAULT NULL and p_publication_type text DEFAULT NULL.
  • The planner is structurally aware of a parent reference (it's a first-class plan input), but the plan_ok path does not require it — IU can be created without a parent.
  • The planner does not currently mutate child IUs from a parent — it plans single-IU creation only.

Containment assessment

The TAC model is unambiguously a container hierarchy (parent rows contain child rows of different section types as separate entities). The IU model, as currently materialized, has a parent_or_container_ref column but zero usage. The planner accepts a parent_ref input but does not propagate containment semantics.

If Phase 5 migrates TAC → IU and preserves the TAC nesting via parent_or_container_ref, then IU rows will contain other IU rows. In that case IU cannot be atom, at least for the rows that have children; those rows would likely be molecule or compound per Điều 0-B (Article 0-B).

If IU is migrated as a flat normalized table and TAC nesting is encoded as a sibling property only, IU could remain atom.

The schema permits both. Live data does not yet exercise the parent column. The planner signature suggests parent semantics are intended.

containment_assessment = ambiguous — TAC evidence indicates containment, IU column exists for it, but current IU rows do not exercise it. Decision must be made by GPT/User based on Phase 5 migration intent.


10. Law crosswalk vs compliance matrix (addendum §B)

| # | Requirement | Evidence from discovery | Status |
| --- | --- | --- | --- |
| 1 | Every collection has species mapping | 15 collections (incl. IU/UV) without mapping | confirmed gap |
| 2 | Every collection has birth trigger | 12 IU births via trg_birth_information_unit; 0 UV births | trigger present for IU, UV trigger present but unexercised |
| 3 | Every collection has governance_role | COL-176/177 both observed | present but not governed |
| 4 | Purpose description | Both have description; policy structured_exempt | exempt tier |
| 5 | Birth auto-fills species from mapping | D6 function source confirmed; no mapping → NULL | gap due to missing mapping |
| 6 | Birth auto-fills composition from species | D6 confirms JOIN to entity_species.composition_level | gap due to missing mapping (same root cause as #5) |
| 7 | Existing entities → QT-001 backfill | 12 IU birth rows with NULL species; 0 UV rows | awaiting QT-001 |
| 8 | Composition determined by containment | D8: TAC contains children; IU column present but unused; planner accepts parent_ref | ambiguous (see §9) |
| 9 | One classification system | IU/UV outside species universe (no mapping, no species_code on registry) | confirmed violation |
| 10 | QT-005 promotion observed→governed | Both still observed; migration_state=pilot | open question |
| 11 | Species gom for non-governed | SPE-GOV (governance_infra, atom, observed) exists and is the gom for 15 governance_infra collections incl. birth_registry, species_collection_map | option available |
| 12 | QT-003R retroactive registration | COL-176/177 already exist in collection_registry | done |

Summary: 4 confirmed gaps, 3 partial, 1 ambiguous, 1 open question, 2 options/done.


11. Status flags

phase4b_discovery_status=PASS
mode=READ_ONLY_DISCOVERY
no_mutation_performed=true
entity_species_columns_discovered=true
entity_species_row_count=40
species_collection_map_iu_rows=0
birth_registry_iu_null_species_count=12
entity_labels_iu_count=0
universal_edges_iu_count=0
fn_birth_registry_auto_logic_documented=true
tac_parent_column_detected=parent_id
tac_max_nesting_depth=2
containment_assessment=ambiguous
script_abort_due_to_optional_missing_column=false
failed_sub_queries=0
candidate_outputs_labelled=candidate_not_approved

Review this report with GPT/User to decide:

  1. Species decision (gom vs dedicated): assign SPE-GOV (governance_infra) as gom for IU/UV while observed, OR create dedicated SPE-IUN / SPE-UVN species under QT-005 if promoting to governed.
  2. Composition decision (Điều 0-B): confirm whether Phase 5 migration will preserve TAC parent nesting in IU via parent_or_container_ref (→ molecule/compound) or flatten it (→ atom).
  3. Phase 4C executable plan: must include explicit QT-001 5-step backfill for the 12 IU birth rows, NOT ad-hoc UPDATE; must include species_collection_map INSERT (metadata-only); must NOT patch fn_birth_registry_auto.

No production decision locked. No migration started.


Phase 4B Discovery Report rev6 | RUN=p3d-phase4b-discovery-20260511-092233 | 2026-05-11
