KB-6BE5

GPT Analysis — Information Unit Infrastructure Next

Revision 1
Tags: dieu38, information-unit, metadata, links, kg, vector, text-as-code, next-steps

Date: 2026-05-01
Purpose: Record GPT's assessment of user concerns after the P10D production closeout.


1. Current state

P10A/P10B/P10D prove that information units can be cut from documents, stored in PostgreSQL (PG), rendered back into the original documents, and served on /knowledge/laws.

This solves only the first layer: document-to-units-to-document round trip.

It does not yet prove the smarter layers:

  • metadata intelligence;
  • cross-unit references;
  • topic assembly;
  • version dependency checks;
  • numeric/value consistency checks;
  • KG/vector projection usefulness;
  • automated detection of contradictions between code/text/documents.

2. Existing laws/design that already address parts of this

LSL-01

Key doctrines found:

  • NT15: Information Unit First.
  • Three layers: Content / Label / Publication.
  • Labels support pivot/grouping by doc/topic/layer/family.
  • Publication membership is distinct from label doc=X.
  • Canonical address is identity; references should target addresses/units, not raw text.
  • Vector/Qdrant is projection, not SoT.
  • Metadata changes update payload; content changes create new chunks/version.

Appendix 02C1 (Phụ lục 02C1)

Contains the concept of a text-unit catalog and reference edges:

  • references point to canonical address;
  • broken refs detected when target retires;
  • refs are edges between units.
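
The broken-reference rule above can be sketched in a few lines. This is a minimal illustration, not the 02C1 design: the `Unit` and `RefEdge` classes, the `retired` flag, and the address format (`D28/s1`) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    address: str          # canonical address (format is an assumption)
    retired: bool = False


@dataclass(frozen=True)
class RefEdge:
    source: str           # canonical address of the referencing unit
    target: str           # canonical address of the referenced unit


def find_broken_refs(units, edges):
    """Return edges whose target is unknown or has been retired."""
    live = {u.address for u in units if not u.retired}
    return [e for e in edges if e.target not in live]


units = [Unit("D28/s1"), Unit("D32/s4", retired=True), Unit("D35/s2")]
edges = [
    RefEdge("D28/s1", "D35/s2"),   # healthy reference
    RefEdge("D35/s2", "D32/s4"),   # target has been retired
    RefEdge("D28/s1", "D99/s1"),   # target does not exist
]
broken = find_broken_refs(units, edges)
```

Because edges carry addresses rather than raw text, the check is a set lookup and does not need to parse unit content.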

P5 / G6 schema work

The current production P10 schema supports the four core tables:

  • tac_publication;
  • tac_logical_unit;
  • tac_unit_version;
  • tac_publication_member.

However, it likely does not yet fully implement the relation, label, KG, vector, and checker layers that the smarter use cases require.


3. Gaps to prove next

Gap A — Metadata profile

Need a practical metadata profile for each unit:

  • identity metadata;
  • source span;
  • section type;
  • lifecycle/review;
  • semantic labels;
  • topic labels;
  • entities mentioned;
  • numeric claims;
  • dependency/version references;
  • cross-references.
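
One way to make the profile concrete is a single JSON document per unit, which would fit a JSONB column if the inventory finds one. The sketch below is an assumption throughout: the field names, the `NumericClaim` shape, and the sample values are hypothetical and not taken from the production schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class NumericClaim:
    key: str      # what the number measures (hypothetical key)
    value: float
    unit: str


@dataclass
class UnitMetadata:
    address: str                      # identity metadata
    source_span: tuple[int, int]      # start/end offsets in the source
    section_type: str
    lifecycle: str                    # lifecycle/review state
    semantic_labels: list[str] = field(default_factory=list)
    topic_labels: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    numeric_claims: list[NumericClaim] = field(default_factory=list)
    version_refs: dict[str, str] = field(default_factory=dict)
    cross_refs: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the whole profile as one JSON document."""
        return json.dumps(asdict(self), sort_keys=True)


profile = UnitMetadata(
    address="D28/s3",
    source_span=(120, 480),
    section_type="article",
    lifecycle="active",
    topic_labels=["database", "schema"],
    numeric_claims=[NumericClaim("fee_rate", 10.0, "percent")],
    version_refs={"Law A": "3.0"},
    cross_refs=["D35/s2"],
)
payload = profile.to_json()
```

Keeping numeric claims and version references as structured fields, rather than leaving them in the text, is what makes the later consistency checks cheap to run.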

Gap B — Relation edges

Need a unit-to-unit relation model:

  • references;
  • depends_on;
  • contradicts;
  • refines;
  • supersedes;
  • implements;
  • evidence_for;
  • compatible_with / incompatible_with.
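
A minimal sketch of such a relation model, assuming a closed vocabulary of relation types and edges keyed by canonical address. The class names, the `outgoing` helper, and the sample addresses are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(str, Enum):
    REFERENCES = "references"
    DEPENDS_ON = "depends_on"
    CONTRADICTS = "contradicts"
    REFINES = "refines"
    SUPERSEDES = "supersedes"
    IMPLEMENTS = "implements"
    EVIDENCE_FOR = "evidence_for"
    COMPATIBLE_WITH = "compatible_with"
    INCOMPATIBLE_WITH = "incompatible_with"


@dataclass(frozen=True)
class Relation:
    source: str             # canonical address of the source unit
    relation: RelationType
    target: str             # canonical address of the target unit


def outgoing(relations, source, kinds=None):
    """Relations leaving `source`, optionally restricted to some types."""
    return [
        r for r in relations
        if r.source == source and (kinds is None or r.relation in kinds)
    ]


relations = [
    Relation("D28/s1", RelationType.REFERENCES, "D35/s2"),
    Relation("D28/s1", RelationType.DEPENDS_ON, "D32/s4"),
    Relation("D35/s2", RelationType.SUPERSEDES, "D32/s4"),
]
deps = outgoing(relations, "D28/s1", {RelationType.DEPENDS_ON})
```

Subclassing `str` keeps the enum values directly usable as stored strings, so the same vocabulary could back a text column or a JSON field.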

Gap C — Topic assembly

Need to assemble a topic view, not just original document view:

  • query labels/relations;
  • return units from many publications;
  • show source publication/address for each unit;
  • retain provenance and lifecycle.
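
The topic view can be sketched as a filter over unit rows that keeps provenance on every hit. This is an in-memory illustration only: the `UnitRow` shape, the `active`/`retired` lifecycle values, and the sample addresses are assumptions, and a real implementation would query PG instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitRow:
    address: str             # canonical address (hypothetical format)
    publication: str         # source publication
    lifecycle: str           # e.g. "active" or "retired" (assumed values)
    topic_labels: frozenset


def assemble_topic(rows, topic, lifecycles=("active",)):
    """Collect units labelled with `topic` across all publications."""
    hits = [
        r for r in rows
        if topic in r.topic_labels and r.lifecycle in lifecycles
    ]
    hits.sort(key=lambda r: (r.publication, r.address))
    # Each entry keeps its source publication, address, and lifecycle.
    return [
        {"publication": r.publication, "address": r.address,
         "lifecycle": r.lifecycle}
        for r in hits
    ]


rows = [
    UnitRow("D35/s2", "D35", "active", frozenset({"database", "schema"})),
    UnitRow("D28/s1", "D28", "active", frozenset({"database"})),
    UnitRow("D32/s4", "D32", "retired", frozenset({"database"})),
    UnitRow("D32/s5", "D32", "active", frozenset({"governance"})),
]
view = assemble_topic(rows, "database")
```

The view is ordered by publication and address rather than by document order, which is the main difference from the original document view.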

Gap D — Consistency checks

Need checker prototypes for:

  • numeric claim drift (10% vs 15%);
  • version reference drift (Constitution says Law A v3.0, current is v3.1);
  • broken cross-reference;
  • code/text mismatch where the code is the canonical source of truth.
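
The first two checks can be prototyped over extracted metadata alone. The sketch below assumes numeric claims and version references have already been pulled out of each unit as tuples; the tuple layouts, claim keys, and addresses are hypothetical.

```python
from collections import defaultdict


def numeric_drift(claims):
    """Find claim keys that carry more than one distinct value.

    claims: iterable of (unit_address, claim_key, value).
    Returns {claim_key: {value: [unit_address, ...]}}.
    """
    by_key = defaultdict(lambda: defaultdict(list))
    for address, key, value in claims:
        by_key[key][value].append(address)
    return {k: dict(v) for k, v in by_key.items() if len(v) > 1}


def version_drift(version_refs, current_versions):
    """Find references that cite a version other than the current one.

    version_refs: iterable of (unit_address, target_name, cited_version).
    Returns (unit_address, target_name, cited_version, current_version).
    """
    drift = []
    for address, target, cited in version_refs:
        current = current_versions.get(target)
        if current is not None and current != cited:
            drift.append((address, target, cited, current))
    return drift


claims = [
    ("D28/s3", "fee_rate", 10.0),
    ("D35/s7", "fee_rate", 15.0),   # same claim key, different value
    ("D32/s1", "quorum", 50.0),
]
numeric_report = numeric_drift(claims)

refs = [("constitution/s2", "Law A", "3.0")]
version_report = version_drift(refs, {"Law A": "3.1"})
```

Both checkers only report candidates. Deciding which value or version is correct remains a review step.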

Gap E — Vector/KG projection

Need to verify whether the current vector/KG layer can be populated from PG unit metadata, and whether it improves retrieval without becoming the SoT.
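
One way to keep the projection subordinate to PG is to derive every projected payload from the PG row and to detect drift by comparison. The sketch below makes no calls to any vector store; the row keys (`address`, `version`, `topic_labels`, `content`) and the content-hash field are assumptions for illustration.

```python
import hashlib


def projection_payload(unit):
    """Build the payload that would accompany a unit's vector."""
    digest = hashlib.sha256(unit["content"].encode("utf-8")).hexdigest()
    return {
        "address": unit["address"],
        "version": unit["version"],
        "topic_labels": sorted(unit["topic_labels"]),
        "content_sha256": digest,
    }


def stale_points(pg_units, projected):
    """Addresses whose projected payload is missing or differs from PG."""
    fresh = {u["address"]: projection_payload(u) for u in pg_units}
    return sorted(a for a, p in fresh.items() if projected.get(a) != p)


pg_units = [
    {"address": "D28/s1", "version": 2,
     "topic_labels": ["schema", "database"],
     "content": "Units are stored in PG."},
    {"address": "D35/s2", "version": 1,
     "topic_labels": ["database"],
     "content": "Vector is a projection."},
]
# Simulated projection state: D28/s1 was projected from an older version.
projected = {
    "D28/s1": projection_payload(
        {**pg_units[0], "version": 1, "content": "old text"}),
    "D35/s2": projection_payload(pg_units[1]),
}
stale = stale_points(pg_units, projected)
```

If the projection can always be rebuilt this way, losing or dropping it costs only recomputation, which is the practical test of "projection, not SoT".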


4. Recommended next step

Do not design the entire universe first. The recommended next step is:

P11A — Information Unit Metadata + Relation Proof on 3 Publications

Scope:

  1. Inventory current production schema and Directus collections for labels/relations/vector fields.
  2. Compare actual schema with LSL-01, P5, 02C1 requirements.
  3. Create a minimal metadata profile proposal using existing JSONB fields first if present.
  4. Create a small relation/metadata sample on the 86 existing units, preferably read-only draft package first.
  5. Demonstrate 3 queries:
    • original document view (already done);
    • topic view: e.g. database/schema-related units across D28/D32/D35;
    • relation view: unit A references/depends on unit B.
  6. Demonstrate 2 checker prototypes:
    • version reference drift;
    • numeric claim drift.

Assembly-first rule:

  • Prefer existing PG JSONB fields, Directus collection permissions, labels, views, and existing query/table infrastructure.
  • Do not add schema until inventory proves existing fields cannot support the proof.
  • If schema is needed, propose a gated migration package, not direct mutation.

5. Deployment governance note

Deploy Governance Cleanup / Đ41 patch has been recorded as deferred TD:

knowledge/dev/laws/dieu38-trien-khai/td/td-deploy-governance-cleanup-deferred-2026-05-01.md

knowledge/dev/laws/dieu38-trien-khai/reports/gpt-analysis-information-unit-infrastructure-next-2026-05-01.md