dot-iu-cutter v0.5 — WS-3 Cross-source Topic Assembly Logical Proof (Design)
Date: 2026-05-18
Phase: v0.5 WS-3 (cross-source topic assembly logical proof)
Nature: design_only_logical_proof (no schema, no table/view/function, no code, no dry-run, no production write, no Directus mutation, no edge creation, no APR approval)
Authority consumed (read-only, NOT redesigned; QG1): WS-1 package (6 files) + WS-2 package (4 files) + GPT WS-2 review + P11D + P44-4A relation-edge-conformance.
Plain-language framing (QG7). "Topic assembly" = given one subject (here: how a collection is structured), pull every relevant unit of knowledge that lives in different kinds of source (the internal architecture constitution, an internal law, a process, the live database, the code, a report, a lesson), put them in a sensible order, label what role each plays (rule vs proof vs running-reality), and emit one coherent thread — without copying the underlying systems into the knowledge base and without inventing new graph edges. This file proves that the WS-1 + WS-2 logical contracts are sufficient to describe that operation on paper. It proves nothing physical.
0. Scope guard (read first)
ws3_proves: cross_source_topic_thread / topic_thread # for ONE sample topic
ws3_does_NOT_prove: compliance_matrix (full) # explicitly deferred — see gap/readiness file §C, QG9
sample_topic:
topic_code: collection_structure # snake_case, immutable (P11D §3.1)
topic_name: "cấu trúc collection" # Vietnamese for "collection structure"
topic_namespace: tac.structure # logical placeholder namespace (P11D §3.1 style)
topic_status: active # logical sample only
forbidden_respected: true
self_advance: PROHIBITED
all_addresses_in_this_file: LOGICAL PLACEHOLDER (QG6) — not extracted from production IU rows
1. The assembly_profile being proved
We instantiate one WS-1 assembly_profile of profile_type = cross_source_topic_thread (WS-1 brief G4 sketch). Field names are taken verbatim from the WS-1 G1 logical contract; values below are the logical proof instance. This is a logical contract instance, not a row, not a table.
assembly_profile: # LOGICAL INSTANCE — WS-1 G1 contract, not persisted
profile_id: "ap.cross_source_topic_thread.collection_structure.LOGICAL" # placeholder identity
profile_type: cross_source_topic_thread # resolve via registry (no enum hardcode)
input_selector:
by: topic_code
topic_code: collection_structure
# logical meaning: seed set = IUs carrying content_profile.topic_labels.topic_code == collection_structure
# (Optional Enrichment per Đ44 §9.1 + P44-3 INV-P4 — topic label does NOT block birth gate)
ordering_rule:
kind: internal_architecture # WS-1 G4 ordering_rule enum value
authority_aware: true # G4: ordering is authority-aware, NOT flat
tier_order: # logical render tiering for THIS thread
- normative_authority # rules first (constitution → law → process)
- implementation_authority # then running reality (sql_entity / code_artifact)
- evidence_authority # then proof / retrospective (report → lesson)
within_tier: source_numbering_ordinal_then_canonical_address # legible source order
inclusion_rule:
- include IU if topic_code == collection_structure (core)
- include IU reached via relation_filter during ENRICH (extended set)
- for normative families: honor source_family.status_policy (enacted_only / draft_flagged)
exclusion_rule:
- exclude IU with lifecycle in {retired}
- exclude IU superseded by an active successor (supersedes edge → keep successor only)
- exclude raw↔raw adjacency that has no IU anchor (out of Fabric scope — see §4)
relation_filter: # edge whitelist for ENRICH — see §3 + §4 for mechanism
iu_to_iu_universal_edges:
core: [references, depends_on, compatible_with, contradicts, implements, supersedes] # P11D §5.3.1 whitelist
candidate: [governed_by, derived_from] # P44-4A §3.2 CANDIDATE — pending v1.0 ratification (dependency, see §6)
apr_gated: [evidenced_by] # NOT in active filter — APR-gated, NOT approved (QG5/QG10, see §5)
contains_excluded: true # composition (Cap-5), not a Cap-4 relation (P11D §5.3.1)
uses_optional_deferred: true # P11D §5.3.1 ⚠ — defer to Pilot benchmark (P11D-ζ)
source_family_filter: # GPT-mandated chain families for WS-3
- internal_incomex_constitution
- internal_incomex_law
- internal_process
- sql_entity
- code_artifact
- report
- lesson
# (architecture_note / external_government_law in the 9-seed list but out of THIS sample chain)
provenance_policy:
level: full_trace # P11D §7 + Q-T5
require: [creator, method, timestamp, source_context] # mandatory post-Đ44
record_override_provenance: true # WS-2 D4 authority_override.provenance must be carried (see §7)
render_template: "pointer-only (Directus/Nuxt) — NOT authority (WS-1 G5)" # logical pointer, not designed here
output_format: "logical TopicThread shape (P11D §6.2) — not an API/UI contract"
lifecycle: proposed # logical state of this profile instance
created_by: "WS-3 logical proof (design-only)"
created_at: "2026-05-18 (logical)"
QG2 note: every block in this file is YAML / pseudo-query. There is no executable SQL anywhere.
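As a reading aid only (not a design artifact and not an implementation), the relation_filter above can be mirrored in a few lines of Python. The names `RELATION_FILTER` and `active_edge_types` are hypothetical; the edge lists are copied from the profile instance, and the `candidates_activated` switch is an assumption standing in for the pending v1.0 ratification.

```python
# Illustrative sketch only: hypothetical names, values copied from the logical profile.
RELATION_FILTER = {
    "core": ["references", "depends_on", "compatible_with",
             "contradicts", "implements", "supersedes"],
    "candidate": ["governed_by", "derived_from"],  # pending v1.0 ratification
    "apr_gated": ["evidenced_by"],                 # never part of the active filter
}


def active_edge_types(candidates_activated: bool) -> set:
    """Edge types ENRICH may follow; APR-gated types are always excluded."""
    allowed = set(RELATION_FILTER["core"])
    if candidates_activated:
        allowed |= set(RELATION_FILTER["candidate"])
    return allowed
```

Keeping `apr_gated` as a separate list that the function never reads reflects the profile's intent: evidenced_by is named for traceability but cannot leak into traversal.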
2. Engine = generic executor over P11D §6.1 pipeline (OD-FA8)
Per WS-1 OD-FA8 (rec. (a) generic executor, defer_WS3 for code — no code now), the cross_source_topic_thread profile is not a bespoke pipeline. It is the single P11D §6.1 pipeline parameterised by the profile above, run over the union of:
- universal_edges rows where both endpoints are governed Objects/IUs (OQC ≥ 3/4): §4 mechanism A;
- iu_entity_binding rows where one side is a raw_entity: §4 mechanism B;
- assembly-local relations computed at assembly time, output-only, not persisted, no APR: §4 mechanism C.
P11D §5.3.1 edge whitelist is respected for the ENRICH step.
3. Logical pipeline run — GATHER → ENRICH → GROUP → QUALITY → PROVENANCE
Stage names and order are taken verbatim from P11D §6.1. Each stage is expressed as a pseudo-query (logical, not executable — QG2).
Step 1 — GATHER (P11D Q-T1: Topic → core unit set)
pseudo-Q-T1:
SELECT iu WHERE iu.content_profile.topic_labels[*].topic_code == 'collection_structure'
→ CORE_SET
Logical result (placeholders — see sample-chain file for the full chain):
CORE_SET (logical placeholder):
- N1 source_family=internal_incomex_constitution addr=ICX-CONST/NT-7-COLLECTION-ARCH
- N2 source_family=internal_incomex_law addr=ICX-LAW/DIEU-12-COLLECTION-STRUCTURE
- N3 source_family=internal_process addr=ICX-PROC/QT-COLLECTION-PROVISION-S3
- N6 source_family=report addr=RPT-COLLSTRUCT-2026/SEC-FINDINGS
- N7 source_family=lesson addr=LSN-COLLSTRUCT-2026/L-1
# N4 (sql_entity) and N5 (code_artifact) are raw_entity → NOT IU → enter via ENRICH binding, not GATHER
Plain terms: GATHER only collects things that are themselves governed units of knowledge (IUs). The live database schema and the code file are not IUs — they enter the thread later through a binding, not here.
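The GATHER rule can be illustrated with an in-memory stand-in. This is a minimal sketch under stated assumptions: the `Node` shape, the `is_iu` flag, and the `gather` function are hypothetical and exist only to show why N4 and N5 are absent from CORE_SET. Nothing here reads from a real store.

```python
# Illustrative sketch: hypothetical in-memory stand-ins for IUs and raw entities.
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    source_family: str
    is_iu: bool                 # raw_entity nodes are not IUs
    topic_codes: tuple = ()     # topic labels carried by the unit


NODES = [
    Node("N1", "internal_incomex_constitution", True, ("collection_structure",)),
    Node("N2", "internal_incomex_law", True, ("collection_structure",)),
    Node("N3", "internal_process", True, ("collection_structure",)),
    Node("N4", "sql_entity", False),
    Node("N5", "code_artifact", False),
    Node("N6", "report", True, ("collection_structure",)),
    Node("N7", "lesson", True, ("collection_structure",)),
]


def gather(nodes, topic_code):
    """Q-T1 analogue: keep only IUs labelled with the requested topic."""
    return [n for n in nodes if n.is_iu and topic_code in n.topic_codes]


core_set = gather(NODES, "collection_structure")
```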
Step 2 — ENRICH (P11D Q-T3: related units via relation edges)
pseudo-Q-T3:
FOR each iu in CORE_SET:
FOLLOW universal_edges WHERE edge_type IN relation_filter.iu_to_iu_universal_edges.core
∪ candidate(governed_by, derived_from) # candidate-gated, §6
ANNOTATE each reached unit with relation_kind
EXCLUDE contains (composition); INCLUDE contradicts WITH opposing annotation (P11D §5.3.1, GPT Q-D2)
FOR each iu in CORE_SET:
FOLLOW iu_entity_binding → attach raw_entity refs (N4 sql_entity, N5 code_artifact) by reference only
COMPUTE assembly-local relations (output-only adjacency, NOT persisted, NO APR)
→ EXTENDED_SET = CORE_SET ∪ related-IUs ∪ bound-raw-entity-refs, each annotated with {relation_kind, mechanism}
Edges traversed in this sample (mechanism column is mandatory — QG4; full per-relation table is in the sample-chain file §4):
| From | To | relation_kind | Mechanism | Status |
|---|---|---|---|---|
| N2 law | N1 constitution | governed_by | universal_edges (IU↔IU) | CANDIDATE edge — dependency-gated (§6) |
| N3 process | N2 law | implements | universal_edges (IU↔IU, core) | active core edge |
| N3 process | N4 sql_entity (raw) | binding_kind=implements | iu_entity_binding | active (raw side, §4-B) |
| N3 process | N5 code_artifact (raw) | binding_kind=implements | iu_entity_binding | active (raw side, §4-B) |
| N4 sql_entity | N5 code_artifact | (adjacency in thread) | assembly-local (output-only, not persisted) | no APR (§4-C) |
| N6 report | N2/N3 (requirement) | evidence-of | APR-gated evidenced_by OR alternative | NOT approved (§5) |
| N7 lesson | N6 report | derived_from (lesson_from) | universal_edges (IU↔IU) | CANDIDATE edge — dependency-gated (§6) |
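The traversal in the table can be sketched as a filter over two small lists. Everything below is an assumption for illustration: the tuple layouts, the `enrich` function, and node `N9` (a hypothetical `contains` target added only to show the exclusion). Assembly-local relations are omitted because they are computed for output and never stored.

```python
# Illustrative sketch: edges and bindings as plain tuples (hypothetical in-memory data).
CORE = {"references", "depends_on", "compatible_with",
        "contradicts", "implements", "supersedes"}
CANDIDATE = {"governed_by", "derived_from"}

# (from, to, edge_type) for IU<->IU edges; N9 is a made-up node for the 'contains' case.
UNIVERSAL_EDGES = [("N2", "N1", "governed_by"), ("N3", "N2", "implements"),
                   ("N7", "N6", "derived_from"), ("N1", "N9", "contains")]
# (iu, raw_entity, binding_kind) for IU<->raw bindings.
BINDINGS = [("N3", "N4", "implements"), ("N3", "N5", "implements")]


def enrich(core_ids, candidates_activated=True):
    """Q-T3 analogue: follow whitelisted edges and bindings, tagging the mechanism."""
    allowed = CORE | (CANDIDATE if candidates_activated else set())
    reached = []
    for src, dst, edge_type in UNIVERSAL_EDGES:
        if src in core_ids and edge_type in allowed:  # 'contains' is never whitelisted
            reached.append((src, dst, edge_type, "universal_edges"))
    for iu, raw, kind in BINDINGS:
        if iu in core_ids:
            reached.append((iu, raw, kind, "iu_entity_binding"))
    return reached


extended = enrich({"N1", "N2", "N3", "N6", "N7"})
```

Calling `enrich(..., candidates_activated=False)` drops the governed_by and derived_from rows while leaving the core edge and both bindings in place.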
Step 3 — GROUP (P11D Q-T2: group by publication, then by authority tier)
pseudo-Q-T2:
GROUP EXTENDED_SET BY publication (D28 / D32 / D35 logical) # P11D Q-T2 native grouping
THEN re-tier BY authority_semantics per ordering_rule.tier_order:
normative_authority → implementation_authority → evidence_authority
AGGREGATE per group: count, lifecycle mix, source_family mix
Plain terms: first group the way P11D already groups (by publication), then arrange those groups so the rules come first, the running reality next, and the proof last — because for a cross-source thread the reader wants "what must be true → what is actually built → what proves it".
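The re-tiering step amounts to a sort with a composite key. The sketch below is illustrative only: it skips the publication grouping, the ordinals are invented, and the node id stands in for the canonical address as the final tie-breaker.

```python
# Illustrative sketch: authority-aware ordering as a composite sort key.
TIER_ORDER = ["normative_authority", "implementation_authority", "evidence_authority"]

# (node_id, authority tier, source ordinal): ordinals are hypothetical.
thread = [("N6", "evidence_authority", 1), ("N4", "implementation_authority", 1),
          ("N2", "normative_authority", 2), ("N7", "evidence_authority", 2),
          ("N1", "normative_authority", 1), ("N5", "implementation_authority", 2),
          ("N3", "normative_authority", 3)]


def order_thread(items):
    """Sort by tier rank, then source ordinal, then node id (address stand-in)."""
    rank = {tier: i for i, tier in enumerate(TIER_ORDER)}
    return sorted(items, key=lambda it: (rank[it[1]], it[2], it[0]))


ordered_ids = [node_id for node_id, _, _ in order_thread(thread)]
```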
Step 4 — QUALITY (P11D Q-T4: flag missing metadata / low confidence)
pseudo-Q-T4:
FLAG iu WHERE topic_confidence < threshold (P11D-γ deferred — threshold NOT decided here)
FLAG iu WHERE missing mandatory provenance (post-Đ44)
FLAG node WHERE authority_semantics ambiguous AND no authority_override present (→ mixed-authority candidate)
SPLIT needs_review SUBSET (does not drop nodes; marks them)
Note: a node flagged "authority ambiguous" is exactly the mixed-authority case handled by WS-2 D4 authority_override (proved in the sample-chain file via N2, QG8).
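A minimal sketch of the flagging logic follows, assuming nodes are plain dictionaries. The key names (`topic_confidence`, `authority_ambiguous`, `authority_override`) and flag strings are hypothetical. Because the confidence threshold is deferred (P11D-γ), it is a required argument rather than a constant, and a node without a `topic_confidence` value is assumed not to be flagged on confidence.

```python
# Illustrative sketch: QUALITY marks nodes for review and never drops them.
REQUIRED_PROVENANCE = ("creator", "method", "timestamp", "source_context")


def quality_flags(node: dict, confidence_threshold: float) -> list:
    """Return the review flags for one node (empty list means no flag)."""
    flags = []
    if node.get("topic_confidence", 1.0) < confidence_threshold:
        flags.append("low_topic_confidence")
    provenance = node.get("provenance", {})
    missing = [key for key in REQUIRED_PROVENANCE if not provenance.get(key)]
    if missing:
        flags.append("missing_provenance:" + ",".join(missing))
    if node.get("authority_ambiguous") and not node.get("authority_override"):
        flags.append("mixed_authority_candidate")
    return flags
```

A node that returns any flag would go into the needs_review subset while remaining in the thread.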
Step 5 — PROVENANCE (P11D Q-T5: attach trace)
pseudo-Q-T5:
FOR each node: ATTACH provenance {creator, method, timestamp, source_context}
FOR each authority_override: ATTACH override.provenance {set_by, set_at, reason, source} # WS-2 D4
FOR each raw_entity ref: ATTACH binding.provenance + entity_reference_registry.authority_note
FOR each candidate/APR-gated relation: ATTACH gating note (candidate-not-activated / APR-not-approved)
→ Output: TopicThread object (logical shape, P11D §6.2)
4. Where each mechanism is used (QG4 — explicit)
Per the WS-1 binding-authority clarification note §2 decision table and the edge-APR-minimization note:
mechanism_A_universal_edges:
when: BOTH endpoints are governed Objects/IUs (OQC >= 3/4, P44-4A §2.1)
used_here_for:
- N2 law --governed_by--> N1 constitution # 'constrains' resolved to reuse governed_by (edge-APR note bucket 1)
- N3 process --implements--> N2 law # core edge, IU<->IU
- N7 lesson --derived_from--> N6 report # 'lesson_from' resolved to reuse derived_from + source_family=lesson
note: governed_by & derived_from are P44-4A §3.2 CANDIDATE edges (pending v1.0) — see §6 dependency
mechanism_B_iu_entity_binding:
when: ONE side is a raw_entity (not an IU; does NOT satisfy OQC; not in relation graph)
used_here_for:
- N3 process --binding_kind=implements--> N4 sql_entity (PG schema / Directus collection)
- N3 process --binding_kind=implements--> N5 code_artifact (git file / code module)
rule: bind by reference via entity_reference_registry (entity_ref_id, entity_kind, source_system,
natural_key, authority_note) + provenance; read live from SSOT at assembly time; do NOT copy
SQL/business/code data into IU text (Option D hybrid, OD-FA2)
mechanism_C_assembly_local:
when: a relation that exists ONLY inside the assembly_profile output (relation_filter/annotation),
computed at assembly time, NOT persisted as a universal_edges row
used_here_for:
- N4 sql_entity <-> N5 code_artifact adjacency in the rendered thread
(raw<->raw is out of universal_edges scope; the "same-topic" adjacency is output-only grouping)
- topic-grouping cohesion of the whole thread (the "thread" itself is an assembly-local construct)
governance: NO APR, NO edge_type vocab change (edge-APR-minimization note)
Edge-minimization order respected: reuse > iu_entity_binding > assembly-local > new edge.
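The decision table above reduces to a small function over the endpoint kinds. This is a sketch, not a contract: `choose_mechanism` and its `persist` parameter are hypothetical names, with `persist=False` modelling a relation that exists only in the assembly output.

```python
# Illustrative sketch: mechanism A / B / C chosen from endpoint kinds.
def choose_mechanism(src_is_iu: bool, dst_is_iu: bool, persist: bool = True) -> str:
    """Pick the mechanism for one relation, following the decision table."""
    if not persist:
        return "assembly_local"        # mechanism C: output-only, any endpoints
    if src_is_iu and dst_is_iu:
        return "universal_edges"       # mechanism A: both endpoints governed
    if src_is_iu or dst_is_iu:
        return "iu_entity_binding"     # mechanism B: one side is a raw_entity
    return "assembly_local"            # raw<->raw is out of universal_edges scope
```

For the sample chain, `choose_mechanism(True, True)` covers N3 to N2, `choose_mechanism(True, False)` covers N3 to N4 and N5, and `choose_mechanism(False, False)` covers the N4/N5 adjacency.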
5. evidenced_by handling (QG5 + QG10 — APR-gated, NOT approved)
The report node (N6) carries an evidentiary relation to the normative/requirement nodes ("this report attests the collection structure requirement was satisfied"). The natural edge for this is evidenced_by.
evidenced_by_in_this_proof:
status: APR-GATED
approved: false # NOT approved by this file; APR reserved for GPT/User
created: false # no edge type created
appears_as: "logical example only" # per GPT WS-2 review directive
classification: P44-4A §3.3 extension, OD-FA4 bucket 3 = true new durable edge,
minimized set = { evidenced_by } (exactly 1, edge-APR-minimization note)
governance_required_before_use: medium-level APR (Đ32) + amend edge_type vocab framework (NT4)
reverse_type_if_ever_approved: evidences
alternative_paths_if_evidenced_by_NOT_available: # QG10 requires stating these
A_iu_entity_binding:
when: the proof target is a raw evidence entity (verification artifact path / report file as raw)
use: iu_entity_binding with binding_kind=evidences (+ entity_reference_registry)
apr_needed: false
B_assembly_local_relation:
when: output-only "this unit is evidenced-by that report" annotation inside the thread
use: assembly-local relation in the cross_source_topic_thread output (NOT persisted)
apr_needed: false
C_reuse_references_weakest:
when: only a read-only mention is needed (NO attestation semantics)
use: reuse core edge `references` — explicitly weaker; loses "proves performed" meaning
apr_needed: false
chosen_for_this_logical_proof: B (assembly-local) — keeps WS-3 runnable WITHOUT the new edge
Plain terms: we are not allowed to create or approve evidenced_by. So in this paper proof we show the evidence link as an output-only annotation (alternative B). If a durable, queryable "X is proven by Y" edge is ever truly needed, that is a separate APR decision for GPT/User: flagged, not taken here.
6. governed_by / derived_from candidate dependency (non-self-resolved)
The edge-minimization result reuses governed_by (for constrains, node N2→N1) and derived_from (for lesson_from, node N7→N6). Both are P44-4A §3.2 CANDIDATE edge types pending User v1.0 ratification.
dependency:
governed_by: P44-4A §3.2 candidate — reverse `governs` — NOT yet activated
derived_from: P44-4A §3.2 candidate — reverse 1-directional — NOT yet activated
GPT_ruling: dependency_not_blocker_for_WS1_authority
but: if candidates are NEVER activated, the edge-minimization resolutions
(constrains→governed_by, lesson_from→derived_from) MUST be revisited
BEFORE WS-3 execution (risk R-WS2-2)
ws3_position: FLAGGED, not self-resolved. WS-3 is a logical proof; it does not
require the candidate edges to be physically active to be valid on paper.
If they stay inactive, the two relations fall back to assembly-local
(output-only) representation — proof still holds, queryability reduced.
This is logged in the gap/readiness file as a non-blocker for the logical proof but a must_resolve_before_production item.
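The fallback described in ws3_position can be sketched as follows. The function `represent`, the `activated` set, and the `queryable` field are hypothetical; they only illustrate that an inactive candidate edge degrades to an output-only annotation carrying a gating note.

```python
# Illustrative sketch: candidate edges fall back to assembly-local when inactive.
CANDIDATE_EDGES = {"governed_by", "derived_from"}


def represent(relation_kind: str, activated: set) -> dict:
    """Persisted edge if the type is usable, else an output-only annotation."""
    if relation_kind in CANDIDATE_EDGES and relation_kind not in activated:
        return {"relation_kind": relation_kind, "mechanism": "assembly_local",
                "gating_note": "candidate-not-activated", "queryable": False}
    return {"relation_kind": relation_kind, "mechanism": "universal_edges",
            "gating_note": None, "queryable": True}
```

With `activated=set()`, the N2 to N1 and N7 to N6 relations stay in the thread but lose queryability, which matches the "proof still holds, queryability reduced" position.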
7. Authority labelling per node (WS-1 §4 binding rule)
WS-1 binding-authority §4: assembly output MUST label EACH included unit as exactly one of normative_authority | evidence_authority | implementation_authority for that assembly context, default from source_family registry, replaced by unit/span authority_override (WS-2 D4) when present.
default_authority_from_source_family (WS-2 D2 seed):
internal_incomex_constitution -> normative_authority
internal_incomex_law -> normative_authority
internal_process -> normative_authority
sql_entity -> implementation_authority
code_artifact -> implementation_authority
report -> evidence_authority
lesson -> evidence_authority
override_applied_in_sample (QG8): YES — node N2 (internal_incomex_law) carries a
span-level authority_override (law body = normative_authority, explanatory
appendix span = evidence_authority). Full mechanics in the sample-chain file §5.
Span-level mechanism is a LOGICAL example; detailed span mechanics remain
deferred to a real mixed-authority pilot (WS-2 D4).
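The labelling rule (registry default, replaced by an override when present) can be sketched in a few lines. The mapping is copied from the seed above; `label_authority` is a hypothetical helper, and span-level mechanics are deliberately reduced to a single optional override value.

```python
# Illustrative sketch: default authority from source_family, replaced by an override.
from typing import Optional

DEFAULT_AUTHORITY = {
    "internal_incomex_constitution": "normative_authority",
    "internal_incomex_law": "normative_authority",
    "internal_process": "normative_authority",
    "sql_entity": "implementation_authority",
    "code_artifact": "implementation_authority",
    "report": "evidence_authority",
    "lesson": "evidence_authority",
}


def label_authority(source_family: str, override: Optional[str] = None) -> str:
    """Return exactly one authority label for a unit or span."""
    return override if override is not None else DEFAULT_AUTHORITY[source_family]
```

For N2, the law body would be labelled with `label_authority("internal_incomex_law")` and the explanatory appendix span with `label_authority("internal_incomex_law", "evidence_authority")`.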
8. What this proof establishes / does not establish
established_on_paper:
- WS-1 assembly_profile (cross_source_topic_thread) field set is SUFFICIENT to express
a cross-source thread for one topic.
- WS-2 source_family registry + authority_semantics default + D4 override is SUFFICIENT
to label every node's role, including a mixed-authority node.
- The 3 mechanisms (universal_edges / iu_entity_binding / assembly-local) cover every
relation in the chain WITHOUT creating an edge type and WITHOUT approving evidenced_by.
- P11D §6.1 pipeline (GATHER→ENRICH→GROUP→QUALITY→PROVENANCE) runs end-to-end logically
over the union, with P11D §5.3.1 whitelist respected.
- Canonical address scheme <DOCPREFIX>/<L1>-<L2>-...-<Lk> is expressible for every IU node.
NOT established (out of WS-3 scope — see gap/readiness file):
- compliance_matrix full logical proof (DEFERRED, QG9)
- any physical schema / table / view / index / code
- evidenced_by as an approved/queryable edge
- candidate edge (governed_by/derived_from) physical activation
9. Forbidden respected / next
design_only: true
schema_migration: none
table_view_function_index: none
production_write: none
code_change: none
directus_mutation: none
edge_type_created: none
evidenced_by_approved: false
dry_run: none
git_commit: none
self_advance: PROHIBITED
next: STOP after 4-file upload → route to GPT/User review