dot-iu-cutter v0.1 — Assembly Axes & Metadata Contract (D6)
Date: 2026-05-15 | Status: DESIGN DRAFT | Baseline: rev5d §5, §7.J | Scope: DESIGN ONLY.
1. Purpose
Define the metadata contract every IU must satisfy so the system supports both assembly axes as first-class:
- Axis-1: Document Reconstruction — render units back to the original document with 0 drift.
- Axis-2: Semantic Domain Assembly — assemble units across documents by professional domain.
A segmentation decision that supports only one axis is incomplete (rev5d §5.3, P13).
2. Scope
- Axis-1 metadata fields
- Axis-2 metadata fields
- Verification methods per axis
- Edge readiness hooks (Đ39)
- UOSL compatibility hooks (Đ44)
- KG feedback hooks (criterion 19)
Out of scope: thread membership lifecycle (D9); profile mapping in detail (D7); legal review (D10).
3. Dependencies
- rev5d §5, §7.J, §13.3
- D1, D2 (manifest carries these fields)
- D9 (thread consumes axis-2 metadata)
- C1A (boundaries determined by C1A)
- Đ24 (vocabulary), Đ39 (universal_edges), Đ44 (UOSL), Đ0-G (birth gate)
4. Key Decisions
4.1 Both Axes Are First-Class (P13; criterion 23, 24)
Every IU must carry metadata sufficient for:
- Axis-1: reconstruct the source document with 0 drift.
- Axis-2: participate in cross-document professional-domain assembly.
A unit missing axis-2 metadata is incomplete even if axis-1 round-trip passes.
4.2 Axis-1 Metadata (Q31; criterion 23)
Required fields per IU (logical; placement per current TAC schema + gaps):
| Field | Purpose |
|---|---|
| `source_path` | Origin source identifier |
| `source_revision` | Exact revision at cut time |
| `source_span_start` / `source_span_end` | Byte/line span in source |
| `render_order` | Stable order for reconstruction |
| `parent_unit_id` | Canonical parent (single canonical parent rule) |
| `canonical_address` | Stable human-readable address (e.g. "Đ44 §5.3.1") |
| `publication_membership` | Which publication(s) include this unit |
| `body_source_policy` | inline / container / referenced / generated |
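The axis-1 fields above can be sketched as a typed record. This is a minimal illustration, not the TAC schema: the class names `Axis1Metadata` and `BodySourcePolicy`, the Python types chosen for each field, and the span-ordering check are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class BodySourcePolicy(Enum):
    # The four policy values listed in the axis-1 table.
    INLINE = "inline"
    CONTAINER = "container"
    REFERENCED = "referenced"
    GENERATED = "generated"


@dataclass(frozen=True)
class Axis1Metadata:
    """Hypothetical in-memory shape of the axis-1 fields for one IU."""

    source_path: str
    source_revision: str
    source_span_start: int  # assumed to be an integer byte or line offset
    source_span_end: int
    render_order: int
    parent_unit_id: Optional[str]  # assumption: None marks a root unit
    canonical_address: str
    publication_membership: Tuple[str, ...]
    body_source_policy: BodySourcePolicy

    def __post_init__(self) -> None:
        # Assumed sanity check: a span cannot end before it starts.
        if self.source_span_start > self.source_span_end:
            raise ValueError("source_span_start must not exceed source_span_end")
```

A frozen dataclass is used here so that a unit's reconstruction metadata cannot be mutated silently after the cut; whether that matches the intended PG-side behavior is left open.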
4.3 Axis-2 Metadata (Q32; criterion 24, 30)
Required fields per IU for semantic domain assembly:
| Field | Purpose |
|---|---|
| `section_type` | Đ24 vocabulary |
| `unit_kind` | Đ24 vocabulary |
| `classification_labels` | Đ24-controlled labels |
| `semantic_role` | Conceptual function within domain |
| `candidate_edges` | Pre-marked edges to existing units (hook for Đ39 universal_edges) |
| `edge_readiness_notes` | Why edges are or aren't ready |
| `universal_edges_compat_flag` | Indicates the unit is shaped for Đ39 reuse |
| `vector_projection_readiness` | Hook for Qdrant indexing |
| `thread_hint` | Optional system-discovered or user-directed thread reference |
| `lifecycle_stage_hint` | law / design / code / report / runbook etc. |
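A completeness check over the axis-2 fields might look like the sketch below. It assumes a unit is available as a plain dictionary and that only `thread_hint` is optional, since that is the only field the table marks as such; the function name `missing_axis2_fields` is hypothetical.

```python
from typing import Any, Dict, List

# Required axis-2 fields, in table order. thread_hint is excluded because
# the table describes it as optional (assumption: every other field is required).
AXIS2_REQUIRED = (
    "section_type",
    "unit_kind",
    "classification_labels",
    "semantic_role",
    "candidate_edges",
    "edge_readiness_notes",
    "universal_edges_compat_flag",
    "vector_projection_readiness",
    "lifecycle_stage_hint",
)


def missing_axis2_fields(unit: Dict[str, Any]) -> List[str]:
    """Return the required axis-2 fields that are absent or None.

    An empty list or empty string counts as present: a unit may legitimately
    have no candidate edges yet.
    """
    return [name for name in AXIS2_REQUIRED if unit.get(name) is None]
```

A unit for which this returns a non-empty list would be incomplete in the sense of §4.1, even if its axis-1 round-trip passes.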
4.4 Verification per Axis (Q33; criterion 5)
Axis-1 verification: mandatory round-trip.
Render units by publication membership, by render_order
→ Compare to source revision content (byte-equivalent or canonical-form equivalent)
→ 0 drift = PASS; any drift = FAIL → rollback (D1 §4.8)
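The round-trip flow can be sketched as follows. The example assumes that unit bodies are available as bytes and that concatenating them in `render_order` is how a publication is rendered; it covers only the byte-equivalent comparison, not canonical-form equivalence. `RenderUnit`, `render_publication`, and `round_trip_passes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class RenderUnit:
    render_order: int
    publication_membership: FrozenSet[str]
    body: bytes


def render_publication(units: Iterable[RenderUnit], publication: str) -> bytes:
    """Select units by publication membership and join them by render_order."""
    members = [u for u in units if publication in u.publication_membership]
    members.sort(key=lambda u: u.render_order)
    return b"".join(u.body for u in members)


def round_trip_passes(
    units: Iterable[RenderUnit], publication: str, source: bytes
) -> bool:
    """True only when the rendered output is byte-identical to the source."""
    return render_publication(units, publication) == source
```

A `False` result corresponds to the FAIL branch above; triggering the rollback itself is outside this sketch.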
Axis-2 verification: semantic assembly test cases.
Define N assembly queries (e.g., "find all design IUs for thread X")
→ Run query → expected canonical units returned
→ Coverage threshold per policy
→ Below threshold → axis-2 FAIL (advisory in v0.1; mandatory hook)
In v0.1, axis-2 verification is a hook with advisory failure; structural reject is reserved until policy maturity. Hook is mandatory from day one.
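A minimal sketch of the axis-2 hook is shown below. The design does not define how coverage is computed, so this example assumes it is the fraction of expected canonical units that the queries actually return; the threshold value, the `ADVISORY_FAIL` label, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class AssemblyCase:
    query: str  # e.g. "find all design IUs for thread X"
    expected_unit_ids: FrozenSet[str]


def axis2_coverage(
    cases: Iterable[AssemblyCase], run_query: Callable[[str], Set[str]]
) -> float:
    """Fraction of expected units returned across all cases (assumed metric)."""
    expected_total = 0
    found_total = 0
    for case in cases:
        returned = run_query(case.query)
        expected_total += len(case.expected_unit_ids)
        found_total += len(case.expected_unit_ids & returned)
    if expected_total == 0:
        # An empty catalog cannot demonstrate coverage.
        raise ValueError("assembly catalog has no expected units")
    return found_total / expected_total


def axis2_verdict(coverage: float, threshold: float) -> str:
    """In v0.1 a shortfall is advisory, so it is reported rather than raised."""
    return "PASS" if coverage >= threshold else "ADVISORY_FAIL"
```

Returning a label instead of raising keeps the hook mandatory while leaving the failure non-blocking, which is one way to read the v0.1 intent.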
4.5 KG Feedback Hooks (criterion 19)
candidate_edges and edge_readiness_notes are explicit hooks for KG enrichment. Even before universal_edges is fully wired:
- `candidate_edges` carries proposed edge targets discovered at MARK time.
- `edge_readiness_notes` documents why the edge is a candidate (e.g., similarity, citation, semantic_role alignment).
- These feed the Semantic Intake Flow (D9 §4.3) once threading is operational.
4.6 UOSL Compatibility Hooks (Q39 — supporting; criterion 25)
Each unit's metadata is mapped (conceptually, in v0.1) to UOSL G1–G12 field groups:
| Axis-1/Axis-2 field | UOSL group hint |
|---|---|
| canonical_address | G1 (identity) |
| section_type / unit_kind | G2 (classification) |
| parent_unit_id / hierarchy | G3 (relations) |
| publication_membership | G4 (publication) |
| risk_class / authority | G5 (governance state) |
| candidate_edges | Relation Layer / universal_edges |
| semantic_role / labels | G6 (semantics) |
Mappings are documented in D7; gaps recorded.
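The hint table can be held as a simple lookup that also surfaces unmapped fields as gaps. This is only a sketch: it assumes "labels" in the table refers to `classification_labels`, treats `risk_class` and `authority` as field names, and uses short strings such as "G1" as stand-ins for the UOSL groups. The function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Field -> UOSL group hint, transcribed from the table above.
UOSL_GROUP_HINTS: Dict[str, str] = {
    "canonical_address": "G1",
    "section_type": "G2",
    "unit_kind": "G2",
    "parent_unit_id": "G3",
    "publication_membership": "G4",
    "risk_class": "G5",
    "authority": "G5",
    "candidate_edges": "universal_edges",  # Relation Layer, not a G-group
    "semantic_role": "G6",
    "classification_labels": "G6",  # assumption: "labels" means this field
}


def split_mapped_and_gaps(
    fields: Sequence[str],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (fields with a group hint, fields with no hint yet)."""
    mapped = {f: UOSL_GROUP_HINTS[f] for f in fields if f in UOSL_GROUP_HINTS}
    gaps = [f for f in fields if f not in UOSL_GROUP_HINTS]
    return mapped, gaps
```

Fields returned in the second list are the ones that would be recorded as gaps rather than mapped silently.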
4.7 No Mechanical Splitting; C1A Authority (P1)
Axis-2 metadata richness does NOT justify making units smaller for the sake of granularity. C1A 3-question test remains authoritative.
4.8 Vocabulary Discipline (Đ24)
section_type, unit_kind, classification_labels, semantic_role, lifecycle_stage_hint — all from Đ24 vocabulary. Gaps → backlog (D5), not silent invention.
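The "gaps go to the backlog" rule can be sketched as a check that reports unknown terms instead of accepting them. The vocabulary is passed in as a dictionary of allowed terms per field; its contents, the function name, and the tuple shape of a backlog entry are assumptions for illustration.

```python
from typing import Any, Dict, List, Set, Tuple

SINGLE_VALUED = ("section_type", "unit_kind", "semantic_role", "lifecycle_stage_hint")
MULTI_VALUED = ("classification_labels",)


def vocabulary_gaps(
    unit: Dict[str, Any], vocabulary: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (field, term) pairs that are not in the controlled vocabulary.

    Each pair is a candidate backlog entry; nothing is added to the
    vocabulary here, so unknown terms are never invented silently.
    """
    gaps: List[Tuple[str, str]] = []
    for field in SINGLE_VALUED:
        value = unit.get(field)
        if value is not None and value not in vocabulary.get(field, set()):
            gaps.append((field, value))
    for field in MULTI_VALUED:
        for value in unit.get(field) or ():
            if value not in vocabulary.get(field, set()):
                gaps.append((field, value))
    return gaps
```

For example, a unit whose `lifecycle_stage_hint` is a term outside the listed stages would yield one gap entry for that field rather than being stored as-is.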
4.9 PG-Driven (P14)
All axis-1 and axis-2 fields are PG-persisted on tac_logical_unit (or in the JSON profile until a canonical column exists). Markdown is a mirror.
4.10 Diff Discipline
Field changes across manifest versions are diffable (D2 §4.11). Axis-1 changes (e.g., render_order) are usually structural; axis-2 changes (e.g., new label) are usually enrichments. The diff classifies the change for governance routing.
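One way to classify a field-level diff for routing is sketched below. It applies the "usually structural / usually enrichment" rule as a default by axis, so the labels are a starting point rather than a final governance decision; the label strings and function name are assumptions.

```python
from typing import Any, Dict

AXIS1_FIELDS = frozenset({
    "source_path", "source_revision", "source_span_start", "source_span_end",
    "render_order", "parent_unit_id", "canonical_address",
    "publication_membership", "body_source_policy",
})
AXIS2_FIELDS = frozenset({
    "section_type", "unit_kind", "classification_labels", "semantic_role",
    "candidate_edges", "edge_readiness_notes", "universal_edges_compat_flag",
    "vector_projection_readiness", "thread_hint", "lifecycle_stage_hint",
})


def classify_changes(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, str]:
    """Map each changed field to a default change class based on its axis."""
    result: Dict[str, str] = {}
    for field in sorted(set(old) | set(new)):
        if old.get(field) == new.get(field):
            continue  # unchanged between the two manifest versions
        if field in AXIS1_FIELDS:
            result[field] = "structural"
        elif field in AXIS2_FIELDS:
            result[field] = "enrichment"
        else:
            result[field] = "unclassified"
    return result
```

Fields outside both axes are labeled "unclassified" so they are not routed under a default that does not apply to them.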
5. PG Storage per Object (Design Intent — No DDL)
All axis-1 and axis-2 fields live on tac_logical_unit (existing) plus its JSON profile envelope, PLUS the schema gaps below for fields not currently present.
| Object | Target DB | Layer | Notes |
|---|---|---|---|
| `tac_logical_unit` (extended) | directus (TAC schema) | Kho | Existing; needs new fields per §6 |
| `unit_classification_label` | directus | Não | Đ24 label rows |
| `candidate_edge` | directus | Não | Prefer universal_edges with status='candidate'; else flag gap |
| `semantic_role_dictionary` | directus | Não | Đ24 vocabulary |
| `lifecycle_stage_hint_dictionary` | directus | Não | Đ24 vocabulary |
6. Schema Gaps
- `canonical_address`: first-class column on `tac_logical_unit` (current presence unclear).
- `semantic_role`: field; vocabulary placement per Đ24.
- `classification_labels`: multi-valued; current shape unclear (JSON?).
- `candidate_edges`: prefer `universal_edges` (status='candidate'); if not possible, flag.
- `edge_readiness_notes`: JSONB on unit or separate table.
- `universal_edges_compat_flag`: explicit indicator.
- `vector_projection_readiness`: hook for Qdrant; integration gap.
- `thread_hint`: pre-thread membership hint; new field.
- `lifecycle_stage_hint`: Đ24 vocabulary needed.
- `render_order`: guaranteed stable across cut cycles; current behavior unclear.
- Axis-2 verification policy: test query catalog and coverage thresholds.
7. Law References
| Surface | Law |
|---|---|
| Boundary rules | C1A |
| Vocabulary | Đ24 |
| Universal edges | Đ39 |
| UOSL compat | Đ44 |
| Birth gate | Đ0-G |
| PG placement | Đ33 / Đ43 |
8. Open Questions
- Should axis-2 verification block CUT in v0.1, or remain advisory? Recommendation: advisory; promote later via D4 intake.
- How are `candidate_edges` numerically scored at MARK time? Defer to a candidate-edge specification subdoc.
- Should `classification_labels` cardinality be capped? Defer to Đ24.
9. Coverage
Questions covered (primary): Q31, Q32, Q33. Questions covered (secondary): Q4, Q34.
Acceptance criteria covered:
- 19 (KG feedback hooks)
- 23 (axis-1 reconstruction)
- 24 (axis-2 semantic assembly)
- 25 (UOSL/Đ44 mapping — supporting D7)
- 30 (metadata for cross-doc assembly)
Schema gaps: 11 named (see §6).
Law dependencies: C1A, Đ24, Đ33/Đ43, Đ39, Đ44, Đ0-G.
Open questions: 3 (see §8).
Law conflicts encountered: none.