dot-iu-cutter v0.1 — Cross-Temporal Semantic Threading Design (D9)
Date: 2026-05-15
Status: DESIGN DRAFT
Baseline: rev5d §12, §13.1
Scope: DESIGN ONLY.
1. Purpose
Define Semantic Threads — living professional-domain axes that connect IUs across documents, object types, and time. Define their objects, intake flow, enrichment loop, membership lifecycle, and detection signals. The thread is not an edge type; it is a higher-order construct that contains memberships, evidence, and signals.
2. Scope
- Definition of semantic_thread and its object family
- Semantic Intake Flow
- Semantic Enrichment Loop
- Membership lifecycle (candidate → accepted/rejected/needs_human → stale/superseded)
- Negative knowledge for rejected proposals
- Missing/wrong link detection
- Thread split/merge lifecycle
- User-directed vs system-discovered threads (both first-class)
- Industry standards usage discipline (SKOS conceptual; W3C PROV; CDC; co-citation)
Out of scope: thread-first retrieval and consumer UX (D11), health-signal handling pipeline (D3), capability intake feeding extraction quality (D4), legal mapping summary (D10).
3. Dependencies
- rev5d §12 (full), §13.1 (constraints), §13.2.4 (Đ24 vocabulary)
- C1A (units must already pass birth gate)
- Đ24 (no parallel taxonomy), Đ32 (high-risk approval), Đ33/Đ43 (PG placement), Đ37 (governance roles), Đ38 (manifest as code), Đ39 (universal_edges first), Đ44/UOSL (schema compat target), Đ0-G (birth gate)
- D1 (operational design), D2 (manifest contract), D6 (axis-2 metadata), D11 (retrieval consumer)
4. Key Decisions
4.1 Thread Definition (Q37)
A Semantic Thread is:
- A living professional-domain axis (Vietnamese term: chuỗi chuyên môn, "professional chain").
- Connects units across documents, object types, and time.
- Lives in PG (not a document).
- Compatible with universal_edges (Đ39) and UOSL (Đ44); does NOT replace them.
- Has two formation sources (system-discovered + user-directed) that converge into the same governance lifecycle.
A thread is NOT an edge. It is a higher-order container.
Rationale: rev5d §12.2, §13.1.1.
4.2 Thread Object Family (Q39)
Conceptual objects (PG-first design intent; final placement in D7/D10):
| Object | Role |
|---|---|
| semantic_thread | The living domain/topic |
| semantic_thread_membership | Approved: unit X is in thread Y |
| semantic_thread_candidate | Proposed but not yet reviewed; carries confidence |
| semantic_thread_evidence | W3C PROV-style evidence record |
| semantic_thread_health_signal | Missing/wrong/stale/overbroad/too-narrow/etc. |
| semantic_thread_negative_knowledge | Persisted rejection (P9.34) |
| semantic_thread_expected_chain | JSONB declaration of expected lifecycle artifacts |
universal_edges-first rule: before proposing a separate semantic_thread_membership table, evaluate whether universal_edges (with edge_kind, status, evidence payload, lifecycle) suffices. If universal_edges is sufficient → memberships are edges with edge_kind = 'thread_member'; thread metadata lives separately. If not → flag schema gap and document the missing capability.
Rationale: rev5d §12.4, §13.1.7, §13.2.5; acceptance criteria 33 and 39.
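The universal_edges-first rule can be sketched as follows. This is a minimal illustration that assumes hypothetical universal_edges column names (source_id, target_id, edge_kind, status, confidence, evidence). The real Đ39 schema still has to be verified, which is the schema gap listed in §6. The sketch checks whether the edge table covers what a membership needs, then expresses a membership as an edge with edge_kind = 'thread_member'.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical universal_edges row; these column names are assumptions,
# not the verified Đ39 schema.
@dataclass(frozen=True)
class UniversalEdge:
    source_id: str                 # unit (IU) id
    target_id: str                 # semantic_thread id
    edge_kind: str
    status: str
    confidence: float
    evidence: dict[str, Any] = field(default_factory=dict)  # PROV-style JSONB payload


# What a membership needs from the edge table (§4.2): kind, status, confidence, evidence.
REQUIRED_MEMBERSHIP_COLUMNS = {"edge_kind", "status", "confidence", "evidence"}


def schema_gaps(available_columns: set[str]) -> set[str]:
    """Columns a membership needs but universal_edges lacks; empty means reuse is possible."""
    return REQUIRED_MEMBERSHIP_COLUMNS - available_columns


def thread_membership(unit_id: str, thread_id: str, confidence: float,
                      evidence: dict[str, Any]) -> UniversalEdge:
    """Express an accepted membership as an edge with edge_kind = 'thread_member'."""
    return UniversalEdge(unit_id, thread_id, "thread_member", "accepted", confidence, evidence)
```

A non-empty result from schema_gaps against the real table corresponds to the "flag schema gap" branch of the rule.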
4.3 Semantic Intake Flow (Q38, Q40)
New unit/object (birth event from F1 cut, or imported artifact)
→ Extract signals:
    available_now: section_type, Đ24 labels, semantic_role, tsvector keywords, citations, embedding
    requires_instrumentation: co-edit, co-citation, co-retrieval
    future_capability: full lifecycle-chain awareness, deep KG traversal
→ Match against existing threads (keyword, label, embedding, edge proximity)
→ Produce candidate memberships, each with confidence + evidence bundle
→ Persist to semantic_thread_candidate (status = 'proposed')
→ Auto-accept gate (see 4.4) OR route to review
→ On acceptance → semantic_thread_membership (via universal_edges or table per 4.2)
→ On rejection → semantic_thread_negative_knowledge
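The matching and candidate-production steps of the intake flow can be sketched as a single function. This is a minimal sketch under several assumptions:

- Only label and keyword matching are shown; embedding and edge-proximity matching are omitted.
- Signals outside the available_now tier are ignored.
- The 0.4-per-method confidence score is purely hypothetical and is not a proposed policy.
- Persistence to semantic_thread_candidate is not modeled.

```python
from dataclasses import dataclass, field

# available_now signal tier; instrumented and future signals are ignored in v0.1.
AVAILABLE_NOW = {"section_type", "labels", "semantic_role", "keywords", "citations", "embedding"}


@dataclass
class Thread:
    thread_id: str
    labels: set[str]
    keywords: set[str]


@dataclass
class Candidate:
    unit_id: str
    thread_id: str
    confidence: float
    evidence: list[dict] = field(default_factory=list)
    status: str = "proposed"       # every new candidate starts as 'proposed'


def intake(unit_id: str, signals: dict, threads: list[Thread]) -> list[Candidate]:
    """Match a newly born unit against existing threads and emit proposed candidates."""
    usable = {k: v for k, v in signals.items() if k in AVAILABLE_NOW}
    unit_labels = set(usable.get("labels", ()))
    unit_keywords = set(usable.get("keywords", ()))
    candidates = []
    for thread in threads:
        evidence = []
        if unit_labels & thread.labels:
            evidence.append({"evidence_method": "label_match",
                             "matched": sorted(unit_labels & thread.labels)})
        if unit_keywords & thread.keywords:
            evidence.append({"evidence_method": "keyword_match",
                             "matched": sorted(unit_keywords & thread.keywords)})
        if evidence:
            # Hypothetical scoring: each matching method adds 0.4, capped at 1.0.
            confidence = min(1.0, 0.4 * len(evidence))
            candidates.append(Candidate(unit_id, thread.thread_id, confidence, evidence))
    return candidates
```

Each candidate carries its own evidence bundle, so the auto-accept gate and reviewers can see why it was proposed.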
Evidence types (Q40):
- Structural: section_type, label, semantic_role, canonical_address.
- Reference: citation, code symbol reference, manifest cross-link.
- Statistical: embedding similarity, tsvector match, co-citation.
- Behavioral (requires_instrumentation): co-edit, co-retrieval.
- Human: user direction.
4.4 Auto-Accept Risk Gate (Đ32; rev5d §13.1.5)
A candidate may be auto-accepted ONLY when ALL hold:
- Risk class is low (no legal/governance/code-impact implication).
- ≥ 2 independent evidence signals support the membership.
- No conflicting semantic_thread_negative_knowledge entry exists for this (unit, thread) pair.
- Policy explicitly allows auto-accept for this thread or domain.
Otherwise the candidate moves to candidate_for_review status and is routed to the appropriate reviewer per Đ37. High-risk candidates go to human review per Đ32.
Rationale: acceptance criterion 33; rev5d §13.1.5; Đ32.
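The four gate conditions can be expressed as one predicate. This is a sketch only. It interprets "independent evidence signals" as signals from different evidence classes in §4.3 (structural, reference, statistical, behavioral, human). That interpretation is an assumption, not a settled policy. The evidence-kind keys are hypothetical names.

```python
from dataclasses import dataclass

# Evidence kind -> evidence class (§4.3). Treating "independent" as
# "from a different class" is an assumption, not a settled policy.
EVIDENCE_CLASS = {
    "section_type": "structural", "label": "structural",
    "semantic_role": "structural", "canonical_address": "structural",
    "citation": "reference", "code_symbol_reference": "reference",
    "manifest_cross_link": "reference",
    "embedding_similarity": "statistical", "tsvector_match": "statistical",
    "co_citation": "statistical",
    "co_edit": "behavioral", "co_retrieval": "behavioral",
    "user_direction": "human",
}


@dataclass
class GateInput:
    unit_id: str
    thread_id: str
    risk_class: str                  # "low" means no legal/governance/code impact
    evidence_kinds: list[str]
    policy_allows_auto_accept: bool


def may_auto_accept(c: GateInput, negative_knowledge: set[tuple[str, str]]) -> bool:
    """True only when all four §4.4 conditions hold; otherwise route to candidate_for_review."""
    independent_classes = {EVIDENCE_CLASS[k] for k in c.evidence_kinds if k in EVIDENCE_CLASS}
    return (
        c.risk_class == "low"
        and len(independent_classes) >= 2
        and (c.unit_id, c.thread_id) not in negative_knowledge
        and c.policy_allows_auto_accept
    )
```

Under this interpretation, two structural signals (for example, section_type plus label) count as one class and do not pass the gate on their own.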
4.5 Semantic Enrichment Loop (Q26 supporting; criterion 41)
New data → intake → candidates → evidence → AI/policy review
→ accepted: enrich thread (universal_edges + thread record)
→ rejected: persist as negative_knowledge
→ next-cycle matching uses both positive memberships and negative knowledge
→ TAC progress improves extraction; KG progress improves discovery
→ positive recursion (P10)
Negative knowledge suppresses re-proposal unless materially different evidence emerges. "Materially different" criteria (proposed for design-phase definition):
- Confidence increases by ≥ threshold (e.g., 0.2 absolute), OR
- New independent evidence type appears (e.g., previously only embedding similarity; now also citation), OR
- Source revision of one side has changed (revision-aware re-eval).
Final thresholds are policy decisions, recorded in the D5 backlog.
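The three "materially different" criteria combine with OR, so any single criterion reopens a rejected pair. The sketch below assumes the following:

- The 0.2 default is only the example threshold above, not a policy value.
- Source revisions are represented as a hypothetical (unit revision, thread revision) pair.
- A "new evidence type" is any evidence kind absent at rejection time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceSnapshot:
    confidence: float
    evidence_kinds: frozenset[str]
    source_revisions: tuple[str, str]    # hypothetical (unit revision, thread revision)


def materially_different(at_rejection: EvidenceSnapshot, now: EvidenceSnapshot,
                         confidence_delta: float = 0.2) -> bool:
    """Any one criterion suffices; confidence_delta is an example threshold, not policy."""
    return (
        now.confidence - at_rejection.confidence >= confidence_delta    # confidence jump
        or bool(now.evidence_kinds - at_rejection.evidence_kinds)       # new evidence type
        or now.source_revisions != at_rejection.source_revisions        # revision changed
    )
```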
4.6 Membership Lifecycle (Q44; criterion 33)
candidate → [review] → accepted | rejected | needs_human
accepted → [usage] → stale | superseded
rejected → [new evidence materially different] → re-candidate
needs_human → [escalation closed] → accepted | rejected
Each membership row carries: status, confidence, evidence_bundle (JSONB, PROV envelope), reviewed_by, reviewed_at, provenance ∈ {accepted_by_user, accepted_by_policy, accepted_by_ai, candidate_by_ai, rejected_by_user, rejected_by_review}.
Rationale: rev5d §13.1.4, §12.7.
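The lifecycle can be read as a small state machine. The sketch encodes the transitions listed above. It treats stale and superseded as terminal states, which is an assumption because the lifecycle does not say whether they can be reopened. It also requires a material-difference flag for the rejected → candidate edge.

```python
# Allowed membership status transitions, taken from the lifecycle above.
# Treating stale and superseded as terminal is an assumption.
TRANSITIONS = {
    "candidate": {"accepted", "rejected", "needs_human"},
    "accepted": {"stale", "superseded"},
    "rejected": {"candidate"},               # only with materially different evidence
    "needs_human": {"accepted", "rejected"},
    "stale": set(),
    "superseded": set(),
}


def transition(current: str, new: str, *, materially_different: bool = False) -> str:
    """Validate one status change and return the new status, or raise ValueError."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    if current == "rejected" and not materially_different:
        raise ValueError("re-candidacy requires materially different evidence")
    return new
```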
4.7 Negative Knowledge (criterion 34)
semantic_thread_negative_knowledge records each rejection with: unit_id, thread_id, evidence_at_rejection (JSONB), reviewed_by, reviewed_at, reason_class (semantic_mismatch, scope_too_narrow, scope_too_broad, wrong_authority, contradictory_evidence, user_explicit_reject).
Re-proposal must cite the negative-knowledge entry and demonstrate material difference.
Rationale: rev5d §13.1.8.
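A minimal sketch of a negative-knowledge record and the re-proposal rule follows. The entry_id field is hypothetical and exists only so that a re-proposal has something concrete to cite. The in-memory index keyed by (unit_id, thread_id) stands in for the PG table.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ReasonClass(Enum):
    SEMANTIC_MISMATCH = "semantic_mismatch"
    SCOPE_TOO_NARROW = "scope_too_narrow"
    SCOPE_TOO_BROAD = "scope_too_broad"
    WRONG_AUTHORITY = "wrong_authority"
    CONTRADICTORY_EVIDENCE = "contradictory_evidence"
    USER_EXPLICIT_REJECT = "user_explicit_reject"


@dataclass(frozen=True)
class NegativeKnowledge:
    entry_id: str                    # hypothetical id that a re-proposal must cite
    unit_id: str
    thread_id: str
    evidence_at_rejection: dict      # JSONB snapshot
    reviewed_by: str
    reviewed_at: datetime
    reason_class: ReasonClass


def may_repropose(index: dict[tuple[str, str], NegativeKnowledge], unit_id: str,
                  thread_id: str, cited_entry_id: Optional[str],
                  materially_different: bool) -> bool:
    """A pair with no rejection on record is free; otherwise cite the entry and show material difference."""
    entry = index.get((unit_id, thread_id))
    if entry is None:
        return True
    return cited_entry_id == entry.entry_id and materially_different
```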
4.8 Missing/Wrong Link Detection (Q41, Q42; criterion 35)
Missing-link signals:
| Signal | Source |
|---|---|
| high_similarity_unlinked | Embedding clustering finds high-similarity pair with no edge |
| co_retrieval_no_edge | Retrieval logs (requires_instrumentation) |
| expected_artifact_missing | semantic_thread_expected_chain mentions an artifact type with no member |
| cited_without_edge | Reports cite a canonical_address with no universal_edge |
Wrong-link signals:
| Signal | Source |
|---|---|
| reviewer_rejection | Explicit rejection in REVIEW |
| retrieval_noise | Returned but flagged irrelevant (D11) |
| contradiction | Semantic role conflict |
| low_confidence_persistent | Confidence remained low after multiple cycles |
Thread-level anomalies (Q44):
| Signal | Trigger |
|---|---|
| overbroad | > 50 members and high heterogeneity |
| too_narrow | < 3 members and stale |
| stale | No activity > N days (N is policy) |
| noisy_retrieval | D11 retrieval flagged with high noisy_context_rate |
All signals route to Decision Backlog Registry (D5) and/or Segmentation Health Report (D3). No new notification system is created (criterion 38; Đ37).
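The thread-level triggers above can be sketched as a pure function over per-thread statistics. This sketch makes the following assumptions:

- How to measure heterogeneity is still an open question (§8), so it is modeled as a hypothetical 0..1 score with an illustrative 0.7 threshold.
- The stale window N is passed in as a parameter because it is a policy value.
- noisy_retrieval is omitted because it comes from D11.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ThreadStats:
    member_count: int
    heterogeneity: float        # hypothetical 0..1 score; the real measure is open (§8)
    last_activity: datetime


def thread_anomalies(s: ThreadStats, now: datetime, stale_after_days: int,
                     heterogeneity_threshold: float = 0.7) -> list[str]:
    """Return the thread-level signal names triggered for one thread."""
    signals = []
    stale = now - s.last_activity > timedelta(days=stale_after_days)
    if s.member_count > 50 and s.heterogeneity >= heterogeneity_threshold:
        signals.append("overbroad")
    if s.member_count < 3 and stale:
        signals.append("too_narrow")
    if stale:
        signals.append("stale")
    return signals
```

The function only names signals. Routing them to D5 or D3 stays with the existing registries, as stated above.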
4.9 Expected Lifecycle Chain (Q46; rev5d §13.1.6)
Each thread has an OPTIONAL expected_chain JSONB declaration:
{
"expected_artifacts": [
{"kind": "law", "authority": "enacted", "required": true},
{"kind": "design", "required": true},
{"kind": "code", "required": false},
{"kind": "test_report", "required": true},
{"kind": "incident_report", "required": false}
]
}
The intake flow detects missing required artifacts and emits expected_artifact_missing signals. The chain is a hook, not a hard schema; v0.1 ships the hook even if no chains are populated yet.
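Missing-artifact detection against the JSONB declaration above could look like the sketch below. It assumes the thread's member artifact kinds are already available as a set, which is a hypothetical input. It checks only kind and required and ignores qualifiers such as authority. A thread with no chain declared produces no signals, which is consistent with the chain being an optional hook.

```python
import json
from typing import Optional


def expected_artifact_missing(expected_chain: Optional[str], member_kinds: set[str]) -> list[dict]:
    """Emit one signal per required artifact kind with no member; no chain means no signals."""
    if not expected_chain:
        return []
    chain = json.loads(expected_chain)
    return [
        {"signal": "expected_artifact_missing", "kind": artifact["kind"]}
        for artifact in chain.get("expected_artifacts", [])
        if artifact.get("required") and artifact["kind"] not in member_kinds
    ]
```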
4.10 Governance Mapping (Q45; criterion 37)
Mapping to Đ37:
| Role | Threading responsibility |
|---|---|
| Owner (per thread) | Approves user-directed thread creation; resolves conflicts |
| Reviewer (Đ37 roles) | Reviews candidate_for_review memberships |
| Escalation queue (Đ37) | High-risk per Đ32 |
| Council / AI Council | Resolves contested rejections; oversees overbroad/too_narrow signals |
| Registry custodian | Decision Backlog Registry sweeps (D5) |
If no Đ37 mapping exists for a threading concern → governance gap; record in D5; do NOT invent a parallel role.
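The "no parallel role" rule amounts to a lookup that falls back to recording a gap. In this sketch the concern keys are hypothetical labels for the table rows above, and the D5 backlog is modeled as a plain list.

```python
from typing import Optional

# Threading concern -> Đ37 role, mirroring the table above. Concern keys are hypothetical.
ROLE_FOR_CONCERN = {
    "user_directed_thread_creation": "owner",
    "membership_conflict": "owner",
    "candidate_for_review": "reviewer",
    "high_risk_candidate": "escalation_queue",
    "contested_rejection": "council",
    "overbroad": "council",
    "too_narrow": "council",
    "backlog_sweep": "registry_custodian",
}


def route_concern(concern: str, d5_backlog: list[dict]) -> Optional[str]:
    """Return the Đ37 role, or record a governance gap in D5 instead of inventing a role."""
    role = ROLE_FOR_CONCERN.get(concern)
    if role is None:
        d5_backlog.append({"type": "governance_gap", "concern": concern})
    return role
```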
4.11 Thread Split / Merge Lifecycle (Q44; criterion 36)
Split (one thread → two): the old thread is marked superseded and its references are retained; memberships are migrated with a provenance trail; aliases are preserved; the Decision Backlog records the rationale.
Merge (two threads → one): both old threads are marked superseded; the new thread acquires their memberships; aliases are preserved; negative knowledge from both is unioned, with conflicts sent to review.
A split or merge is a governance action, not an automatic operation. Auto-merge based on similarity is forbidden (rev5d P7, criterion 8).
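The merge rule can be sketched as follows. This is a minimal sketch that requires an approver, because a merge is a governance action and is never triggered by similarity. It unions memberships, negative knowledge and aliases, and marks both old threads superseded. It also reports the conflicts that need review: units accepted in one old thread but rejected for the other. Thread records are simplified to sets of unit ids, and the provenance trail is not modeled. Both simplifications are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadRecord:
    thread_id: str
    members: set[str]                # accepted unit ids
    negative: set[str]               # unit ids rejected for this thread
    aliases: set[str] = field(default_factory=set)
    status: str = "active"


def merge_threads(a: ThreadRecord, b: ThreadRecord, new_id: str,
                  approved_by: str) -> tuple[ThreadRecord, set[str]]:
    """Governance-approved merge; returns the new thread and units needing conflict review."""
    if not approved_by:
        raise PermissionError("merge is a governance action and needs an approver")
    # A unit accepted in one old thread but rejected in the other is a conflict.
    conflicts = (a.members & b.negative) | (b.members & a.negative)
    merged = ThreadRecord(
        thread_id=new_id,
        members=a.members | b.members,
        negative=a.negative | b.negative,
        aliases=a.aliases | b.aliases | {a.thread_id, b.thread_id},
    )
    a.status = b.status = "superseded"   # old records are retained, not deleted
    return merged, conflicts
```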
4.12 User-Directed Threads (Q40; criterion 40)
User-directed threads are first-class, equal in standing to AI-discovered threads.
- User may: create threads, assign memberships, reject AI suggestions, request discovery, and merge/split threads.
- User direction is authoritative intent, not automatic graph truth (rev5d §13.1.4).
- Memberships from user direction carry provenance = accepted_by_user.
- AI may flag inconsistencies (e.g., user adds a unit that contradicts negative knowledge) but does NOT silently override the user.
- Inconsistency between user direction and AI evidence → health signal user_ai_disagreement → governance queue.
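The flag-but-never-override behaviour can be sketched like this. The user's membership is always recorded with provenance = accepted_by_user. A conflict with negative knowledge only produces a user_ai_disagreement signal for the governance queue. The dict shapes and the route value are hypothetical.

```python
from typing import Optional


def user_assign(unit_id: str, thread_id: str, user_id: str,
                negative_knowledge: set[tuple[str, str]]) -> tuple[dict, Optional[dict]]:
    """Record a user-directed membership; flag (never override) conflicts with negative knowledge."""
    membership = {
        "unit_id": unit_id, "thread_id": thread_id, "status": "accepted",
        "provenance": "accepted_by_user", "reviewed_by": user_id,
    }
    signal = None
    if (unit_id, thread_id) in negative_knowledge:
        signal = {"signal": "user_ai_disagreement", "unit_id": unit_id,
                  "thread_id": thread_id, "route": "governance_queue"}
    return membership, signal
```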
4.13 Industry Standards Discipline (criterion 39; rev5d §12.3, §13.1.2, §13.1.3)
- SKOS: conceptual model for thread hierarchy/relations. v0.1 does NOT introduce RDF/SPARQL/triple-store.
- BERTopic / embedding clustering: used for candidate discovery only.
- Entity linking: IU content → thread membership candidate.
- Co-citation: bibliometric co-occurrence measure; treated as evidence, not decision.
- W3C PROV: evidence envelope discipline. Minimum fields: evidence_source, evidence_method, generated_by, generated_at, source_object_id, target_thread_or_object, confidence, review_status, reviewer, decision_reason.
- CDC: PG trigger + NOTIFY/LISTEN for intake triggers.
- Use PG-native features to the maximum: tsvector, pg_trgm, triggers, materialized views, NOTIFY/LISTEN. Qdrant for vectors.
- No new stack (Neo4j, triple store, ML pipeline) unless a gap is proven and approved per Đ32.
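A lightweight check can enforce the minimum envelope fields without adopting an RDF stack. This is a sketch, not the codified schema (which §6 lists as a gap). The generated_by and generated_at fields loosely echo PROV-O's prov:wasGeneratedBy and prov:generatedAtTime. The 0..1 confidence range check is an added assumption.

```python
# Minimum evidence-envelope fields from §4.13. generated_by / generated_at loosely
# echo PROV-O's prov:wasGeneratedBy / prov:generatedAtTime.
PROV_MIN_FIELDS = (
    "evidence_source", "evidence_method", "generated_by", "generated_at",
    "source_object_id", "target_thread_or_object", "confidence",
    "review_status", "reviewer", "decision_reason",
)


def envelope_problems(envelope: dict) -> list[str]:
    """List missing minimum fields; the 0..1 confidence range check is an added assumption."""
    problems = [f"missing:{name}" for name in PROV_MIN_FIELDS if name not in envelope]
    confidence = envelope.get("confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append("confidence_out_of_range")
    return problems
```

Fields such as reviewer may legitimately hold null before review. The check only requires that each key is present.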
4.14 Positive Recursion (Q28 supporting; P10; criterion 41)
The threading subsystem is part of positive recursion:
- New TAC capabilities → improved extraction → better intake.
- New KG capabilities → improved candidate discovery.
- Approved memberships and negative knowledge → smarter next-cycle matching.
- Capability intake (D4) records these improvements.
5. PG Storage per Object (Design Intent — No DDL)
| Object | Target DB | Layer | Notes |
|---|---|---|---|
| semantic_thread | directus | Kho | Living domain record |
| semantic_thread_membership | directus | Kho | Prefer reuse of universal_edges with edge_kind='thread_member'; if reuse impossible, flag gap |
| semantic_thread_candidate | directus | Não | Lifecycle status, confidence |
| semantic_thread_evidence | directus | Não | JSONB PROV envelope |
| semantic_thread_health_signal | directus | Não | Routes to D3 / D5 |
| semantic_thread_negative_knowledge | directus | Kho | Persistent rejection memory |
| semantic_thread_expected_chain | directus | Não | JSONB on thread record |
Final placement of separate-table vs universal_edges reuse: must be resolved in D7 (UOSL compat) and D10 (Legal Alignment) per Đ33/Đ43, Đ39.
6. Schema Gaps
- semantic_thread core table: needs a placement decision; not present today.
- universal_edges extension: needs edge_kind value thread_member and possibly status/confidence fields; current Đ39 schema gap (verify).
- semantic_thread_candidate lifecycle status enum: Đ24 vocabulary placement.
- evidence_bundle PROV envelope: JSONB shape; minimum field set listed in §4.13 but not codified.
- semantic_thread_negative_knowledge persistence: no current capability for rejection memory.
- expected_chain JSONB: new field on the thread record.
- semantic_thread_health_signal routing: must reuse the existing Decision Backlog Registry (D5), not a new system.
- CDC trigger plumbing: PG triggers / NOTIFY-LISTEN on tac_logical_unit birth event; instrumentation gap.
- Co-edit / co-retrieval instrumentation: requires_instrumentation (criterion 11).
- user_ai_disagreement health signal: vocabulary gap.
7. Law References
| Surface | Law |
|---|---|
| Universal edges first | Đ39 |
| Vocabulary | Đ24 |
| Risk approval | Đ32 |
| Governance roles | Đ37 |
| Manifest-as-code style evidence | Đ38 |
| Schema compat target | Đ44 |
| PG placement | Đ33 / Đ43 |
| Birth gate | Đ0-G |
8. Open Questions
- Exact threshold values for auto-accept and material difference (defer to policy doc / D5 backlog).
- Where semantic_thread lives if universal_edges doesn't fit: directus vs separate DB. Defer to D7/D10.
- How is "thread heterogeneity" measured for overbroad detection: embedding variance? Edge density? Defer to D3.
- Should semantic_thread_negative_knowledge decay over time, or is rejection permanent until material difference? Recommendation: permanent until material difference, with optional periodic review.
9. Coverage
Questions covered (primary): Q37, Q38, Q39, Q40, Q41, Q42, Q43, Q44, Q45, Q46.
Questions covered (secondary): Q22, Q32.
Acceptance criteria covered:
- 31 (cross-temporal semantic threading)
- 32 (semantic intake flow)
- 33 (candidate/thread membership lifecycle)
- 34 (negative knowledge)
- 35 (missing/wrong link detection)
- 36 (thread split/merge lifecycle)
- 38 (no parallel notification — supporting)
- 39 (industry standards leverage)
- 40 (user-directed + system-discovered both first-class)
- 41 (TAC/KG progress feeds quality)
Schema gaps: 10 named (see §6).
Law dependencies: Đ24, Đ32, Đ33/Đ43, Đ37, Đ38, Đ39, Đ44, Đ0-G.
Open questions: 4 (see §8).
Law conflicts encountered: none.