KB-10AD

S175 Vaccine Schema Audit

10 min read Revision 1
reports175vaccineschema-audit

S175 Vaccine — Schema Audit Report

Date: 2026-04-09 Mode: READ-ONLY Agent: Claude Code Goal: Find tables with same "upsert by code" anti-pattern as knowledge_documents

Summary

Audited 168 tables in public schema (Directus DB).

  • 2 tables HIGH priority (have data + writer + missing UNIQUE on logical_id) — needs vaccine
  • 5 tables MEDIUM priority (have logical_id col + missing UNIQUE, low data or no active writer)
  • Rest LOW (proper UNIQUE in place or no logical_id)

The biggest finding: system_issues.violation_hash — used by S174-FIX-04 DOTs (env-contract-check, logrotate-config-check, rsyslog-health-check) and original S175 reportConfigError — already has 2 OPEN duplicate pairs from dieu31-runner. Same class of bug as the original S175 root cause but in a different table.


Inventory: Tables Audited

public schema, excluding directus_* system tables: 168 tables.

Tables with logical_id columns: 65 (path, slug, code, source_id, hash, key, file_path, dimension)

Tables WITH UNIQUE on logical_id (good): 35

agents, agent_views, approval_requests, binding_registry, block_library, checkpoint_sets, checkpoint_types, collection_groups, collection_registry, comments, derived_objects_registry, design_templates, dot_domains, dot_tools, entity_dependencies, entity_species, governance_registry, kg_thresholds, knowledge_documents (S175 fix), languages, meta_catalog, modules, normative_registry, pivot_definitions, system_issues (code only), table_proposals (only PK), taxonomy, taxonomy_facets, trigger_registry, ui_pages, universal_edges, universal_rule_violations, workflow_categories, workflow_change_requests, workflow_steps

Tables MISSING UNIQUE on logical_id: 9 candidates examined

Table Logical col Has data? Has writer? Distinct/Total Priority
system_issues violation_hash 165 (2 dup pairs OPEN) YES (S174 DOTs) 148/165 HIGH
system_issues business_logic_hash 165 YES 12/165 low (collision intentional)
categories slug 1 unclear 1/1 LOW (1 row)
posts slug 2 unclear 2/2 LOW
forms key 0 unclear 0/0 LOW (empty)
tasks code 10 YES (Directus Flow) 10/10 MED
workflow_change_requests code 3 unclear 3/3 LOW
registry_changelog code 0 (15317 rows but code col mostly NULL) unclear 0/0 LOW (audit log)
help_articles slug 3 unclear -/3 LOW
help_collections slug -/- unclear -/- LOW

HIGH Priority Findings

H1. system_issues.violation_hash — Same root cause as S175

Used by:

  • S174-FIX-04: env-contract-check.sh, logrotate-config-check.sh, rsyslog-health-check.sh
  • S175 P1: directus_sync.py original reportConfigError for crash reporting
  • Internal Directus dedupe via dedupe.js

Schema gap:

SELECT indexdef FROM pg_indexes WHERE tablename = 'system_issues';
→ system_issues_pkey                          (id)
→ idx_system_issues_system_issues_code_unique (code) -- auto-gen by Directus, not logical id
→ idx_issues_open                             (status WHERE status='open')

No unique on violation_hash. code is Directus auto-gen (ISS-XXXX), not the dedupe key.

Evidence of bug already manifest:

SELECT violation_hash, status, COUNT(*) FROM system_issues
WHERE violation_hash IS NOT NULL
GROUP BY violation_hash, status HAVING COUNT(*) > 1;
violation_hash   |  status  | count
-----------------+----------+------
44815e74c3d0f846 | open     | 2     ← active duplicate
9cc0eb3d7958f6cf | open     | 2     ← active duplicate
0653e4f1c9c519d6 | resolved | 4
1bbc0770d083c274 | resolved | 4
aa8df24153c26d4b | resolved | 4
... (5 more resolved with 2-3x dup)

The 2 OPEN dup pairs:

id   | violation_hash   | issue_class | source_system  | last_seen_at
-----+------------------+-------------+----------------+---------------------------
3896 | 44815e74c3d0f846 | sync_fault  | dieu31-runner  | 2026-04-09 10:00:25.802+00
3994 | 44815e74c3d0f846 | sync_fault  | dieu31-runner  | 2026-03-30 10:00:15.042+00
3884 | 9cc0eb3d7958f6cf | sync_fault  | dieu31-runner  | 2026-04-09 10:00:17.869+00
3999 | 9cc0eb3d7958f6cf | sync_fault  | dieu31-runner  | 2026-04-01 04:00:27.476+00

Mechanism (same as S175 root cause):

  1. dedupe.js calls findExistingIssue(violationHash) with limit=1
  2. If 2 simultaneous calls don't find each other → both POST → 2 rows
  3. Or: previous resolved row counted as "not found" → new row created while old still pending re-open
  4. No DB-level UNIQUE catches it

Reference: universal_rule_violations already has the right pattern:

CREATE UNIQUE INDEX uq_open_violation_hash
ON public.universal_rule_violations USING btree (violation_hash)
WHERE ((status)::text = 'open'::text)

Proposed vaccine (not applied):

-- Step 1: dedupe existing OPEN duplicates (manual review needed)
-- Step 2:
CREATE UNIQUE INDEX CONCURRENTLY idx_system_issues_open_violation_hash_unique
ON public.system_issues USING btree (violation_hash)
WHERE status = 'open' AND violation_hash IS NOT NULL;

Effort: ~30 min (dedupe 2 OPEN pairs + create index). LOW risk if done in maintenance window.

H2. knowledge_documents.source_id — Already fixed in S175

For completeness: this is the table fixed in S175 P2 with idx_kd_current_source_id_unique. No further action needed.


MEDIUM Priority Findings

M1. tasks.code

  • 10 rows, all distinct codes (no current duplicate)
  • code looks like TSK-XXXX Directus auto-gen pattern
  • Found in writer search: items/tasks (agent-data writes via REST)
  • No unique constraint on code (only PK on id)

Risk: If 2 simultaneous task creates collide on the auto-gen sequence, duplicate codes possible. Currently no evidence of duplicates.

Vaccine (deferred):

CREATE UNIQUE INDEX idx_tasks_code_unique ON tasks (code) WHERE code IS NOT NULL;

M2. Other low-data tables

  • categories.slug (1 row), posts.slug (2 rows), help_articles.slug (3 rows), help_collections.slug
  • Sit at "could become an issue" stage but not currently corrupted
  • Defer until data growth or active writer identified

LOW Priority

L1. Tables with data but logical_id well-protected

35 tables have proper UNIQUE constraints on their logical_id columns. No vaccine needed.

L2. Log/audit tables with no logical_id semantics

  • birth_registry (34084 rows) — trigger-managed, no logical_id
  • registry_changelog (15317 rows) — audit log, no logical_id
  • measurement_log (5050 rows) — log, dedupe by composite (run_id, measurement_id)
  • lifecycle_log (827 rows) — append-only log
  • universal_rule_violations (1741 rows) — has uq_open_violation_hash partial unique ✓

No vaccine needed.


Matrix Summary

Priority Table Logical col Issue Vaccine effort
HIGH system_issues violation_hash 2 OPEN duplicates already ~30 min
MED tasks code No constraint, no current dup ~15 min
LOW categories,posts,forms,help_*,workflow_change_requests,registry_changelog slug/key/code Low data, no current dup defer
LOW 35 other tables various UNIQUE in place none

Recommendations for Next Session

Priority 1: system_issues.violation_hash vaccine

Why now: Already has 2 active OPEN duplicates from dieu31-runner. The S174-FIX-04 DOTs we shipped this week WILL create more duplicates over time without this constraint. Same root cause as S175.

Steps:

  1. Manual review of 2 OPEN dup pairs (3896/3994, 3884/3999) — likely both in each pair are stale/equivalent
  2. Resolve the older row in each pair (set status='resolved')
  3. Apply partial UNIQUE: CREATE UNIQUE INDEX ... WHERE status='open' AND violation_hash IS NOT NULL
  4. Update dedupe.js and S174 DOT scripts to handle UNIQUE violation gracefully (catch and reuse existing)

Priority 2: tasks.code vaccine

Defensive only. Not blocking, but low effort.

Priority 3: reconcile-knowledge.py cleanup

From C3 closure verify: this script bypasses the atomic writer. Either:

  • Refactor to call _sync_document_atomic directly
  • Or document as deprecated and add a warning at top of file
  • Or remove if no longer needed

Unknowns / Need clarification

  1. tasks.code writer source: Found in agent-data grep but unclear if it's actively writing tasks or just reading. Need to check if Directus Flows are also creating tasks (likely yes per memory).

  2. categories, posts, etc.: Only 1-3 rows each. Unclear if these are seeds or actively used. If not used, can defer indefinitely.

  3. registry_changelog: 15317 rows but code mostly NULL — likely an audit log where code references entity_codes from other tables. Schema audit can't determine semantics without code reading.

Constraints Honored

  • READ-ONLY: 0 INSERT/UPDATE/DELETE/DDL executed (except inside ROLLBACK transaction)
  • No fix attempted, even on HIGH priority
  • Scope limited to directus DB
  • All claims backed by SQL output or grep result