KB-7D5B

S175 Followup — README Duplicate Groups

Revision 1
Tags: s175-followup, readme, duplicate, out-of-scope

Status: OPEN (out of scope for S175; deferred to next session)
Tag: s175-followup
Date: 2026-04-09

Discovery

During the S175 P3 backfill of 15 rows with NULL source_id, Claude Code discovered that 3 of the 15 are the NULL side of README duplicate pairs: each has a sibling row with the same file_path whose source_id is already set. Backfilling them would have violated the partial UNIQUE index idx_kd_current_source_id_unique and rolled back the entire backfill transaction.
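The failure mode can be reproduced in miniature. The sketch below is not the S175 backfill code: it uses an in-memory SQLite table as a stand-in for PostgreSQL, and it assumes both the shape of idx_kd_current_source_id_unique (unique on source_id among current rows) and the backfill rule source_id = 'agentdata:' + file_path, which is inferred from the set-side values.

```python
import sqlite3

# Simplified stand-in for knowledge_documents (SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
conn.executescript("""
CREATE TABLE knowledge_documents (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    source_id TEXT,
    is_current_version INTEGER NOT NULL DEFAULT 1
);
-- ASSUMED definition: unique source_id among current rows only.
CREATE UNIQUE INDEX idx_kd_current_source_id_unique
    ON knowledge_documents (source_id)
    WHERE is_current_version = 1;
INSERT INTO knowledge_documents (id, file_path, source_id) VALUES
    (298, 'knowledge/current-state/README.md', NULL),
    (300, 'knowledge/dev/README.md', NULL),
    (345, 'knowledge/current-state/README.md',
          'agentdata:knowledge/current-state/README.md');
""")


def backfill(conn, ids):
    """Backfill source_id for all ids in one transaction; roll back on conflict."""
    conn.execute("BEGIN")
    try:
        for row_id in ids:
            conn.execute(
                "UPDATE knowledge_documents "
                "SET source_id = 'agentdata:' || file_path WHERE id = ?",
                (row_id,),
            )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # the clean row 300 is undone as well
        return False


def null_count(conn):
    sql = "SELECT COUNT(*) FROM knowledge_documents WHERE source_id IS NULL"
    return conn.execute(sql).fetchone()[0]


ok_all = backfill(conn, [300, 298])    # 298 collides with 345
null_after_failure = null_count(conn)  # both rows are still NULL
ok_clean = backfill(conn, [300])       # clean row alone succeeds
```

Excluding the three colliding ids from the batch is what allowed the remaining rows to commit.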

6 Rows in 3 Duplicate Groups

| file_path | id (NULL side) | id (set side) | source_id (set side) |
|---|---|---|---|
| knowledge/current-state/README.md | 298 | 345 | agentdata:knowledge/current-state/README.md |
| knowledge/current-tasks/README.md | 299 | 346 | agentdata:knowledge/current-tasks/README.md |
| knowledge/other/README.md | 301 | 347 | agentdata:knowledge/other/README.md |

Note: knowledge/dev/README.md (id=300) was NOT a duplicate; it was backfilled successfully in S175 P3 Task 2 (VIỆC 2).

Evidence (P0 snapshot)

Snapshot: /root/backup/s175-knowledge_documents-20260409T072635Z.sql

SELECT id, file_path, source_id, is_current_version, version_number
FROM public.knowledge_documents
WHERE id IN (298, 299, 301, 345, 346, 347)
ORDER BY file_path, id;

All 6 rows have is_current_version=true. Both rows in each pair pass the sidebar query filter (status='published' AND is_current_version=true), so the Nuxt sidebar renders two entries per README.
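A grouped query over the same filter lists every path that the sidebar would render more than once. This is a sketch against an in-memory SQLite stand-in, not the Nuxt query itself; the table contents mirror the ids above, and the helper name find_sidebar_duplicates is hypothetical.

```python
import sqlite3

# Same filter as the sidebar, grouped by path; > 1 means a duplicate entry.
DUPLICATE_QUERY = """
SELECT file_path, COUNT(*) AS n, MIN(id) AS first_id, MAX(id) AS last_id
FROM knowledge_documents
WHERE status = 'published' AND is_current_version
GROUP BY file_path
HAVING COUNT(*) > 1
ORDER BY file_path
"""


def find_sidebar_duplicates(conn):
    """Return (file_path, count, min id, max id) for each duplicated path."""
    return conn.execute(DUPLICATE_QUERY).fetchall()


# Stand-in data (SQLite, not the production PostgreSQL table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledge_documents ("
    "id INTEGER PRIMARY KEY, file_path TEXT, status TEXT, "
    "is_current_version INTEGER)"
)
rows = [
    (298, "knowledge/current-state/README.md"),
    (299, "knowledge/current-tasks/README.md"),
    (300, "knowledge/dev/README.md"),
    (301, "knowledge/other/README.md"),
    (345, "knowledge/current-state/README.md"),
    (346, "knowledge/current-tasks/README.md"),
    (347, "knowledge/other/README.md"),
]
conn.executemany(
    "INSERT INTO knowledge_documents VALUES (?, ?, 'published', 1)", rows
)

duplicates = find_sidebar_duplicates(conn)
for path, n, first_id, last_id in duplicates:
    print(f"{path}: {n} current rows (ids {first_id}, {last_id})")
```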

Why Out of Scope for S175

S175 scope: fix the root cause of the Directus drift (commit bug plus backfill of the 12 clean rows). Cleaning up the README duplicates requires:

  1. Read content of BOTH rows in each pair (NULL side may have different content)
  2. Decide canonical row (newest? content-richer? source-marked?)
  3. Possibly merge content from one to other
  4. Archive non-canonical row (set is_current_version=false)
  5. Verify partial UNIQUE constraint passes after archive

This carries a risk of data loss and requires content-level decisions; it is not a mechanical fix.
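The content comparison in step 1 could be done with a plain unified diff. The sketch below uses Python's standard difflib; the helper name diff_pair and the sample strings are made up for illustration and are not the real README contents.

```python
import difflib


def diff_pair(null_side, set_side, label):
    """Return unified-diff lines between the two rows of one duplicate pair.

    An empty list means the two contents are identical, in which case
    archiving either row loses nothing.
    """
    return list(
        difflib.unified_diff(
            null_side.splitlines(),
            set_side.splitlines(),
            fromfile=f"{label} (NULL source_id)",
            tofile=f"{label} (source_id set)",
            lineterm="",
        )
    )


# Illustrative content only.
same = diff_pair("# Title\nbody", "# Title\nbody", "knowledge/other/README.md")
changed = diff_pair("# Title\nold line", "# Title\nnew line",
                    "knowledge/other/README.md")
print("\n".join(changed))
```

A non-empty diff is the signal that a pair needs a manual canonical-row decision or a merge before anything is archived.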

Next Session Plan

  1. SELECT id, content FROM knowledge_documents WHERE id IN (298,299,301,345,346,347);
  2. diff content of each pair
  3. Decide canonical per pair
  4. BEGIN; UPDATE ... SET is_current_version=false WHERE id=<non_canonical>; COMMIT;
  5. Verify only 3 current rows remain for the 3 README paths
  6. Consider adding source_id NOT NULL constraint after all NULL rows resolved
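Steps 4 and 5 can share one transaction so that a wrong choice never commits. The sketch below runs against an in-memory SQLite stand-in; archive_and_verify is a hypothetical helper, and the ids passed to it are placeholders only, since the canonical row for each pair has not been decided.

```python
import sqlite3


def archive_and_verify(conn, non_canonical_ids, paths):
    """Archive rows, then require exactly one current row per path.

    Commits only if the check passes; otherwise rolls back and returns False.
    """
    id_marks = ",".join("?" * len(non_canonical_ids))
    path_marks = ",".join("?" * len(paths))
    conn.execute("BEGIN")
    conn.execute(
        f"UPDATE knowledge_documents SET is_current_version = 0 "
        f"WHERE id IN ({id_marks})",
        list(non_canonical_ids),
    )
    counts = dict(
        conn.execute(
            f"SELECT file_path, COUNT(*) FROM knowledge_documents "
            f"WHERE is_current_version = 1 AND file_path IN ({path_marks}) "
            f"GROUP BY file_path",
            list(paths),
        ).fetchall()
    )
    ok = all(counts.get(p, 0) == 1 for p in paths)
    conn.execute("COMMIT" if ok else "ROLLBACK")
    return ok


# Stand-in data (SQLite, not the production PostgreSQL table).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE knowledge_documents ("
    "id INTEGER PRIMARY KEY, file_path TEXT, is_current_version INTEGER)"
)
PATHS = [
    "knowledge/current-state/README.md",
    "knowledge/current-tasks/README.md",
    "knowledge/other/README.md",
]
pairs = [(298, 345), (299, 346), (301, 347)]
for path, (null_id, set_id) in zip(PATHS, pairs):
    conn.execute("INSERT INTO knowledge_documents VALUES (?, ?, 1)",
                 (null_id, path))
    conn.execute("INSERT INTO knowledge_documents VALUES (?, ?, 1)",
                 (set_id, path))

# Archiving both rows of one pair fails the check and is rolled back.
bad = archive_and_verify(conn, [298, 345], PATHS)
# PLACEHOLDER choice: treat the NULL side as non-canonical.
good = archive_and_verify(conn, [298, 299, 301], PATHS)
current_total = conn.execute(
    "SELECT COUNT(*) FROM knowledge_documents WHERE is_current_version = 1"
).fetchone()[0]
```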

Constraint Consideration

After resolving all NULL source_id rows, schema should add:

ALTER TABLE knowledge_documents
  ALTER COLUMN source_id SET NOT NULL;

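This prevents future writes from creating rows with NULL source_id. Such rows bypass the partial UNIQUE index because PostgreSQL, by default, treats NULL values as distinct from one another in unique indexes.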
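PostgreSQL rejects SET NOT NULL while any NULL value remains in the column, so a pre-check is useful before running the ALTER. A minimal sketch follows, again on an in-memory SQLite stand-in; the helper name null_source_id_rows is hypothetical.

```python
import sqlite3


def null_source_id_rows(conn):
    """Return ids of rows that would still block SET NOT NULL."""
    sql = ("SELECT id FROM knowledge_documents "
           "WHERE source_id IS NULL ORDER BY id")
    return [row[0] for row in conn.execute(sql)]


# Stand-in data (SQLite, not the production PostgreSQL table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledge_documents (id INTEGER PRIMARY KEY, source_id TEXT)"
)
conn.executemany(
    "INSERT INTO knowledge_documents VALUES (?, ?)",
    [(298, None), (345, "agentdata:knowledge/current-state/README.md")],
)

blocking = null_source_id_rows(conn)
if blocking:
    print(f"Do not add NOT NULL yet; unresolved ids: {blocking}")
```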

Related Files

  • /root/backup/s175-knowledge_documents-20260409T072635Z.sql (P0 snapshot)
  • /opt/incomex/docker/agent-data-repo/agent_data/directus_sync.py (atomic writer, fixed in S175 P3)
  • knowledge/current-state/reports/s175-fix-execution-v3.md (S175 P3 report)