KB-886A

S175-VACCINE-01 system_issues Dedupe + Partial UNIQUE

Revision 1


Date: 2026-04-09 Session: S175-VACCINE-01 Agent: Claude Code Mode: Edit allowed (DDL + DML, narrow scope: system_issues only)

Summary

Permanent root fix for public.system_issues — a table using the "upsert by code" pattern (no DB-level constraint) that had accumulated 2 OPEN duplicate pairs from dieu31-runner. Applied the S175 pattern plus the reference from universal_rule_violations:

  1. Dedupe the 2 OPEN orphan rows (set status='archived')
  2. Create a partial UNIQUE index on (violation_hash) WHERE status='open' AND violation_hash IS NOT NULL
  3. Defer the writer refactor per §FIX GỐC (the constraint is the root fix; fail loud > fail silent)

End-to-end verified with 3 hardtest scenarios, including 1 real run of the Đ31 cron with 117 dedupe.js calls — 0 new OPEN duplicates created.

Files Modified

  • DB: public.system_issues (2 rows updated, 1 index added)
  • NO code changes (writer dedupe.js left untouched)

Backup

/root/backup/s175v01-system_issues-20260409T104916Z.sql (375 KB, data-only dump)


P0 — Discovery

Schema findings

Table "public.system_issues"
- id (PK), code (Directus auto-gen ISS-XXXX, UNIQUE), violation_hash, status, ...
- CHECK constraint: status IN ('open','resolved','archived')
  → 'superseded' is NOT allowed; use 'archived'
- Existing indexes:
  * system_issues_pkey (id)
  * idx_system_issues_system_issues_code_unique (code)  -- Directus auto-gen, not the logical id
  * idx_issues_open btree on status WHERE status='open'  -- NOT unique
- Triggers:
  * trg_guard_system_issues_insert: requires source_system or source non-empty
  * Other: birth_registry, auto_code, auto_label

2 OPEN duplicate pairs verified

id   | violation_hash   | code     | first_seen_at        | last_seen_at         | occurrence_count
3896 | 44815e74c3d0f846 | ISS-2380 | 2026-03-28 11:00:10  | 2026-04-09 10:00:25  | 19  ← canonical (active)
3994 | 44815e74c3d0f846 | ISS-2478 | 2026-03-30 10:00:15  | 2026-03-30 10:00:15  | 1   ← orphan (never re-seen)
3884 | 9cc0eb3d7958f6cf | ISS-2368 | 2026-03-28 11:00:08  | 2026-04-09 10:00:17  | 19  ← canonical (active)
3999 | 9cc0eb3d7958f6cf | ISS-2483 | 2026-04-01 04:00:27  | 2026-04-01 04:00:27  | 1   ← orphan

Pattern: the older row in each pair (lower id, earlier first_seen) is the active one that dedupe.js keeps updating. The newer row was created during a temporary state in which findExistingIssue() failed to find the active one (probably during a status flicker open→resolved→open caused by the autoResolveStale logic; see the followup section).

Reference pattern (universal_rule_violations)

CREATE UNIQUE INDEX uq_open_violation_hash
ON public.universal_rule_violations USING btree (violation_hash)
WHERE ((status)::text = 'open'::text)

Same DB, same dev team, same intent — proven pattern to copy.

Writer found: nuxt-repo/scripts/integrity/dedupe.js

Same upsert-by-code anti-pattern as the original S175 directus_sync.py:

// Line 28
const url = `${DIRECTUS_URL}/items/system_issues?filter[violation_hash][_eq]=${violationHash}&filter[status][_eq]=open&limit=1`;
const resp = await fetch(url, ...);
const data = await resp.json();
return data?.data?.[0] || null;

// Line 123-141
const existing = await findExistingIssue(violationHash);
if (existing) {
  await updateIssue(existing.id, ...);  // PATCH
} else {
  const issue = await createIssue({...});  // POST  ← race window
}

No DB-level lock, no atomic transaction, no UNIQUE constraint catching the race. Used by the S174-FIX-04 DOTs (env-contract-check, logrotate-config-check, rsyslog-health-check) and by the Đ31 runner main.js (the 117-measurement PG-vs-Nuxt check).
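The race window can be reproduced in a few lines against an in-memory stand-in for system_issues. This is a sketch only — the names (upsertIssue, findExisting) and the row shape are illustrative, not taken from dedupe.js:

```javascript
// In-memory stand-in for the system_issues table.
const rows = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in for the Directus GET with filter[status][_eq]=open.
async function findExisting(hash) {
  return rows.find((r) => r.hash === hash && r.status === 'open') || null;
}

async function upsertIssue(hash) {
  const existing = await findExisting(hash);
  await sleep(10); // simulated network latency: this is the race window
  if (existing) {
    existing.occurrence_count += 1; // UPDATE branch (PATCH)
  } else {
    rows.push({ hash, status: 'open', occurrence_count: 1 }); // CREATE branch (POST)
  }
}

// Two writers fire for the same violation_hash at once — both pass the
// "no OPEN row yet" check before either has inserted, so both insert.
Promise.all([
  upsertIssue('44815e74c3d0f846'),
  upsertIssue('44815e74c3d0f846'),
]).then(() => {
  const open = rows.filter((r) => r.status === 'open').length;
  console.log(`OPEN rows for hash: ${open}`); // 2 — the silent duplicate
});
```

With the partial UNIQUE index in place, the second push is the step the DB would reject.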


P1 — Dedupe (atomic)

BEGIN;
UPDATE public.system_issues
SET status='archived',
    resolution='S175v01: archived as duplicate of canonical id=3896',
    resolved_at=NOW()
WHERE id = 3994;
UPDATE public.system_issues
SET status='archived',
    resolution='S175v01: archived as duplicate of canonical id=3884',
    resolved_at=NOW()
WHERE id = 3999;
SELECT violation_hash, COUNT(*) AS open_dup
FROM public.system_issues
WHERE status='open' AND violation_hash IS NOT NULL
GROUP BY violation_hash HAVING COUNT(*) > 1;
COMMIT;

Output:

BEGIN
UPDATE 1
UPDATE 1
 violation_hash | open_dup
----------------+----------
(0 rows)
COMMIT

Post-commit verify

id   | violation_hash   | status   | resolution                                          | resolved_at
3884 | 9cc0eb3d7958f6cf | open     |                                                     |
3896 | 44815e74c3d0f846 | open     |                                                     |
3994 | 44815e74c3d0f846 | archived | S175v01: archived as duplicate of canonical id=3896 | 2026-04-09 10:57:01.228876
3999 | 9cc0eb3d7958f6cf | archived | S175v01: archived as duplicate of canonical id=3884 | 2026-04-09 10:57:01.228876

Canonical rows (3884, 3896) preserved as 'open'. Orphans (3994, 3999) archived with resolution note. 7 RESOLVED duplicate groups untouched (history preserved).


P2 — Partial UNIQUE Constraint

CREATE UNIQUE INDEX idx_sysissue_open_violation_hash_unique
ON public.system_issues USING btree (violation_hash)
WHERE ((status)::text = 'open'::text) AND violation_hash IS NOT NULL;

Output: CREATE INDEX

Verify index exists

indexname: idx_sysissue_open_violation_hash_unique
indexdef:  CREATE UNIQUE INDEX idx_sysissue_open_violation_hash_unique
           ON public.system_issues USING btree (violation_hash)
           WHERE (((status)::text = 'open'::text) AND (violation_hash IS NOT NULL))

Reject test (BEGIN+ROLLBACK)

BEGIN;
INSERT INTO public.system_issues (
  title, status, severity, source_system, issue_class,
  violation_hash, business_logic_hash, first_seen_at, last_seen_at, occurrence_count
) VALUES (
  'P2 TEST DUPLICATE', 'open', 'WARNING', 'dieu31-runner', 'sync_fault',
  '44815e74c3d0f846', 'aaaaaaaaaaaaaaaa', NOW(), NOW(), 1
);
ROLLBACK;

Output:

BEGIN
ERROR: duplicate key value violates unique constraint "idx_sysissue_open_violation_hash_unique"
DETAIL: Key (violation_hash)=(44815e74c3d0f846) already exists.
ROLLBACK

PG correctly rejects. The constraint now enforces the one-OPEN-row-per-violation_hash invariant.


P3 — Writer Refactor: SKIPPED per §FIX GỐC

Decision

dedupe.js writer refactor DEFERRED. Reasons:

  1. The DB partial UNIQUE IS the root fix. After P2, the dedupe.js race window will hit the constraint at the POST step → DB rejects → fail LOUD (HTTP 4xx + a clear error message in the container logs) instead of fail SILENT (a duplicate row created with nobody noticing).

  2. Fail loud > fail silent. The DOT scripts using dedupe.js will log a clear error line whenever the race fires: [ERROR] Create issue failed: HTTP 400 — duplicate key value violates unique constraint. This is observable. The previous behavior was silent corruption.

  3. Refactor cost > benefit. dedupe.js is JavaScript using Directus REST API. Refactoring to atomic SQL transaction would require:

    • Add pg npm package to scripts/integrity/
    • Rewrite findExistingIssue/createIssue/updateIssue to use direct PG connection
    • Manage transactions explicitly
    • ~100+ lines of new code

    The cost is high while the benefit is just "cleaner error handling".
  4. Optional minor improvement (deferred to followup): in the createIssue catch block, detect the HTTP 400 + UNIQUE violation pattern, then re-fetch the existing row and PATCH instead. ~10 lines changed, no new dependencies. Rationale: only do this if log noise from UNIQUE violations exceeds 5/day.

Followup trigger

If an S175v01 followup is needed: monitor docker logs incomex-agent-data and the container running dedupe.js for Create issue failed: HTTP 400 patterns. If the rate exceeds 5/day, add the optional 10-line catch.
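The optional catch described above could be sketched as follows. This is a hedged sketch, not the actual dedupe.js code: the helper names (findExistingIssue, createIssue, updateIssue) mirror the ones quoted from dedupe.js, but the signatures and the injected api object are assumptions for illustration:

```javascript
// Race-loser recovery: on a duplicate-key 400 from the partial UNIQUE
// index, re-fetch the row the concurrent writer created and fall back
// to the PATCH branch instead of surfacing the error.
async function createOrRecover(payload, api) {
  try {
    return await api.createIssue(payload); // POST
  } catch (err) {
    const msg = String(err.message || err);
    const isUniqueRace =
      msg.includes('400') && msg.includes('duplicate key value');
    if (!isUniqueRace) throw err; // unrelated failure: stay fail-loud

    // The race winner's row must now exist as OPEN — find it and PATCH.
    const existing = await api.findExistingIssue(payload.violation_hash);
    if (!existing) throw err; // row vanished: give up loudly
    return api.updateIssue(existing.id, { last_seen_at: payload.last_seen_at });
  }
}
```

Note the design choice: only the specific duplicate-key pattern is swallowed; every other error still propagates, preserving the fail-loud property from P3.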


P4 — Hardtest: 3 Scenarios

K1: Re-running the DOTs creates no new OPEN duplicates — PASS

Triggered Đ31 cron-integrity.sh which runs:

  • env-contract-check.sh
  • logrotate-config-check.sh
  • rsyslog-health-check.sh
  • main.js Đ31 PG-driven runner (129 measurements, 117 use dedupe.js for sync_fault)

=== Cron output (last lines) ===
═══════════════════════════════════════
  PASS: 10 | FAIL: 117 | ERROR: 1
  WATCHDOG: alive
  Pass Rate: 7.9% (10/127)
  Issues Created: 0 | Reopened: 117
  Results logged to measurement_log (run_id: cron-20260409-125754)
═══════════════════════════════════════
=== Post-cron OPEN dup check ===
SELECT COUNT(*) AS open_dup_count FROM (
  SELECT violation_hash FROM system_issues
  WHERE status='open' AND violation_hash IS NOT NULL
  GROUP BY violation_hash HAVING COUNT(*) > 1
) x;
→ open_dup_count = 0
=== Total system_issues by status ===
status   | count
---------+-------
archived |     8   (was 6 + 2 from P1)
open     |   402
resolved |   658

117 dedupe calls fired; none created a new duplicate. "Issues Created: 0 | Reopened: 117" shows all 117 dedupe.js calls correctly took the UPDATE branch (existing row found), not the CREATE branch. No HTTP 400 in the logs (no race triggered during this run).

K2: INSERT duplicate OPEN hash → DB reject — PASS

BEGIN;
INSERT INTO public.system_issues (..., status='open', violation_hash='9cc0eb3d7958f6cf', ...);
ROLLBACK;

Output:

BEGIN
ERROR: duplicate key value violates unique constraint "idx_sysissue_open_violation_hash_unique"
DETAIL: Key (violation_hash)=(9cc0eb3d7958f6cf) already exists.
ROLLBACK

K3: INSERT same hash with status='resolved' → allowed — PASS

BEGIN;
INSERT INTO public.system_issues (..., status='resolved', violation_hash='44815e74c3d0f846', ...);
SELECT COUNT(*) AS open_dup_after_insert FROM ... HAVING COUNT(*) > 1;
SELECT id, status, violation_hash FROM system_issues WHERE violation_hash = '44815e74c3d0f846' ORDER BY id;
ROLLBACK;

Output:

BEGIN
INSERT 0 1   ← INSERT succeeded (not blocked)
 open_dup_after_insert
-----------------------
                     0   ← still 0 OPEN dups
 id   | status   | violation_hash
------+----------+------------------
 3896 | open     | 44815e74c3d0f846   ← canonical preserved
 3994 | archived | 44815e74c3d0f846   ← P1 archive
 4299 | resolved | 44815e74c3d0f846   ← K3 historical insert OK
ROLLBACK

The constraint correctly blocks only OPEN duplicates, not historical rows. Historical inserts work fine: multiple resolved/archived rows per hash are allowed.


Verification Checklist (4 evidence)

  1. OPEN duplicate check (GROUP BY violation_hash HAVING COUNT(*) > 1 WHERE status='open') → 0 rows
  2. Index exists: idx_sysissue_open_violation_hash_unique confirmed in pg_indexes
  3. Reject test: Clear error duplicate key value violates unique constraint "idx_sysissue_open_violation_hash_unique"
  4. Đ31 cron rerun: 117 dedupe calls, 0 new OPEN duplicates ✓

3 Lessons Learned

1. Fail loud > fail silent

A DB constraint is defense in depth — if you can't fix the code, fix the schema. A constraint failure produces a clear, observable HTTP 4xx error in the logs. The code race produced silent duplicate corruption that nobody noticed. The trade-off is clear: better a redundant error log than silently drifting data.

2. A reference pattern in the same DB is gold

universal_rule_violations.uq_open_violation_hash already existed in the same DB, built by the same dev team with the same intent ("1 OPEN per violation_hash"). Copy the pattern verbatim; no need to redesign. This reduces risk and proves-by-precedent ("someone already shipped this pattern and it runs in production → safe").

3. A writer refactor is not always needed

§FIX GỐC says "solve the problem permanently". The DB constraint is the permanent root fix for the "upsert by code" race anti-pattern. Refactoring the writer is fix-for-completeness, not fix-for-correctness. If the constraint blocks the race, defer the refactor to reduce churn and avoid introducing new bugs. "Done" ≠ "comprehensive".


Known Limitations / Followups

Out of S175v01 scope (intentional)

  1. dedupe.js writer refactor — Deferred per §FIX GỐC. Trigger a followup if HTTP 400 noise exceeds 5/day in the container logs.

  2. 7 RESOLVED duplicate groups — Untouched, history preserved:

    • aa8df24153c26d4b × 4 (resolved)
    • 1bbc0770d083c274 × 4 (resolved)
    • 0653e4f1c9c519d6 × 4 (resolved)
    • b5aa1abecdc9a51c × 3 (resolved)
    • ee6d4f80782fefcc × 3 (resolved)
    • 0259368a80e0eeba × 2 (resolved)
    • bbb961536e55e0ac × 2 (resolved)

    These are historical artifacts. Constraint doesn't block resolved duplicates. Cleanup is cosmetic only (remove older resolved row in each group). Defer.

  3. reconcile-knowledge.py bypass — From S175 closure verify report. Still pending cleanup. Not in this scope.

  4. tasks.code MED priority — From vaccine audit. Defensive only, defer.

  5. README duplicate cleanup — From s175-readme-duplicates-followup.md. Defer.

autoResolveStale interaction note

The mechanism that probably created the original duplicates:

  1. Day N: dedupe.js creates row A (status=open, hash=H)
  2. Day N+1: violation temporarily missing → autoResolveStale resolves row A
  3. Day N+2: violation returns → findExistingIssue (status=open filter) doesn't find → creates row B (status=open)
  4. Day N+3: somehow row A re-opens → 2 OPEN rows with same hash

After P2, step 3 will fail with a UNIQUE violation. dedupe.js should be audited to confirm it re-opens resolved rows correctly (a future improvement).
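A re-open-aware lookup could take the shape of this minimal sketch (the function name, row shape, and decision values are assumptions for illustration, not audited dedupe.js code): query by violation_hash WITHOUT the status=open filter, then decide whether to update, re-open, or create:

```javascript
// Decide what the writer should do given ALL rows matching a
// violation_hash (any status), avoiding the step-3 UNIQUE violation.
function resolveAction(existingRows) {
  // Prefer any OPEN row: plain UPDATE branch.
  const open = existingRows.find((r) => r.status === 'open');
  if (open) return { action: 'update', id: open.id };

  // Otherwise re-open the most recent RESOLVED row instead of
  // creating a sibling (archived rows stay archived).
  const resolved = existingRows
    .filter((r) => r.status === 'resolved')
    .sort((a, b) => b.id - a.id)[0];
  if (resolved) return { action: 'reopen', id: resolved.id };

  return { action: 'create' };
}
```

With this shape, the Day N+2 step above would re-open row A rather than create row B, and the partial UNIQUE index never fires.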

Image / Container State

No code change. No image rebuild. No container restart. DB changes only:

  • 2 row UPDATEs (P1)
  • 1 INDEX CREATE (P2)

Backup:   /root/backup/s175v01-system_issues-20260409T104916Z.sql
Rollback: docker exec -i postgres psql -U directus -d directus < <backup>, then DROP INDEX idx_sysissue_open_violation_hash_unique;