KB-6F1B

S167H-INV Investigation Report: 6 System Issues


Date: 2026-03-26 | Agent: Claude CLI (claude-go) | Mode: INVESTIGATE + FIX | No PR needed (PG-only fixes)

Part A: INVESTIGATION (6 Issues)

Classification table

# | ID | Title | Severity | Classification | Root Cause | Action
1 | 2224 | L1 vs PG registries total drift | CRITICAL | (c) Eventual consistency | PG updates before the Nuxt cache. Now matching (19098=19098). | Auto-resolved
2 | 758 | Legacy WATCHDOG (CTR-WATCHDOG-01) | CRITICAL | (c) Orphaned issue | Created by the legacy runner; the PG runner uses MSR-D31-WATCHDOG. | Manually resolved
3 | 2157 | PG WATCHDOG (runner alive) | CRITICAL | (d) Expected behavior | Liveness beacon, always FAIL by design. | Keep open
4 | 2223 | GEM_CHAOS_P1 NULL | WARNING | (b) False positive | Chaos artifact measurement queries a non-existent row. | Disabled, auto-resolved
5 | 2225 | 2079 _dot_origin NULL | WARNING | (a) Real finding | 2079 records missing _dot_origin across 18 collections. | Keep per 0-AM
6 | 2226 | 2040 broken universal_edges | WARNING | (a) Real finding | 2040 edges with broken source/target references. | Keep per 0-AM

A3: Root Cause per Issue

Issue #2224 (L1 drift):

  • source_query: SUM(record_count) FROM meta_catalog WHERE identity_class = 'managed'
  • target: /api/registry/counts (Nuxt API)
  • At failure: PG=19053, Nuxt=19054 (off-by-one, cache timing)
  • At investigation: PG=19098, Nuxt=19098 (matching)
  • Verdict: Eventual consistency, auto-resolved on next run.
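
The comparison behind this issue can be illustrated with a minimal sketch. It assumes the measurement simply compares the two totals for equality; the function name check_drift and the returned fields are hypothetical, and only the four counts come from this report.

```python
def check_drift(pg_total: int, nuxt_total: int) -> dict:
    """Compare the PG total with the Nuxt API total (assumed equality check)."""
    drift = nuxt_total - pg_total
    return {"status": "PASS" if drift == 0 else "FAIL", "drift": drift}


# Counts reported at failure time and at investigation time.
at_failure = check_drift(19053, 19054)
at_investigation = check_drift(19098, 19098)

print(at_failure)
print(at_investigation)
```

A one-record drift that disappears on the next run is consistent with the cache-timing verdict: the two sides were read at slightly different moments.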

Issue #2223 (GEM_CHAOS_P1):

  • source_query: SELECT record_count FROM meta_catalog WHERE code = 'GEM-CHAOS-P1'
  • The GEM-CHAOS-P1 row was cleaned up by Codex in S167F, but its measurement_registry entry was not.
  • Every run: query returns NULL -> FAIL
  • Fix: UPDATE measurement_registry SET enabled = false WHERE measurement_id = 'MSR-AUTO-GEM_CHAOS_P1' (see B1)
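
The false positive can be reproduced in miniature. The sketch below uses an in-memory SQLite database as a stand-in for PG; the table layout, the placeholder row, and the helper fetch_scalar are assumptions for illustration, and only the source_query text is taken from the report.

```python
import sqlite3


def fetch_scalar(conn, query):
    """Return the first column of the first row, or None if no row matches."""
    row = conn.execute(query).fetchone()
    return row[0] if row is not None else None


conn = sqlite3.connect(":memory:")
# Assumed minimal shape of meta_catalog; the real table has more columns.
conn.execute("CREATE TABLE meta_catalog (code TEXT PRIMARY KEY, record_count INTEGER)")
conn.execute("INSERT INTO meta_catalog VALUES ('SOME-OTHER-CODE', 42)")  # placeholder row

# The GEM-CHAOS-P1 row no longer exists, so the source query has nothing to return.
source_query = "SELECT record_count FROM meta_catalog WHERE code = 'GEM-CHAOS-P1'"
value = fetch_scalar(conn, source_query)
status = "FAIL" if value is None else "HAS_VALUE"

print(value, status)
```

As long as the measurement stays enabled, every run repeats this outcome, which is why disabling the registry entry is the fix rather than changing the runner.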

Issue #758 (Legacy watchdog):

  • violation_hash 39e10a47 = hash(CTR-WATCHDOG | CTR-WATCHDOG-01 | watchdog_fault) [legacy]
  • violation_hash ea804499 = hash(CTR-WATCHDOG | MSR-D31-WATCHDOG | watchdog_fault) [PG runner]
  • Different checkId -> different hash -> dedupe treats them as two separate issues
  • Auto-resolve excludes watchdog_fault (dedupe.js line 184)
  • Fix: Manually resolved.
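
Why the two watchdog issues never merge can be shown with a small sketch. The real hash function in the runner is not documented here, so this assumes SHA-256 over the three fields joined by " | ", truncated to 8 hex characters; the digests it produces will therefore not match 39e10a47 or ea804499. The parameter name contract_id is also an assumption.

```python
import hashlib


def violation_hash(contract_id: str, check_id: str, violation_type: str) -> str:
    """Illustrative hash: SHA-256 of the joined fields, first 8 hex chars (assumed)."""
    key = " | ".join([contract_id, check_id, violation_type])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]


legacy = violation_hash("CTR-WATCHDOG", "CTR-WATCHDOG-01", "watchdog_fault")
pg_runner = violation_hash("CTR-WATCHDOG", "MSR-D31-WATCHDOG", "watchdog_fault")

# Only the checkId differs, yet the hashes differ, so dedupe keeps two issues.
hashes_differ = legacy != pg_runner
print(hashes_differ)
```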

A4: Watchdog Auto-Resolve Logic

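dedupe.js autoResolveStale() fetches all open dieu31-runner issues except those of type watchdog_fault. For each fetched issue, if its violation_hash is not in seenHashes from the current run, the issue is resolved. Watchdog issues are excluded by design because they act as a liveness beacon. Side effect: legacy #758 is a watchdog_fault issue whose hash is no longer emitted by any runner, so it was orphaned and could never be resolved automatically.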
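
The behavior described above can be sketched as follows. This is a Python approximation of the logic, not the code in dedupe.js; the Issue fields, the "measurement_fail" type, and the placeholder hashes for #2224 and #2225 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    id: int
    violation_hash: str
    violation_type: str
    status: str = "open"


def auto_resolve_stale(open_issues, seen_hashes):
    """Resolve open issues whose hash was not seen this run, skipping watchdogs."""
    resolved = []
    for issue in open_issues:
        if issue.violation_type == "watchdog_fault":
            continue  # liveness beacon: excluded from auto-resolve by design
        if issue.violation_hash not in seen_hashes:
            issue.status = "resolved"
            resolved.append(issue.id)
    return resolved


issues = [
    Issue(758, "39e10a47", "watchdog_fault"),      # legacy, orphaned
    Issue(2157, "ea804499", "watchdog_fault"),     # PG runner watchdog
    Issue(2224, "hash-2224", "measurement_fail"),  # placeholder hash and type
    Issue(2225, "hash-2225", "measurement_fail"),  # placeholder hash and type
]
seen_hashes = {"ea804499", "hash-2225"}  # hashes emitted by the current run

resolved_ids = auto_resolve_stale(issues, seen_hashes)
print(resolved_ids)  # [2224]
```

In this sketch #758 stays open no matter what the run emits, which mirrors why it had to be resolved manually.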

A5: Scanner Last Run

Run ID: s167h-verify (2026-03-26 ~07:29 UTC). 10 measurements (GEM_CHAOS disabled). PASS: 7 | FAIL: 2 | ERROR: 0. Auto-resolved: 2 stale issues (#2224, #2223).

Part B: FIXES APPLIED

Fix | What | How
B1 | Disable chaos artifact measurement | UPDATE measurement_registry SET enabled=false WHERE measurement_id='MSR-AUTO-GEM_CHAOS_P1'
B2 | Resolve orphaned legacy watchdog #758 | UPDATE system_issues SET status='resolved' WHERE id=758
B3 | No scanner code bug found | Runner code is correct
B4 | Real findings kept per 0-AM | #2225 (NULL), #2226 (broken edges) untouched

No code changes: all fixes are PG-only data corrections.

Part C: PRODUCTION VERIFICATION

C1: System Issues (production URL)

curl -s https://vps.incomexsaigoncorp.vn/api/registry/system-issues

{"totals":{"all":3,"critical":1,"warning":2,"info":0,"group_count":2}}

Before: 6 open (3 critical, 3 warning)
After: 3 open (1 critical = watchdog, 2 warning = real findings)
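
A quick consistency check on the totals payload can be sketched as below. It assumes that "all" is meant to equal the sum of the per-severity counts, which the response itself does not state; the JSON string is copied from the curl output above.

```python
import json

payload = '{"totals":{"all":3,"critical":1,"warning":2,"info":0,"group_count":2}}'
totals = json.loads(payload)["totals"]

# Assumed invariant: all == critical + warning + info.
by_severity = totals["critical"] + totals["warning"] + totals["info"]
consistent = by_severity == totals["all"]

print(consistent)
```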

C2: Health (production URL)

curl -s https://vps.incomexsaigoncorp.vn/api/health

{"status":"healthy","data_integrity":{"document_count":588,"vector_point_count":863,"ratio":1.47,"sync_status":"ok"}}

C3: Runner Output (on-demand)

Run ID: s167h-verify. PASS: 7 | FAIL: 2 | ERROR: 0. WATCHDOG: alive. Pass Rate: 77.8% (7/9). Auto-Resolved: 2 stale issues.
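
The reported pass rate follows from the counts if it is computed as PASS / (PASS + FAIL + ERROR), which matches the "7/9" shown in the runner output. The helper below is a sketch under that assumption; the function name is hypothetical.

```python
def pass_rate(passed: int, failed: int, errored: int) -> float:
    """Percentage of passing measurements, rounded to one decimal place."""
    total = passed + failed + errored
    return round(100.0 * passed / total, 1) if total else 0.0


rate = pass_rate(7, 2, 0)
print(f"{rate}%")  # 77.8%
```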

C4: Before/After Comparison

Metric | Before | After
Open issues | 6 | 3
Critical | 3 | 1 (watchdog)
Warning | 3 | 2 (real findings)
False positives | 1 | 0
Orphaned watchdog | 1 | 0
Measurements | 11 | 10
Pass rate | 60.0% | 77.8%

Self-Check

# | Question | Result
1 | Read Operating Rules? | PASS
2 | Assembly Gate? | PASS
3 | 6 issues: SQL evidence + classification? | PASS
4 | Root cause clear with proof? | PASS
5 | Watchdog auto-resolve documented? | PASS
6 | Fix: PG-only, no code? | PASS
7 | Explanation clear? | PASS
8 | Production URL evidence? | PASS
9 | Report at reports/s167h-inv? | PASS
10 | Real data untouched? | PASS

S167H-INV DONE. 6 -> 3 issues. 3 remaining = 1 watchdog + 2 real findings. No code changes needed.