S167H-INV Investigation Report: 6 System Issues
Date: 2026-03-26 | Agent: Claude CLI (claude-go) | Mode: INVESTIGATE + FIX | No PR needed (PG-only fixes)
Part A: INVESTIGATION (6 Issues)
Classification Table
| # | ID | Title | Severity | Classification | Root Cause | Action |
|---|---|---|---|---|---|---|
| 1 | 2224 | L1 vs PG registries total drift | CRITICAL | (c) Eventual consistency | PG updates before the Nuxt cache refreshes, so the totals briefly diverge. Now matching (19098 = 19098). | Auto-resolved |
| 2 | 758 | Legacy WATCHDOG (CTR-WATCHDOG-01) | CRITICAL | (c) Orphaned issue | Created by the legacy runner; the PG runner uses MSR-D31-WATCHDOG instead. | Manually resolved |
| 3 | 2157 | PG WATCHDOG (runner alive) | CRITICAL | (d) Expected behavior | Liveness beacon; always FAILs by design. | Keep open |
| 4 | 2223 | GEM_CHAOS_P1 NULL | WARNING | (b) False positive | Measurement queries a chaos-artifact row that no longer exists. | Disabled, auto-resolved |
| 5 | 2225 | 2079 _dot_origin NULL | WARNING | (a) Real finding | 2079 records missing _dot_origin across 18 collections. | Keep per 0-AM |
| 6 | 2226 | 2040 broken universal_edges | WARNING | (a) Real finding | 2040 edges with broken source/target references. | Keep per 0-AM |
A3: Root Cause per Issue
Issue #2224 (L1 drift):
- source_query: SELECT SUM(record_count) FROM meta_catalog WHERE identity_class = 'managed'
- target: /api/registry/counts (Nuxt API)
- At failure: PG=19053, Nuxt=19054 (off-by-one, cache timing)
- At investigation: PG=19098, Nuxt=19098 (matching)
- Verdict: eventual consistency; auto-resolved on the next run.
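Here is a minimal sketch of the comparison behind this measurement. It assumes a zero-tolerance check, so any difference between the PG total and the Nuxt API total counts as a FAIL. The function `compareTotals` is a hypothetical name and is not taken from the runner.

```javascript
// Hypothetical drift check: PASS only when the two totals match exactly.
// Assumption: zero tolerance; the real runner may allow a margin.
function compareTotals(pgTotal, apiTotal) {
  const drift = apiTotal - pgTotal;
  return { status: drift === 0 ? "PASS" : "FAIL", drift };
}

// Totals recorded in this report.
const atFailure = compareTotals(19053, 19054);
const atInvestigation = compareTotals(19098, 19098);
console.log(atFailure.status, atFailure.drift); // FAIL 1
console.log(atInvestigation.status, atInvestigation.drift); // PASS 0
```

Under that assumption, a one-record gap caused by cache timing is enough to open an issue. The first run after the cache catches up then lets auto-resolve close it.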
Issue #2223 (GEM_CHAOS_P1):
- source_query: SELECT record_count FROM meta_catalog WHERE code = 'GEM-CHAOS-P1'
- The GEM-CHAOS-P1 row was cleaned up by Codex in S167F, but its measurement_registry entry was NOT removed
- On every run the query returns NULL -> FAIL
- Fix: UPDATE measurement_registry SET enabled = false WHERE measurement_id = 'MSR-AUTO-GEM_CHAOS_P1'
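The failure mode can be sketched as follows. The sketch assumes the runner skips disabled measurements and scores a NULL result as FAIL. The names `evaluate`, `scoreResult`, and the `check` predicate are hypothetical, and the real comparison for this measurement is not shown in this report.

```javascript
// Hypothetical scoring: a NULL result means the queried row no longer exists.
function scoreResult(value, check) {
  if (value === null) return "FAIL";
  return check(value) ? "PASS" : "FAIL";
}

// Hypothetical run loop: only enabled measurements are evaluated.
function evaluate(registry) {
  return registry
    .filter((m) => m.enabled)
    .map((m) => ({ id: m.id, status: scoreResult(m.value, m.check) }));
}

const registry = [
  {
    id: "MSR-AUTO-GEM_CHAOS_P1",
    enabled: true,
    value: null, // the record_count query for GEM-CHAOS-P1 finds no row
    check: (v) => v > 0, // placeholder predicate, assumed for illustration
  },
];

console.log(evaluate(registry).map((r) => r.status)); // [ 'FAIL' ]

// Equivalent of the fix: disable this one measurement (enabled = false).
registry[0].enabled = false;
console.log(evaluate(registry)); // []
```

Disabling the entry removes it from the run entirely. Once the runner no longer reproduces the issue's hash, auto-resolve can close the open issue.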
Issue #758 (Legacy watchdog):
- violation_hash 39e10a47 = hash(CTR-WATCHDOG | CTR-WATCHDOG-01 | watchdog_fault) [legacy]
- violation_hash ea804499 = hash(CTR-WATCHDOG | MSR-D31-WATCHDOG | watchdog_fault) [PG runner]
- Different checkId -> different hash -> dedupe treats them as separate issues
- Auto-resolve excludes watchdog_fault (dedupe.js line 184)
- Fix: Manually resolved.
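This sketch shows why the two watchdog issues never deduplicate, using a hash over the three fields listed above. It assumes the fields are joined with `|` and hashed with SHA-256 truncated to 8 hex characters. The runner's actual algorithm is not documented here, so this sketch will not reproduce `39e10a47` or `ea804499`. Both `violationHash` and the name `contractId` for the first field are hypothetical.

```javascript
const crypto = require("crypto");

// Hypothetical violation hash: join the key fields with "|" and take the
// first 8 hex characters of a SHA-256 digest (assumed, not the runner's code).
function violationHash(contractId, checkId, violationType) {
  return crypto
    .createHash("sha256")
    .update([contractId, checkId, violationType].join("|"))
    .digest("hex")
    .slice(0, 8);
}

const legacy = violationHash("CTR-WATCHDOG", "CTR-WATCHDOG-01", "watchdog_fault");
const pgRunner = violationHash("CTR-WATCHDOG", "MSR-D31-WATCHDOG", "watchdog_fault");

// Only checkId differs, but that is enough to produce two distinct hashes,
// so dedupe stores the legacy and PG-runner watchdogs as separate issues.
console.log(legacy, pgRunner, legacy === pgRunner);
```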
A4: Watchdog Auto-Resolve Logic
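dedupe.js autoResolveStale() fetches all open dieu31-runner issues except those of type watchdog_fault. For each fetched issue, if its violation_hash is not in seenHashes (the hashes produced by the current run), the issue is resolved. Watchdog issues are excluded by design because they serve as a liveness beacon. Bug: legacy #758 is a watchdog_fault issue whose hash the PG runner no longer produces, so it was orphaned and could never be resolved automatically.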
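The auto-resolve rule can be sketched in a few lines. This is an in-memory approximation, not the code in dedupe.js. The issue field names (`source`, `type`, `violationHash`, `status`), the type given to #2223, and its hash placeholder are all assumptions. The sketch shows why #758 survives every run.

```javascript
// Hypothetical approximation of autoResolveStale(); field names are assumed.
function autoResolveStale(openIssues, seenHashes) {
  const resolved = [];
  for (const issue of openIssues) {
    // Only issues opened by the dieu31 runner are candidates.
    if (issue.source !== "dieu31-runner") continue;
    // Watchdog faults are a liveness beacon and are never auto-resolved.
    if (issue.type === "watchdog_fault") continue;
    // A hash the current run did not reproduce marks the issue as stale.
    if (!seenHashes.has(issue.violationHash)) {
      issue.status = "resolved";
      resolved.push(issue.id);
    }
  }
  return resolved;
}

const openIssues = [
  // Legacy watchdog: its hash is never produced again, and its type is excluded.
  { id: 758, source: "dieu31-runner", type: "watchdog_fault", violationHash: "39e10a47", status: "open" },
  // Placeholder type and hash for the chaos measurement issue (assumed).
  { id: 2223, source: "dieu31-runner", type: "measurement_fail", violationHash: "hash-2223", status: "open" },
];

// Hashes seen in the current run (placeholder: only the PG-runner watchdog).
const seenHashes = new Set(["ea804499"]);

console.log(autoResolveStale(openIssues, seenHashes)); // [ 2223 ]
console.log(openIssues[0].status); // open
```

The exclusion is keyed on the violation type, not on whether the hash is still produced. An orphaned watchdog such as #758 can therefore only be closed by hand, which is what B2 does.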
A5: Scanner Last Run
Run ID: s167h-verify (2026-03-26 ~07:29 UTC). 10 measurements (GEM_CHAOS disabled). PASS: 7 | FAIL: 2 | ERROR: 0. Auto-resolved: 2 stale issues (#2224, #2223).
Part B: FIXES APPLIED
| Fix | What | How |
|---|---|---|
| B1 | Disable chaos artifact measurement | UPDATE measurement_registry SET enabled=false WHERE measurement_id='MSR-AUTO-GEM_CHAOS_P1' |
| B2 | Resolve orphaned legacy watchdog #758 | UPDATE system_issues SET status='resolved' WHERE id=758 |
| B3 | No scanner code bug found | Runner code is correct |
| B4 | Real findings kept per 0-AM | #2225 (NULL _dot_origin) and #2226 (broken edges) left untouched |
No code changes: PG-only data corrections.
Part C: PRODUCTION VERIFICATION
C1: System Issues (production URL)
curl -s https://vps.incomexsaigoncorp.vn/api/registry/system-issues
{"totals":{"all":3,"critical":1,"warning":2,"info":0,"group_count":2}}
Before: 6 open (3 critical, 3 warning). After: 3 open (1 critical = watchdog, 2 warning = real findings).
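Here is a quick consistency check on the C1 response body. It assumes `all` is the number of open issues and equals `critical + warning + info`, with `group_count` treated as a separate grouping metric. The check runs on the JSON copied from the output above rather than calling the endpoint.

```javascript
// Response body from C1, copied verbatim.
const body =
  '{"totals":{"all":3,"critical":1,"warning":2,"info":0,"group_count":2}}';
const { totals } = JSON.parse(body);

// Assumption: "all" is the sum of the severity buckets.
const bySeverity = totals.critical + totals.warning + totals.info;
console.log(bySeverity, bySeverity === totals.all); // 3 true
```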
C2: Health (production URL)
curl -s https://vps.incomexsaigoncorp.vn/api/health
{"status":"healthy","data_integrity":{"document_count":588,"vector_point_count":863,"ratio":1.47,"sync_status":"ok"}}
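The `ratio` field appears to be vector points per document, though this report states neither that definition nor the rounding. Assuming that definition and two-decimal rounding, the ratio can be recomputed from the same response body:

```javascript
// Response body from C2, copied verbatim.
const health = JSON.parse(
  '{"status":"healthy","data_integrity":{"document_count":588,"vector_point_count":863,"ratio":1.47,"sync_status":"ok"}}'
);
const { document_count, vector_point_count, ratio } = health.data_integrity;

// Assumption: ratio = vector points per document, rounded to 2 decimals.
const recomputed = Number((vector_point_count / document_count).toFixed(2));
console.log(recomputed, recomputed === ratio); // 1.47 true
```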
C3: Runner Output (on-demand)
Run ID: s167h-verify. PASS: 7 | FAIL: 2 | ERROR: 0. WATCHDOG: alive. Pass Rate: 77.8% (7/9). Auto-Resolved: 2 stale issues.
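The reported pass rate is consistent with PASS / (PASS + FAIL + ERROR), with the WATCHDOG beacon not counted in the denominator. Below is a minimal sketch under that assumption, where `passRate` is a hypothetical helper.

```javascript
// Assumption: pass rate = PASS / (PASS + FAIL + ERROR); the watchdog beacon
// is excluded, which matches the 7/9 reported above.
function passRate({ pass, fail, error }) {
  const scored = pass + fail + error;
  return scored === 0 ? 0 : (100 * pass) / scored;
}

const rate = passRate({ pass: 7, fail: 2, error: 0 });
console.log(`${rate.toFixed(1)}%`); // 77.8%
```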
C4: Before/After Comparison
| Metric | Before | After |
|---|---|---|
| Open issues | 6 | 3 |
| Critical | 3 | 1 (watchdog) |
| Warning | 3 | 2 (real findings) |
| False positives | 1 | 0 |
| Orphaned watchdog | 1 | 0 |
| Measurements | 11 | 10 |
| Pass rate | 60.0% | 77.8% |
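The open-issue rows of this table can be re-derived from the classification table in Part A. In the sketch below, the `resolved` flag encodes each issue's Action column: auto-resolved and manually resolved issues count as closed.

```javascript
// The six issues from the classification table. "resolved" follows the
// Action column: auto-resolved and manually resolved issues are closed.
const issues = [
  { id: 2224, severity: "critical", resolved: true }, // eventual consistency
  { id: 758, severity: "critical", resolved: true }, // orphaned legacy watchdog
  { id: 2157, severity: "critical", resolved: false }, // PG watchdog beacon
  { id: 2223, severity: "warning", resolved: true }, // false positive, disabled
  { id: 2225, severity: "warning", resolved: false }, // real finding
  { id: 2226, severity: "warning", resolved: false }, // real finding
];

// Count open issues overall and per severity.
function countOpen(list) {
  const open = list.filter((i) => !i.resolved);
  return {
    open: open.length,
    critical: open.filter((i) => i.severity === "critical").length,
    warning: open.filter((i) => i.severity === "warning").length,
  };
}

const before = countOpen(issues.map((i) => ({ ...i, resolved: false })));
const after = countOpen(issues);
console.log(before); // { open: 6, critical: 3, warning: 3 }
console.log(after); // { open: 3, critical: 1, warning: 2 }
```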
Self-Check
| # | Question | Result |
|---|---|---|
| 1 | Read Operating Rules? | PASS |
| 2 | Assembly Gate? | PASS |
| 3 | SQL evidence and classification for all 6 issues? | PASS |
| 4 | Root cause clear, with proof? | PASS |
| 5 | Watchdog auto-resolve logic documented? | PASS |
| 6 | Fix PG-only, with no code changes? | PASS |
| 7 | Explanation clear? | PASS |
| 8 | Production URL evidence? | PASS |
| 9 | Report at reports/s167h-inv? | PASS |
| 10 | Real data untouched? | PASS |
S167H-INV DONE. 6 -> 3 issues. 3 remaining = 1 watchdog + 2 real findings. No code changes needed.