KB-391C
S167F Chaos Re-Test Report
Revision 1
S167F — Chaos Re-Test Round 2 (Expanded)
Date: 2026-03-26
Agent: Codex CLI (codex-webtest)
Repo: web-test
Mode: Production chaos test, no code changes
Mandatory Rule Check
- `search_knowledge("operating rules SSOT")` completed.
- `.claude/skills/incomex-rules.md` read.
- S167D chaos test report read.
- S167E chaos hardening report read.
- §0-AK quote used for scoring: "TD-379 Birth Gate: DECISION = DO NOT BLOCK. Keep WARN. The Điều 31 scanner MUST detect `_dot_origin` NULL and report it."
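§0-AK asks the Điều 31 scanner to detect and report `_dot_origin` NULL while the birth gate stays at WARN rather than BLOCK. A minimal sketch of that rule, assuming an entity table with `code` and `_dot_origin` columns; the table name `entity_rows`, the `Finding` type, and the use of an in-memory SQLite database in place of PG are all hypothetical:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding shape; the real scanner output format is not part of this report.
    severity: str
    code: str
    message: str


def scan_null_dot_origin(conn: sqlite3.Connection, table: str) -> list[Finding]:
    """Return a WARN finding for every row whose _dot_origin is NULL; nothing is blocked."""
    # The table name is interpolated, so it must come from a fixed list of trusted entity tables.
    rows = conn.execute(
        f"SELECT code FROM {table} WHERE _dot_origin IS NULL ORDER BY code"
    ).fetchall()
    return [Finding("WARN", code, f"{table}.{code}: _dot_origin is NULL") for (code,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_rows (code TEXT PRIMARY KEY, _dot_origin TEXT)")
# One normal row and one row injected like scenario O2 (_dot_origin left NULL).
conn.executemany(
    "INSERT INTO entity_rows VALUES (?, ?)",
    [("DOT-001", "seed-script"), ("CHAOS-TEST-O2", None)],
)
findings = scan_null_dot_origin(conn, "entity_rows")
for finding in findings:
    print(finding.severity, finding.message)
```

Returning findings instead of raising preserves the WARN-not-BLOCK decision: the write still succeeds, and the scanner is responsible for surfacing it.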
BEFORE Baseline
PG
| Metric | BEFORE |
|---|---|
| trigger_count (`trg_%` + `fn_guard_%`) | 141 |
| v_registry_counts rows | 23 |
| open system_issues | 1 |
| CAT-ALL.record_count | 18962 |
| total dot_tools | 112 |
| total entity_dependencies | 141 |
| total universal_edges | 2040 |
Nuxt / Agent Data
- `system-issues.totals`: `{"all":1,"critical":1,"warning":0,"info":0,"group_count":1}`
- `/api/health.data_integrity`: `document_count=582`, `vector_point_count=851`, `ratio=1.46`, `sync_status=ok`
- `/api/health.event_system`: `enabled=true`, `listeners=1`, `events_logged=1330`
Pre-clean for stale chaos
S167F found 3 leftover entity_dependencies rows from S167D, even though the S167E report stated the environment was clean:
- `CHAOS-TEST-O4`
- `CHAOS-TEST-E3a`
- `CHAOS-TEST-E3b`
They were deleted before new injections, as required by mission rule #8.
Phase 1 — 21 Original Scenarios
| # | Test | S167D | S167F | Delta | Evidence | Root cause if FAIL |
|---|---|---|---|---|---|---|
| P1 | Phantom meta_catalog | PASS | PASS | same | Inserted `CHAOS-TEST-P1`; scanner matched `CHAOS-TEST-P1` Phantom Test ("Layer 2 is missing registry table configuration") | |
| P2 | Fake phantom system_issue | FAIL | PASS | fixed | PG error: "system_issues INSERT requires source_system or source field. Anonymous inserts blocked." | |
| P3 | Phantom v_registry_counts | FAIL | PASS | fixed | PG error: "Direct modification of v_registry_counts is blocked." | |
| O1 | Orphan without code (x6) | FAIL | FAIL | same | Codes were auto-generated: dot_tools→DOT-235, taxonomy→LBL-508, checkpoint_types→CP-033, trigger_registry→TRG-087, entity_dependencies→DEP-0336; the `comments` insert failed on NOT NULL; scanner grep on all O1 markers returned no match | 5 of 6 tables auto-generated a code and `comments` is still hard-blocked by NOT NULL, so no true orphan-without-code row ever reached the scanner |
| O2 | Orphan without `_dot_origin` | FAIL | FAIL | same | Inserted `CHAOS-TEST-O2` with `_dot_origin=NULL`; scanner grep for `CHAOS-TEST-O2\|DOT origin` returned no match | Scanner does not detect `_dot_origin` NULL on entity tables |
| O3 | meta_catalog with NULL registry | PASS | PASS | same | Inserted `CHAOS-TEST-O3`; scanner matched Missing Registry ("Layer 2 is missing registry table configuration") | |
| O4 | Broken dependency | FAIL | FAIL | same | Inserted `CHAOS-TEST-O4` linking `FAKE-SRC-999` -> `FAKE-TGT-999`; scanner grep returned no match | No broken-reference detection for entity_dependencies |
| O5 | Broken edge | FAIL | FAIL | same | Inserted edge `FAKE-EDGE-SRC` -> `FAKE-EDGE-TGT`; scanner grep returned no match | No broken-reference detection for universal_edges |
| N1 | Disable trigger + insert | FAIL | PASS* | improved but unstable | After the hidden insert, scanner emitted `GEM-CHAOS-P1 ...` reporting "meta_catalog says 116 but the actual count is 117" | Detection happened, but the mismatch was attributed to a duplicate meta_catalog row (`GEM-CHAOS-P1`) instead of `CAT-006` |
| N2 | CAT-ALL vs sum | FAIL | FAIL | same | CAT-ALL=19127, atom_sum=18709 during Phase 1 | The invariant is not stable: CAT-ALL covers all managed rows, not only atom rows, and live production writes changed totals during the mission |
| N3 | v_registry_counts vs meta_catalog cross-check | FAIL | FAIL | same | Mismatches observed: CAT-006 113 vs 118, CAT-019 108 vs 107, CAT-023 17941 vs 17635, plus chaos rows while injected | v_registry_counts and meta_catalog.record_count drift independently; duplicate registry_collection mappings amplify misattribution |
| L1 | Lifecycle open→resolved | FAIL | PASS | fixed | PG open count 1→2→1; API totals updated after ~30s in both directions (T30=2, RT30=1) | |
| L2 | Mass corruption rollback | PASS | PASS | same | 500 issue inserts inside a transaction gave after_insert_open=501; ROLLBACK restored it to 1, chaos_left=0 | |
| W1 | 3-way PG/Nuxt check | PASS | FAIL (transient) | regressed | At one check: PG open=2, API totals.all=1 | An eventual-consistency race during the concurrent issue lifecycle made PG and Nuxt diverge momentarily |
| W2 | API consistency | PASS | FAIL (transient) | regressed | At the same moment: totals.all=1, groups-sum=2, PG=2 | system-issues totals lagged behind system-issues-groups and the PG source of truth |
| E1 | NULL / empty code | PASS | PASS | same | NULL normalized to DOT-238; empty string normalized to DOT-239; blank-code count remained 0 | |
| E2 | Mass insert 50 | PASS | FAIL | regressed | dot_tools actual 121→171, but CAT-006.record_count stayed at 113; scanner only changed the description percentage and later misattributed the count mismatch via `GEM-CHAOS-P1` | The count update path for dot_tools is broken or stale; the duplicate meta_catalog row for dot_tools distorts detection |
| E3 | Circular dependency | FAIL | FAIL | same | Inserted `CHAOS-TEST-E3a`/`CHAOS-TEST-E3b` (DOT-001↔DOT-002); scanner grep returned no match | No cycle detection on entity_dependencies |
| S1 | Watchdog alive | PASS | PASS | same | ISS-0752 open at start | |
| S2 | Runner available | FAIL | PASS | fixed enough | Bare shell: `node: command not found`; after `source ~/.nvm/nvm.sh`, `node /opt/incomex/deploys/web-test/scripts/integrity/main.js --dry-run` ran and produced stdout (`PASS: 37 \| FAIL: 0` …) | |
| S3 | Auto-resolve after cleanup | PASS | PASS | same | After cleanup, scanner grep for `CHAOS-TEST\|GEM-CHAOS` returned no match | |
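O4, O5 and E3 fail for the same underlying reason: nothing walks the dependency graph. A minimal sketch of the two missing checks, assuming rows from `entity_dependencies` or `universal_edges` can be read as `(source_code, target_code)` pairs and that the set of valid codes is known. The function names are hypothetical, and cycle detection here is a standard iterative depth-first search with three-colour marking, not the scanner's own code:

```python
from collections import defaultdict
from typing import Iterable, Optional

Edge = tuple[str, str]  # (source_code, target_code)


def broken_references(edges: Iterable[Edge], known_codes: set[str]) -> list[Edge]:
    """Edges whose source or target does not exist (scenarios O4 and O5)."""
    return [(s, t) for s, t in edges if s not in known_codes or t not in known_codes]


def find_cycle(edges: Iterable[Edge]) -> Optional[list[str]]:
    """Return one dependency cycle such as [a, b, a], or None (scenario E3)."""
    graph: dict[str, list[str]] = defaultdict(list)
    for s, t in edges:
        graph[s].append(t)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: dict[str, int] = defaultdict(int)
    for start in list(graph):
        if color[start] != WHITE:
            continue
        color[start] = GREY
        path = [start]
        stack = [iter(graph[start])]
        while stack:
            child = next(stack[-1], None)
            if child is None:  # every child of the current node has been explored
                color[path.pop()] = BLACK
                stack.pop()
            elif color[child] == GREY:  # back edge: the path from child closes a cycle
                return path[path.index(child):] + [child]
            elif color[child] == WHITE:
                color[child] = GREY
                path.append(child)
                stack.append(iter(graph[child]))
    return None


known = {"DOT-001", "DOT-002", "DOT-003"}
edges = [
    ("DOT-001", "DOT-002"),
    ("DOT-002", "DOT-001"),            # shaped like CHAOS-TEST-E3a/E3b
    ("FAKE-SRC-999", "FAKE-TGT-999"),  # shaped like CHAOS-TEST-O4
    ("DOT-003", "DOT-001"),
]
print(broken_references(edges, known))
print(find_cycle(edges))
```

The same two functions apply unchanged to `universal_edges` once its rows are mapped to pairs. The iterative DFS avoids Python's recursion limit on long dependency chains.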
Phase 2 — Auto-System Liveness (10 scenarios)
| # | Test | Result | Evidence | Implication |
|---|---|---|---|---|
| V1 | Vector/document count parity | FAIL | `/api/health`: document_count=582, vector_point_count=851, ratio=1.46 | The exposed parity metric is far outside the 5% threshold |
| V2 | Vector sync after CRUD | PASS | Uploaded a test doc with a unique token; search_knowledge found it after 30s; after delete + 30s, search no longer found it | Vector create/delete propagation works via the Agent Data API path |
| V3 | Orphan vector detection | FAIL | `/api/openapi.json` exposed no live orphan-vector endpoint; the parity gap remained 582 vs 851 | orphan = 0 cannot be proven from live telemetry |
| A1 | Event system alive | PASS | `/api/health.event_system`: enabled=true, listeners=1 | The event loop is alive |
| A2 | Directus sync active | FAIL | Agent Data `list_documents(path="knowledge/")` -> count=361; Directus knowledge_documents -> 370 | Sync drift of 9 docs between Directus and Agent Data |
| A3 | Event system kill test (read-only) | PASS | `docker inspect incomex-agent-data` -> StartedAt=2026-03-23T14:53:04Z, restart_policy=unless-stopped | The container will auto-restart, so liveness does not hinge on a single run |
| C1 | Integrity runner cron active | PASS | Crontab lines exist: cron-integrity.sh daily, watchdog-monitor.sh hourly | Automation is scheduled |
| C2 | Runner last execution | FAIL | The only full-run artifact was cron-20260323-200011.log; the current cron.log ended with `DATABASE_URL: unbound variable` and `DIRECTUS_TOKEN ... PERMISSION` | Cron exists, but the last successful runner execution is stale and the current scheduled path is broken |
| C3 | Scanner last execution | FAIL | No dedicated scanner cron entry and no recent scanner log artifact found | Scanner automation is not independently scheduled or observable |
| C4 | Watchdog heartbeat freshness | PASS | Initial watchdog last_seen_at=2026-03-25 15:35:25.611+00; later a new watchdog row ISS-1647 was opened at 2026-03-26 05:50:12Z | The heartbeat stayed <24h throughout the mission |
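V1 and V3 both hinge on a single number. A minimal sketch of the V1 check, assuming "parity" means the relative gap between vector points and documents must stay within the 5% threshold. How `/api/health` actually derives `ratio` is not shown in this report, and chunked documents could legitimately produce several vector points each, so the formula below is an assumption:

```python
def parity_deviation(documents: int, vectors: int) -> float:
    """Relative gap between vector points and documents (0.0 means exact parity)."""
    if documents <= 0:
        raise ValueError("document count must be positive")
    return abs(vectors - documents) / documents


def parity_ok(documents: int, vectors: int, threshold: float = 0.05) -> bool:
    """True when the gap is within the threshold (5% by default, as in V1)."""
    return parity_deviation(documents, vectors) <= threshold


# Values reported by /api/health during V1.
docs, vectors = 582, 851
print(f"ratio={vectors / docs:.2f} deviation={parity_deviation(docs, vectors):.1%} ok={parity_ok(docs, vectors)}")
# ratio=1.46 deviation=46.2% ok=False
```

Under this reading, the reported 1.46 ratio is roughly nine times the allowed gap. This is why V3 cannot claim orphan = 0 without a per-vector audit.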
Phase 3 — Multi-Round Consistency
| Round | Scenarios re-tested | All PASS? | Anomalies |
|---|---|---|---|
| 1 | Full Phase 1 + Phase 2 | NO | Major misses remained on _dot_origin, broken deps/edges, circular deps, API consistency, vector parity, automation freshness |
| 2 | P2, P3, N1, V2, W1, W2 | NO | P2/P3 blocked again; V2 passed again; W1/W2 stabilized (PG=2, API totals=2, groups-sum=2); N1 did not stably detect after cleanup and only changed description counts |
| 3 | After 5-minute wait: V1, W1, W2, A1, C4, baseline | NO | V1 still failed (582/851, ratio 1.46); W1/W2 stayed stable (PG open=2, API all=2, groups-sum=2); A1 stayed alive; C4 stayed fresh; CAT-ALL kept increasing due to live production traffic |
Detection Rates
- Phase 1: 11/21
- Phase 2: 5/10
- Total: 16/31
Comparison:
- S167D: 2/21
- S167F: 11/21 on Phase 1, but still not production-ready
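The detection rates are plain counts of PASS rows in the Phase 1 and Phase 2 tables. A small sketch that recomputes them from the S167F columns, treating N1's `PASS*` as a pass (an assumption, but the only reading consistent with 11/21) and the transient W1/W2 results as failures:

```python
# S167F results transcribed from the Phase 1 and Phase 2 tables.
PHASE1 = {
    "P1": "PASS", "P2": "PASS", "P3": "PASS",
    "O1": "FAIL", "O2": "FAIL", "O3": "PASS", "O4": "FAIL", "O5": "FAIL",
    "N1": "PASS*", "N2": "FAIL", "N3": "FAIL",
    "L1": "PASS", "L2": "PASS",
    "W1": "FAIL (transient)", "W2": "FAIL (transient)",
    "E1": "PASS", "E2": "FAIL", "E3": "FAIL",
    "S1": "PASS", "S2": "PASS", "S3": "PASS",
}
PHASE2 = {
    "V1": "FAIL", "V2": "PASS", "V3": "FAIL",
    "A1": "PASS", "A2": "FAIL", "A3": "PASS",
    "C1": "PASS", "C2": "FAIL", "C3": "FAIL", "C4": "PASS",
}


def passed(results: dict[str, str]) -> int:
    # Any result starting with PASS, including the PASS* caveat, counts as a detection.
    return sum(1 for r in results.values() if r.startswith("PASS"))


for name, results in (("Phase 1", PHASE1), ("Phase 2", PHASE2)):
    print(f"{name}: {passed(results)}/{len(results)}")
total = passed(PHASE1) + passed(PHASE2)
print(f"Total: {total}/{len(PHASE1) + len(PHASE2)}")
```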
New / Confirmed Bugs
CRITICAL
- Scanner still misses `_dot_origin` NULL on entity tables (O2), despite §0-AK requiring detection.
- Broken `entity_dependencies` and `universal_edges` remain invisible to Điều 31 (O4, O5).
- Circular dependencies remain invisible (E3).
- `dot_tools` count integrity is unreliable; mass inserts can leave `meta_catalog.record_count` stale (E2).
- Duplicate `meta_catalog.registry_collection='dot_tools'` rows (`CAT-006`, `GEM-CHAOS-P1`), plus `refresh_registry_count()` using `LIMIT 1`, cause misattributed or missed count alerts.
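When two `meta_catalog` rows share `registry_collection='dot_tools'`, a lookup that keeps only the first row (`LIMIT 1`) refreshes or compares just one of them, so the alert can land on the wrong catalog entry. A minimal sketch of a check that does not rely on `LIMIT 1`: it flags duplicate mappings and compares every mapped row against the live count. The table layouts are simplified, the counts are illustrative (not the mission's values), SQLite stands in for PG, and the real body of `refresh_registry_count()` is not shown in this report:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE meta_catalog (code TEXT PRIMARY KEY, registry_collection TEXT, record_count INTEGER);
    CREATE TABLE dot_tools (code TEXT PRIMARY KEY);
    """
)
# Illustrative data only: two catalog rows claim dot_tools, and neither count is current.
conn.executemany(
    "INSERT INTO meta_catalog VALUES (?, ?, ?)",
    [("CAT-006", "dot_tools", 3), ("GEM-CHAOS-P1", "dot_tools", 4)],
)
conn.executemany("INSERT INTO dot_tools VALUES (?)", [(f"DOT-00{i}",) for i in range(1, 6)])


def duplicate_mappings(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Collections claimed by more than one meta_catalog row (where a LIMIT 1 lookup is ambiguous)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for collection, code in conn.execute(
        "SELECT registry_collection, code FROM meta_catalog ORDER BY code"
    ):
        groups[collection].append(code)
    return {c: codes for c, codes in groups.items() if len(codes) > 1}


def stale_counts(conn: sqlite3.Connection, collection: str) -> list[tuple[str, int, int]]:
    """(code, recorded, actual) for every catalog row of `collection` whose count is wrong."""
    # The collection name is interpolated, so it must come from a trusted list.
    actual = conn.execute(f"SELECT COUNT(*) FROM {collection}").fetchone()[0]
    rows = conn.execute(
        "SELECT code, record_count FROM meta_catalog "
        "WHERE registry_collection = ? AND record_count != ? ORDER BY code",
        (collection, actual),
    ).fetchall()
    return [(code, recorded, actual) for code, recorded in rows]


print(duplicate_mappings(conn))
print(stale_counts(conn, "dot_tools"))
```

Reporting every mapped row makes a duplicate such as `GEM-CHAOS-P1` visible as its own problem instead of letting it absorb the mismatch that belongs to `CAT-006`.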
HIGH
- API totals can disagree with both PG and the groups endpoint (W1, W2).
- Runner cron exists, but the last successful execution is stale; the current scheduled path breaks on env/token setup (C2).
- No separate scanner automation evidence (C3).
- Agent Data vs Directus knowledge-doc sync drift remains (A2).
- The vector parity metric is far outside the threshold, and orphan=0 cannot be proven (V1, V3).
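W1 and W2 failed transiently: a single sample caught PG ahead of the `system-issues` totals, and later rounds agreed. A minimal sketch of a check that separates lag from real divergence by re-sampling before failing. The sampler, attempt count, and delay are hypothetical; the ~30s API update lag seen in L1 is one input for choosing the window:

```python
import time
from typing import Callable

Snapshot = tuple[int, int, int]  # (PG open issues, API totals.all, sum over the groups endpoint)


def check_three_way(
    sample: Callable[[], Snapshot],
    attempts: int = 4,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Classify PG / Nuxt agreement as CONSISTENT, TRANSIENT (converged after re-sampling) or DIVERGED."""
    for attempt in range(attempts):
        pg_open, api_total, groups_sum = sample()
        if pg_open == api_total == groups_sum:
            return "CONSISTENT" if attempt == 0 else "TRANSIENT"
        if attempt < attempts - 1:
            sleep(delay_s)  # give the API time to catch up before re-sampling
    return "DIVERGED"


# Replay of W1/W2: first sample PG=2, totals.all=1, groups-sum=2; Round 2 saw 2/2/2.
snapshots = iter([(2, 1, 2), (2, 2, 2)])
result = check_three_way(lambda: next(snapshots), sleep=lambda _seconds: None)
print(result)  # TRANSIENT
```

Injecting `sleep` keeps the replay instant; in production the default `time.sleep` applies. One option is to alert only on DIVERGED and log TRANSIENT as a lag measurement.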
MEDIUM
- O1 cannot currently exercise true orphan-no-code paths on most tested tables because auto-code or hard constraints intercept the write before scanner can prove coverage.
- N1 improved from S167D, but detection is not stable across rounds.
LOW
- The S2 runner is usable only after sourcing NVM; a bare non-login shell still reports `node: command not found`.
Cleanup
Chaos cleanup result
All chaos rows were removed, including concurrent GEM-CHAOS-* data found on production during the mission.
Final zero-check:
`meta_catalog=0`, `v_registry_counts=0`, `dot_tools=0`, `taxonomy=0`, `checkpoint_types=0`, `trigger_registry=0`, `entity_dependencies=0`, `universal_edges=0`, `system_issues=0` for all `%CHAOS%`/`%FAKE%` markers.
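The zero-check amounts to one marker count per table. A minimal sketch, assuming each table exposes the marker in a `code` column (a hypothetical simplification; `system_issues`, for example, may carry the marker elsewhere) and using an in-memory SQLite database in place of PG:

```python
import sqlite3

MARKERS = ("%CHAOS%", "%FAKE%")


def chaos_residue(conn: sqlite3.Connection, tables: list[str], column: str = "code") -> dict[str, int]:
    """Count rows per table whose marker column matches any chaos marker."""
    # Note: SQLite's LIKE is case-insensitive for ASCII by default; PostgreSQL's LIKE is
    # case-sensitive (ILIKE is its case-insensitive form).
    where = " OR ".join(f"{column} LIKE ?" for _ in MARKERS)
    # Table and column names are interpolated, so they must come from a trusted list.
    return {
        table: conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {where}", MARKERS).fetchone()[0]
        for table in tables
    }


tables = ["meta_catalog", "dot_tools", "entity_dependencies"]
conn = sqlite3.connect(":memory:")
for table in tables:
    conn.execute(f"CREATE TABLE {table} (code TEXT PRIMARY KEY)")
conn.execute("INSERT INTO dot_tools VALUES ('DOT-001')")
# A leftover row, like the stale S167D dependencies found during the pre-clean.
conn.execute("INSERT INTO entity_dependencies VALUES ('CHAOS-TEST-O4')")

before = chaos_residue(conn, tables)
conn.execute("DELETE FROM entity_dependencies WHERE code LIKE '%CHAOS%'")
after = chaos_residue(conn, tables)
print(before)
print(after, "clean" if all(n == 0 for n in after.values()) else "residue left")
```

Running the same count before injections as well as after cleanup would have caught the S167D leftovers that the S167E report had missed.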
AFTER baseline
| Metric | BEFORE | AFTER | Note |
|---|---|---|---|
| trigger_count | 141 | 141 | restored |
| v_registry_counts rows | 23 | 23 | restored |
| open system_issues | 1 | 2 | new watchdog issue ISS-1647 opened at 2026-03-26 05:50:12Z during the mission |
| CAT-ALL.record_count | 18962 | 19047 | live production traffic increased the total during the mission |
| total dot_tools | 112 | 112 | restored |
| total entity_dependencies | 141 | 141 | restored |
| total universal_edges | 2040 | 2040 | restored |
Evidence that the AFTER drift is external rather than cleanup residue:
- The final chaos-row audit was 0 across every tested table.
- `birth_registry` ended at 17990, and `CAT-023.record_count` also ended at 17990.
- The increase in `CAT-ALL` from 18962 -> 19047 is exactly +85, matching live `birth_registry` growth during the mission window.
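The attribution argument is an accounting check: the CAT-ALL change, minus known live-traffic growth, should leave nothing unexplained. A minimal sketch, assuming CAT-ALL moves additively with the tracked sources (the function name is hypothetical):

```python
def unexplained_drift(before: int, after: int, external_growth: dict[str, int]) -> int:
    """CAT-ALL change not covered by known live-traffic growth; 0 means nothing is left to explain."""
    return (after - before) - sum(external_growth.values())


# CAT-ALL went 18962 -> 19047 while birth_registry grew by 85 during the mission.
residual = unexplained_drift(18962, 19047, {"birth_registry": 85})
print(residual)  # 0
```

A non-zero residual would point to cleanup residue or another writer and would call for its own row-level audit.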
Final Conclusion
- Detection rate: 16/31 overall, 11/21 on the original S167D scenarios.
- Production readiness for Điều 31+: NO.
- Primary blockers:
  - Scanner coverage gaps on `_dot_origin`, broken deps/edges, and circular dependencies.
  - Count integrity for `dot_tools` is still unreliable and misattributed under duplicate `meta_catalog` mappings.
  - Runner/scanner automation is not healthy enough to trust unattended operation.
  - API consistency and knowledge/vector parity remain unstable or unprovable.