KB-3CB9
S167H Codex Chaos + Automation Audit Report
S167H — Codex Chaos Test + Automation Gap Audit
Date: 2026-03-26
Agent: Codex CLI (codex-webtest)
Repo: web-test
Mode: Production chaos test + automation audit, no code changes
Codex prefix: CHAOS-R3-CDX-
Rule Check
- `search_knowledge("operating rules SSOT")` completed.
- `.claude/skills/incomex-rules.md` read.
- Handoff S140, S167F chaos retest report, S167H-FIX data quality report, S167G scanner hardening report, and automation status read.
- Operational rules used for scoring:
  - §0-AM: entity-table bad records may enter; scanner must detect.
  - §0-AN: infrastructure-table corruption must be guard-blocked.
  - §0-AP: fix root = investigate → guard → fix → verify.
  - §0-AQ: dual-agent cleanup must delete only own prefix.
BEFORE Baseline
PG
| Metric | BEFORE |
|---|---|
| trigger_count | 141 |
| v_registry_counts rows | 23 |
| open system_issues | 1 |
| CAT-ALL.record_count | 19129 |
| total dot_tools | 112 |
| total entity_dependencies | 141 |
| total universal_edges | 2039 |
Production URLs
- /api/registry/system-issues: {"all":1,"critical":1,"warning":0,"info":0,"group_count":1}
- /api/health.data_integrity: document_count=591, vector_point_count=871, ratio=1.47, sync_status=ok
- /api/health.event_system: enabled=true, listeners=1, events_logged=1432
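
The `ratio=1.47` reported by `/api/health.data_integrity` is consistent with the two counts listed beside it. A minimal sketch of that arithmetic follows; `vectorRatio` is a hypothetical helper, and rounding to two decimals is an assumption about how the endpoint formats the value.

```javascript
// Hypothetical helper (not part of the health endpoint):
// vector points per document, rounded to two decimals.
function vectorRatio(documentCount, vectorPointCount) {
  if (documentCount <= 0) {
    throw new RangeError("documentCount must be positive");
  }
  return Math.round((vectorPointCount / documentCount) * 100) / 100;
}

// Counts copied from the BEFORE baseline above.
const baselineRatio = vectorRatio(591, 871);
console.log(baselineRatio); // 1.47
```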
Chaos Score Card
| # | Test | S167F | S167H | Delta | Evidence |
|---|---|---|---|---|---|
| P1 | Phantom meta_catalog | PASS | PASS | same | Inserted CHAOS-R3-CDX-P1; DOT audit grep matched CHAOS-R3-CDX-P1 ... Lớp 2 thiếu cấu hình bảng registry. (English: "Layer 2 is missing the registry table configuration.") |
| P2 | Phantom system_issue | PASS | PASS | same | Direct insert blocked twice; round-2 retest error: system_issues INSERT requires source_system or source field. |
| P3 | Phantom v_registry_counts | PASS | PASS | same | Direct insert blocked twice; round-2 retest error: Direct modification of v_registry_counts is blocked. |
| O1 | Orphan no code (x6) | FAIL | FAIL | same | taxonomy -> LBL-509, trigger_registry -> TRG-088; dot_tools, checkpoint_types, entity_dependencies, task_comments rejected because default _dot_origin='DOT:UNKNOWN' fails validator, not because of code policy. |
| O2 | Orphan no _dot_origin | FAIL | PASS | improved | Inserted CHAOS-R3-CDX-O2; measurement_log run s167h-cdx-o2 shows `MSR-D31-A1 |
| O3 | meta_catalog NULL registry | PASS | FAIL | regressed | Inserted CHAOS-R3-CDX-O3; scanner grep for CHAOS-R3-CDX-O3 returned no match; row still auto-added into v_registry_counts. |
| O4 | Broken dependency | FAIL | PASS | improved | Inserted CHAOS-R3-CDX-O4; measurement_log run s167h-cdx-o4 shows `MSR-D31-A2 |
| O5 | Broken edge | FAIL | PASS | improved | Inserted valid universal_edges row with fake codes; measurement_log run s167h-cdx-o5c shows `MSR-D31-A3 |
| N1 | Disable trigger + hidden insert | PASS* | FAIL | regressed | CAT-006.record_count 165, live COUNT(dot_tools)=166; runner run s167h-cdx-n1 did not detect a new count fault; D26 checks are still method=1. |
| N2 | CAT-ALL vs sum | FAIL | FAIL | same | After cleanup: CAT-ALL=20640, atom_sum=20226. |
| N3 | v_reg vs meta cross | FAIL | FAIL | same | After cleanup, mismatch still exists: `CAT-023 |
| L1 | Lifecycle open→resolved | PASS | PASS | same | Inserted CHAOS-R3-CDX-L1 with source_system; PG open 4→5→4; API stayed 4, then 5 at 09:05:57Z, then back to 4 at 09:08:11Z. |
| L2 | Mass corruption rollback | PASS | PASS | same | Transaction evidence: BEGIN, open 4, after insert 504, ROLLBACK, restored 4, residue 0. |
| W1 | 3-way PG/Nuxt check | FAIL | PASS | improved | Stable round-3 snapshot: PG open 4, API totals 4, groups total 4. |
| W2 | API consistency | FAIL | PASS | improved | Round-3 /system-issues and /system-issues-groups both report all=4; groups sum=4. |
| E1 | NULL / empty code | PASS | FAIL | regressed | INSERT dot_tools(code=NULL) and code='' both failed before normalization with DOT origin rejected ... got: DOT:UNKNOWN. |
| E2 | Mass insert 50 | FAIL | PASS | improved | Inserted 50 dot_tools; CAT-006 165→215, live count 165→215, inserted row count 50; cleanup restored. |
| E3 | Circular dependency | FAIL | PASS | improved | Inserted CHAOS-R3-CDX-E3A/B; measurement_log run s167h-cdx-e3 shows `MSR-D31-A4 |
| S1 | Watchdog alive | PASS | PASS | same | Open watchdog issue exists: ISS-1647; round-3 freshness 5.1 minutes stale. |
| S2 | Runner available | PASS | FAIL | regressed | Raw mission command runs Node but falls back to legacy dry-run: Token NOT SET, DB NOT SET, PG connection failed, falling back to legacy. |
| S3 | Auto-resolve after cleanup | PASS | PASS | same | Post-cleanup runner s167h-cdx-postcleanup returned A1=0; only Gemini baseline anomalies remained (A2=3, A3=1, A4=2); Codex prefix residue verified 0 across all tables. |
| V1 | Vector/Document parity | FAIL | PASS | improved | /api/health.data_integrity.ratio=1.47; MSR-D31-A6 passes with threshold <=2.0. |
| V2 | Vector sync CRUD | PASS | PASS | same | upload_document rev 1; search_knowledge found CHAOS-R3-CDX-V2-UNIQUE-SYNC-TOKEN; delete_document rev 2; follow-up search no longer returned the deleted document in context. |
| V3 | Orphan vector detection | FAIL | PASS | improved | Production /opt/incomex/dot/bin/dot-vector-audit --cloud reported Status: needs_cleanup and Ghost documents (25). |
| A1 | Event system alive | PASS | PASS | same | /api/health.event_system: enabled=true, listeners=1. |
| A2 | Directus sync active | FAIL | PASS | improved* | Post-cleanup runner MSR-D31-A5 passed: Directus published 377, Agent Data document_count=591. Note: this is one-way logic only. |
| A3 | Container restart policy | PASS | PASS | same | docker inspect incomex-agent-data -> unless-stopped; started 2026-03-23T14:53:04Z. |
| C1 | Runner cron active | PASS | PASS | same | Crontab contains 0 */6 * * * /opt/incomex/deploys/web-test/scripts/integrity/cron-integrity.sh. |
| C2 | Runner last execution | FAIL | PASS | improved | Latest runner artifacts: cron-20260326-064834.log and cron-20260326-065007.log, both exit 0; latest file timestamp same day. |
| C3 | Scanner cron independent | FAIL | FAIL | same | Crontab has one integrity cron plus watchdog only; no separate scanner/vector audit cron entry. |
| C4 | Watchdog heartbeat freshness | PASS | PASS | same | Round-3 query: `ISS-1647 |
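
N1 and N2 both come down to comparing a stored count against a freshly computed one. The sketch below restates that comparison with the figures from the evidence column; `checkCount` is a hypothetical helper for illustration and does not reflect how the MSR-D26 checks are implemented.

```javascript
// Hypothetical helper: compare a recorded count with a recomputed one.
function checkCount(label, recorded, live) {
  const drift = live - recorded;
  return { label, recorded, live, drift, ok: drift === 0 };
}

const countFindings = [
  // N1: CAT-006.record_count vs live COUNT(dot_tools)
  checkCount("CAT-006 vs COUNT(dot_tools)", 165, 166),
  // N2: CAT-ALL vs the sum of the atom catalogs, after cleanup
  checkCount("CAT-ALL vs atom_sum", 20640, 20226),
];

for (const f of countFindings) {
  console.log(`${f.label}: drift=${f.drift} ok=${f.ok}`);
}
```

Both comparisons report a non-zero drift (+1 and -414), which is the class of fault the PG runner currently misses because the D26 checks are still method=1.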
Detection Rates
- Phase 1: 14/21
- Phase 2: 9/10
- Total: 23/31
Comparison:
- S167F: 16/31
- S167H Codex: 23/31
- Net change: +7
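
These totals can be re-derived from the Chaos Score Card. The sketch below transcribes the S167F and S167H columns (`P` = PASS, `F` = FAIL, with `PASS*` counted as a pass). The split into Phase 1 (P, O, N, L, W, E, S groups) and Phase 2 (V, A, C groups) is an assumption inferred from the 21/10 denominators.

```javascript
// [test id, S167F result, S167H result], transcribed from the score card.
const phase1 = [
  ["P1", "P", "P"], ["P2", "P", "P"], ["P3", "P", "P"],
  ["O1", "F", "F"], ["O2", "F", "P"], ["O3", "P", "F"],
  ["O4", "F", "P"], ["O5", "F", "P"],
  ["N1", "P", "F"], ["N2", "F", "F"], ["N3", "F", "F"],
  ["L1", "P", "P"], ["L2", "P", "P"],
  ["W1", "F", "P"], ["W2", "F", "P"],
  ["E1", "P", "F"], ["E2", "F", "P"], ["E3", "F", "P"],
  ["S1", "P", "P"], ["S2", "P", "F"], ["S3", "P", "P"],
];
const phase2 = [
  ["V1", "F", "P"], ["V2", "P", "P"], ["V3", "F", "P"],
  ["A1", "P", "P"], ["A2", "F", "P"], ["A3", "P", "P"],
  ["C1", "P", "P"], ["C2", "F", "P"], ["C3", "F", "F"], ["C4", "P", "P"],
];

// Count passes in column 1 (S167F) or column 2 (S167H).
const passes = (rows, col) => rows.filter((r) => r[col] === "P").length;

const allRows = [...phase1, ...phase2];
const summary = {
  phase1: `${passes(phase1, 2)}/${phase1.length}`,
  phase2: `${passes(phase2, 2)}/${phase2.length}`,
  s167f: passes(allRows, 1),
  s167h: passes(allRows, 2),
  stillFailing: allRows.filter((r) => r[2] === "F").map((r) => r[0]),
};
console.log(summary);
```

With these inputs the tally gives 14/21 and 9/10 for S167H, 16 versus 23 passes overall, and the same eight failing tests that Round 1 reports.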
Phase 3 — Multi-Round Consistency
| Round | Scenarios re-tested | All PASS? | Anomalies |
|---|---|---|---|
| 1 | Full Codex run across 31 scenarios | NO | Fails remained in O1, O3, N1, N2, N3, E1, S2, C3. |
| 2 | Post-cleanup runner + guard retest | NO | P2/P3 still block correctly; s167h-cdx-postcleanup returned A1=0 but A2=3, A3=1, A4=2 because Gemini chaos remained live. |
| 3 | After >5 min: V1, W1, W2, A1, C4, AFTER baseline | NO | Stability held: ratio 1.47, API totals 4, groups 4, PG open 4, watchdog fresh. AFTER baseline still drifted from BEFORE due to concurrent Gemini writes and live production traffic. |
Automation Gap Registry
| # | Area | Problem | Severity | Current | Need |
|---|---|---|---|---|---|
| 1 | Article 31 (Điều 31) counting | MSR-D26-* are enabled but still method=1, so PG runner ignores count integrity. | CRITICAL | MSR-D26-001/002/004 -> method=1, runner only loads method=2. | Move D26 checks into method-2 or add a dedicated PG runner path for method-1. |
| 2 | Article 31 (Điều 31) counting | verify_counts() is broken on production. | CRITICAL | SELECT COUNT(*) FROM verify_counts() errors on species_collection_map WHERE code ... column "code" does not exist. | Repair verify_counts() or remove it from active counting doctrine. |
| 3 | Runner automation | Raw mission command is not production-ready without injected env. | HIGH | scripts/integrity/pg-client.js requires DATABASE_URL; raw run falls back to legacy dry-run. | Provide stable env export on VPS or containerized entrypoint. |
| 4 | Cron env wiring | cron-integrity.sh reads PG_USER/PG_PASSWORD/PG_DATABASE, but production postgres container exposes POSTGRES_*. | HIGH | Script lines 30-35 expect PG_*; docker inspect postgres shows POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB. | Align cron env lookup with live container env keys. |
| 5 | Cron docs drift | cron-integrity.sh header says daily 20:00 UTC, but live crontab is 0 */6 * * *. | MEDIUM | Source comments and production schedule disagree. | Update script header/comments to match live schedule. |
| 6 | Scanner independence | No separate scanner/vector-audit cron exists. | HIGH | Crontab shows one integrity cron and one watchdog cron only. | Add an independent scanner/vector audit schedule with separate logs. |
| 7 | Watchdog auth path | watchdog-monitor.sh exits 0 on missing token, so an auth failure becomes a silent, non-alerting skip. | HIGH | Log contains repeated WATCHDOG: No token — skipping; script lines 19-21 return success. | Make missing token a hard failure and page it. |
| 8 | Alerting | dedupe.js only creates/updates system_issues; there is no external notification path for CRITICAL findings. | HIGH | Code writes Directus rows only. | Add Slack/email/webhook alert fan-out for CRITICAL and runner failure. |
| 9 | Sync drift logic | A5 is one-way only: Agent Data >= Directus published docs always passes. | HIGH | compareSyncDrift() returns pass when ad >= dc; duplicates/oversync are invisible. | Add upper-bound / sample-content parity, not only lower-bound deficit detection. |
| 10 | Auto-expansion | A1/A2/A3 are hardcoded table lists, not meta-driven expansion. | MEDIUM | sql/s167g_scanner_hardening.sql enumerates specific collections for _dot_origin, deps, edges. | Generate checks from registry metadata so new collections are covered automatically. |
| 11 | CI regression guard | Critical-file guard protects a fixed file list only. | MEDIUM | .github/workflows/guard_critical_files.yml hardcodes paths and patterns. | Derive coverage from contracts/routes/registries rather than fixed filenames. |
| 12 | Smoke coverage | scripts/smoke-test.sh checks a fixed set of pages/APIs only. | MEDIUM | Static endpoint list; new pages or APIs are invisible until manually added. | Expand smoke generation from routing/contracts. |
| 13 | Threshold inconsistency | sync-check.yml treats vector ratio 2.5-4.5 as healthy while A6 passes <=2.0. | MEDIUM | GitHub Action and production scanner disagree on what healthy means. | Unify vector parity thresholds across CI and production. |
| 14 | Flow liveness | No sync_heartbeats or flow_heartbeats table exists to prove Directus flow/service freshness. | HIGH | to_regclass('sync_heartbeats') -> NULL, to_regclass('flow_heartbeats') -> NULL. | Add heartbeat tables and scanner checks for stale flows/services. |
| 15 | Vector audit automation | Vector orphan visibility exists, but scheduling is local/macOS-centric, not VPS-native. | HIGH | dot-vector-audit-schedule creates a LaunchAgent; VPS crontab has no vector-audit entry. | Add VPS cron/systemd timer for dot-vector-audit --cloud or equivalent API call. |
| 16 | Log rotation | Integrity logs are not included in /etc/logrotate.d/incomex. | MEDIUM | Logrotate covers /var/log/mcp-health.log, reconcile logs, backup logs, but not /opt/incomex/logs/integrity/*.log. | Add integrity log paths to logrotate. |
| 17 | Cleanup operability | Official meta_catalog cleanup path is broken on live schema. | HIGH | Direct delete is guard-blocked; deprecate_entity()/retire_entity() are broken, so cleanup required temporary guard-trigger disable. | Repair governed lifecycle/delete functions so test cleanup does not need trigger bypass. |
| 18 | Watchdog design | Watchdog freshness is inferred from system_issues.last_seen_at, not an independent heartbeat channel. | MEDIUM | Same row acts as alert and heartbeat store. | Split liveness heartbeat from issue state to avoid circular dependence. |
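
Gap 9 can be made concrete with the counts from test A2 (Directus published 377, Agent Data document_count 591). The sketch below contrasts the one-way rule as described in the registry with a hypothetical two-sided variant. `oneWayDrift`, `twoSidedDrift`, and the 10% surplus tolerance are illustrative assumptions, not the production `compareSyncDrift()` code, and the report does not say how much surplus is legitimate.

```javascript
// One-way rule as described in gap 9: pass whenever Agent Data >= Directus.
function oneWayDrift(ad, dc) {
  return ad >= dc;
}

// Hypothetical two-sided variant: also fail when Agent Data exceeds
// Directus by more than an allowed surplus ratio (placeholder value).
function twoSidedDrift(ad, dc, maxSurplusRatio = 0.1) {
  if (ad < dc) return { pass: false, reason: "deficit" };
  if (ad > dc * (1 + maxSurplusRatio)) return { pass: false, reason: "surplus" };
  return { pass: true, reason: "within tolerance" };
}

console.log(oneWayDrift(591, 377));   // true: surplus is invisible
console.log(twoSidedDrift(591, 377)); // { pass: false, reason: 'surplus' }
```

Whether a surplus of this size is actually a fault depends on what else Agent Data legitimately stores, so the tolerance would need to come from the sync contract rather than a constant.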
Production Evidence Behind Key Gaps
- `scripts/integrity/main.js:43-91` loads only `method=2` measurements.
- `scripts/integrity/pg-client.js:15-18` hard-fails without `DATABASE_URL`.
- `scripts/integrity/runners/pg-vs-nuxt-check.js:163-174` makes A5 one-way (`ad >= dc => pass`).
- `scripts/integrity/cron-integrity.sh:18-41` contains token/DB env assumptions that do not match current production.
- `scripts/integrity/watchdog-monitor.sh:19-21` silently skips when token is absent.
- `sql/s133_measurement_framework.sql:316-330` still defines D26 checks as `method=1`.
- `sql/s167g_scanner_hardening.sql:13-149` hardcodes A1/A2/A3/A4 scope instead of registry-driven scope.
- `.github/workflows/guard_critical_files.yml:16-30` and `:53-62` are fixed-list guards.
- `scripts/smoke-test.sh:87-119` is fixed-endpoint smoke coverage.
- `.github/workflows/sync-check.yml:67-70` uses a vector-ratio health range inconsistent with production A6.
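
The env-related findings (PG_* versus POSTGRES_* keys, and silent fallback or skip when a value is missing) suggest resolving configuration in one place and failing loudly. The following is a minimal sketch under that assumption. `resolvePgEnv` and `KEY_ALIASES` are hypothetical names and do not exist in the repo; only the environment key names come from the audit.

```javascript
// Hypothetical resolver: accept either naming scheme, never fall back silently.
const KEY_ALIASES = {
  user: ["PG_USER", "POSTGRES_USER"],
  password: ["PG_PASSWORD", "POSTGRES_PASSWORD"],
  database: ["PG_DATABASE", "POSTGRES_DB"],
};

function resolvePgEnv(env) {
  const resolved = {};
  const missing = [];
  for (const [field, keys] of Object.entries(KEY_ALIASES)) {
    // Use the first alias that is set to a non-empty value.
    const key = keys.find((k) => env[k] !== undefined && env[k] !== "");
    if (key) {
      resolved[field] = env[key];
    } else {
      missing.push(keys.join(" or "));
    }
  }
  if (missing.length > 0) {
    // Throw instead of returning a partial config or switching to a dry run.
    throw new Error(`Missing database env: ${missing.join("; ")}`);
  }
  return resolved;
}

// Example with placeholder values in the POSTGRES_* scheme.
console.log(resolvePgEnv({
  POSTGRES_USER: "example_user",
  POSTGRES_PASSWORD: "example_password",
  POSTGRES_DB: "example_db",
}));
```

A caller such as a cron wrapper could pass `process.env` and let the thrown error produce a non-zero exit code, so a missing credential surfaces as a failed run rather than a skipped one.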
Cleanup
Codex-prefix zero residue
Verified all CHAOS-R3-CDX-* residue is gone:
- meta_catalog=0
- v_registry_counts=0
- system_issues=0
- dot_tools=0
- taxonomy=0
- checkpoint_types=0
- trigger_registry=0
- entity_dependencies=0
- universal_edges=0
Cleanup note
meta_catalog direct delete is guard-blocked on production, and official lifecycle cleanup functions are broken on the live schema. To remove only CHAOS-R3-CDX-P1 and CHAOS-R3-CDX-O3, I disabled only trg_guard_meta_catalog_delete and trg_guard_v_registry_counts inside one transaction, deleted the two Codex rows, and immediately re-enabled both guards.
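
Rule §0-AQ (delete only your own prefix) can be enforced mechanically before any delete statement is built. The sketch below partitions candidate codes by owner. `partitionByOwner` is a hypothetical helper, and `CHAOS-R3-GEM-P1` is an invented example code that merely follows the Gemini marker pattern.

```javascript
// Prefix owned by this agent, as stated in the report header.
const OWN_PREFIX = "CHAOS-R3-CDX-";

// Hypothetical helper: split candidate codes into own and foreign rows.
function partitionByOwner(codes, ownPrefix = OWN_PREFIX) {
  const own = [];
  const foreign = [];
  for (const code of codes) {
    (code.startsWith(ownPrefix) ? own : foreign).push(code);
  }
  return { own, foreign };
}

const candidates = ["CHAOS-R3-CDX-P1", "CHAOS-R3-CDX-O3", "CHAOS-R3-GEM-P1"];
const { own, foreign } = partitionByOwner(candidates);
console.log(own);     // [ 'CHAOS-R3-CDX-P1', 'CHAOS-R3-CDX-O3' ]
console.log(foreign); // [ 'CHAOS-R3-GEM-P1' ]
```

Only the `own` list would be passed on to the delete step; anything in `foreign` is left for the other agent's cleanup.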
AFTER Baseline
| Metric | BEFORE | AFTER | Note |
|---|---|---|---|
| trigger_count | 141 | 141 | restored |
| v_registry_counts rows | 23 | 24 | drift from concurrent Gemini data |
| open system_issues | 1 | 4 | concurrent Gemini + reopened sync faults + watchdog |
| CAT-ALL.record_count | 19129 | 20640 | live production traffic during mission |
| total dot_tools | 112 | 165 | external writes during mission |
| total entity_dependencies | 141 | 144 | exactly matches 3 live Gemini rows |
| total universal_edges | 2039 | 2040 | external live delta |
Concurrent Gemini evidence at finish, for `CHAOS-R3-GEM-*` / `GEM-*` markers:
- `meta_catalog|1`
- `v_registry_counts|1`
- `system_issues|1`
- `dot_tools|1`
- `entity_dependencies|3`
- `universal_edges|0`
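
The AFTER drift can be reconciled against the Gemini marker counts table by table. The sketch below does this for the four metrics that map directly onto a table name. Open system_issues and CAT-ALL are left out because the marker counts do not say how many Gemini rows are open or counted there. The `reconcile` helper is hypothetical.

```javascript
// BEFORE/AFTER values from the baseline tables, plus Gemini marker rows.
const drift = {
  v_registry_counts: { before: 23, after: 24, gemini: 1 },
  dot_tools: { before: 112, after: 165, gemini: 1 },
  entity_dependencies: { before: 141, after: 144, gemini: 3 },
  universal_edges: { before: 2039, after: 2040, gemini: 0 },
};

// Hypothetical helper: delta not explained by live Gemini rows.
function reconcile(entries) {
  const out = {};
  for (const [table, { before, after, gemini }] of Object.entries(entries)) {
    const delta = after - before;
    out[table] = { delta, unexplained: delta - gemini };
  }
  return out;
}

console.log(reconcile(drift));
```

Under these inputs, v_registry_counts and entity_dependencies are fully explained by Gemini rows, while dot_tools (+52) and universal_edges (+1) still carry drift attributed in the table to external writes.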
Final Conclusion
- Detection rate: 23/31.
- Compared to S167F: improved from 16/31 to 23/31.
- Primary blockers still open: O1, O3, N1, N2, N3, E1, S2, C3.
- Readiness: NO. Article 31 (Điều 31) is materially stronger than in S167F, but counting integrity, raw runner operability, auto-expansion, and automation liveness still leave blind spots that can fail silently.