S174 Hardtest Agent-Data CRUD + Log Stability
Date: 2026-04-09 | Operator: Codex
Scope: hardtest the MCP CRUD/search path, inspect runtime logs during the test, and post-verify the Đ31 integrity chain after S174-FIX-04 and S174-FIX-05.
Constraint: no fixes performed in this mission.
1. Executive Summary
Test window started at 2026-04-09 05:24:42 CEST on the VPS (2026-04-09 10:24:42 ICT).
Overall result:
- Total measured MCP operations: 90
- PASS: 90
- FAIL: 0
- Observed fail rate: 0%
- The search path looked stable under both sequential and concurrent load.
- No backend evidence of 500, ERROR, WARNING, Traceback, PostgreSQL slow/deadlock, or Qdrant index/query failure during the test window.
- Host/container resources stayed healthy: host RAM available 8.6Gi, swap 0B used, disk / at 40%, TCP established 5.
- One anomaly remains unexplained: a single patch_document step took 115323 ms of client-observed wall clock, but backend logs in the same window did not show DB/Qdrant/runtime failure.
- The Đ31 chain is not fully clean yet: the latest cron log within 7h shows env-contract-check PASS, logrotate-config-check PASS, WATCHDOG alive, but rsyslog-health-check still reported a fault and the overall run ended PASS: 10 | FAIL: 117 | ERROR: 1.
Bottom line:
- agent-data CRUD/search is operationally stable enough for normal use, based on this hardtest.
- There is no evidence of silent backend failure during the test window.
- Đ31 recovery is only partial from an operations perspective: the runner is alive again, but the cron chain is not fully green.
2. Test Method
The test used direct MCP calls against the knowledge base:
- search_knowledge
- get_document
- upload_document
- patch_document
- delete_document
Latency measurement method:
- search_knowledge: tool-reported usage.latency_ms
- get_document / upload_document / patch_document / delete_document: local wall-clock timing around each call
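The local wall-clock method can be sketched as a thin timing wrapper around any MCP call. This is a sketch of the measurement technique only; the call shape and function names are illustrative, not taken from the test harness:

```python
import time

def timed_call(fn, *args, **kwargs):
    # Local wall-clock timing around one MCP call, as used for
    # get/upload/patch/delete where the tool does not report usage.latency_ms.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return result, elapsed_ms
```

For search_knowledge the test instead trusted the tool-reported usage.latency_ms, so client-side overhead is excluded from those numbers.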
Cleanup verification:
- Created test files knowledge/current-state/tests/hardtest-1.md through hardtest-5.md
- Deleted all five after lifecycle tests
- A final list check returned an empty directory
3. Part A — Hardtest CRUD Results
3.1 Summary Table
| Group | Count | PASS | FAIL | Avg (ms) | p95 (ms) | p99 (ms) | Max (ms) |
|---|---|---|---|---|---|---|---|
| A1 Sequential search_knowledge | 20 | 20 | 0 | 3482.9 | 5638 | 7300 | 7300 |
| A1 Sequential get_document | 10 | 10 | 0 | 14257.1 | 17779 | 17779 | 17779 |
| A2 Concurrent search_knowledge (all rounds) | 30 | 30 | 0 | 2078.7 | 4732 | 8698 | 8698 |
| A3 CRUD lifecycle total | 5 | 5 | 0 | 105654.2 | 178823 | 178823 | 178823 |
| A3 upload_document | 5 | 5 | 0 | 15554.8 | 19688 | 19688 | 19688 |
| A3 get_document | 5 | 5 | 0 | 14548.0 | 17102 | 17102 | 17102 |
| A3 patch_document | 5 | 5 | 0 | 37690.0 | 115323 | 115323 | 115323 |
| A3 search_knowledge verify | 5 | 5 | 0 | 2043.4 | 3196 | 3196 | 3196 |
| A3 delete_document | 5 | 5 | 0 | 9249.8 | 14277 | 14277 | 14277 |
| A4 Heavy search_knowledge | 5 | 5 | 0 | 4876.2 | 5407 | 5407 | 5407 |
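The Avg/p95/p99 columns are consistent with a nearest-rank percentile over the raw samples. The exact definition used by the harness is an assumption; this sketch merely reproduces the tabulated numbers:

```python
import math

def summarize(samples):
    # Average plus nearest-rank percentiles over raw latency samples (ms).
    def pct(p):
        s = sorted(samples)
        k = max(0, math.ceil(p / 100 * len(s)) - 1)  # nearest-rank index
        return s[k]
    return {
        "avg": round(sum(samples) / len(samples), 1),
        "p95": pct(95),
        "p99": pct(99),
        "max": max(samples),
    }
```

Applied to the 20 A1 sequential search samples listed in 3.2, this yields p95 5638 and p99 7300, matching the table.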
3.2 A1 Sequential Baseline
Sequential search_knowledge latencies, ms:
504, 3128, 4452, 3515, 2810, 3109, 3395, 3526, 3135, 3429,
5638, 4797, 7300, 2433, 3263, 2304, 3721, 3633, 2849, 2718
Sequential get_document latencies, ms:
12161, 16758, 14756, 10211, 9987, 17779, 15745, 15278, 15742, 14154
Interpretation:
- Baseline search stayed in low single-digit seconds.
- get_document is materially slower than search_knowledge in this environment, but it was consistent and had no failures.
3.3 A2 Concurrent Stress
Each round fired 10 search_knowledge requests near-simultaneously, then paused 5s.
Round 1 latencies, ms:
590, 433, 4732, 2778, 2131, 2872, 2783, 3999, 3900, 2381
Round 2 latencies, ms:
751, 460, 644, 507, 8698, 2395, 945, 511, 1402, 4184
Round 3 latencies, ms:
573, 440, 4112, 393, 398, 464, 856, 4388, 2888, 752
Per-round summary:
| Round | Count | PASS | FAIL | Avg (ms) | p95 (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| 1 | 10 | 10 | 0 | 2659.9 | 4732 | 4732 |
| 2 | 10 | 10 | 0 | 2049.7 | 8698 | 8698 |
| 3 | 10 | 10 | 0 | 1526.4 | 4388 | 4388 |
Interpretation:
- Concurrent search did not produce errors or retries.
- Tail latency exists, with one 8698 ms outlier, but the overall average under concurrency was better than the sequential baseline.
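The round structure above can be sketched with a thread pool. The search_knowledge client is hypothetical; only the 10-wide fan-out and the 5 s pause between rounds mirror the test:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def stress_rounds(search_knowledge, queries, rounds=3, width=10, pause_s=5):
    # Fire `width` searches near-simultaneously per round, then pause
    # before the next round; collect per-round results in order.
    results = []
    with ThreadPoolExecutor(max_workers=width) as pool:
        for r in range(rounds):
            futures = [pool.submit(search_knowledge, q) for q in queries[:width]]
            results.append([f.result() for f in futures])
            if r < rounds - 1:
                time.sleep(pause_s)
    return results
```

A failed call would surface as an exception from `f.result()`, which is how the 30/30 PASS claim is checkable.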
3.4 A3 CRUD Lifecycle
Each lifecycle did:
- upload_document
- get_document
- patch_document
- search_knowledge with a unique marker
- delete_document
Per-lifecycle timings, ms:
| Iteration | Upload | Get | Patch | Search | Delete | Total |
|---|---|---|---|---|---|---|
| 1 | 19688 | 14091 | 16609 | 3196 | 14277 | 93321 |
| 2 | 16316 | 11910 | 115323 | 1857 | 6007 | 178823 |
| 3 | 16052 | 16589 | 26306 | 1583 | 5204 | 85875 |
| 4 | 12803 | 13048 | 16466 | 1704 | 14022 | 80918 |
| 5 | 12915 | 17102 | 13746 | 1877 | 6739 | 89334 |
Evidence that the vector update/read path completed correctly:
- All 5 patch steps succeeded.
- All 5 post-patch searches found the unique marker immediately.
- The cleanup check returned zero remaining test files.
Cleanup evidence:
list_documents(path="knowledge/current-state/tests")
=> {"items":[],"count":0}
Interpretation:
- The functional CRUD lifecycle passed 5/5.
- The only hard anomaly in Part A is the second patch_document wall-clock spike at 115323 ms.
- Because read-after-write and search-after-write still passed, this is not by itself evidence of data loss or vector-sync failure.
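The read-after-write/search-after-write argument rests on each lifecycle verifying a unique patched marker. A minimal sketch against a hypothetical MCP client (the method names mirror the tools used; everything else is illustrative):

```python
import uuid

def run_lifecycle(client, n):
    # One CRUD lifecycle: upload -> get -> patch with a unique marker ->
    # search for that marker -> delete. Raises if any step fails.
    path = f"knowledge/current-state/tests/hardtest-{n}.md"
    marker = f"hardtest-marker-{uuid.uuid4().hex}"
    client.upload_document(path, f"# Hardtest {n}\n")
    assert client.get_document(path)  # read-after-write
    client.patch_document(path, f"# Hardtest {n}\n\n{marker}\n")
    hits = client.search_knowledge(marker)  # search-after-write
    assert any(marker in h for h in hits), "patched marker not searchable"
    client.delete_document(path)
    return path
```

Because the marker is unique per iteration, a stale vector index would fail the search step immediately rather than silently returning old content.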
3.5 A4 Heavy Search
Heavy-query latencies, ms:
4990, 5161, 4217, 4606, 5407
Interpretation:
- Complex search stayed roughly in the 4.2 s to 5.4 s range.
- No timeout or retry behavior was observed.
4. Part B — Runtime Logs and Resource Evidence
4.1 incomex-agent-data Logs
Command:
ssh root@38.242.240.89 "docker logs incomex-agent-data --since '2026-04-09T05:24:42+02:00' 2>&1 | tail -200"
Representative output:
Qdrant probe OK: 11 points (5ms)
PostgreSQL probe OK (0ms)
INFO: 172.18.0.1:57826 - "GET /health HTTP/1.1" 200 OK
INFO: 172.18.0.1:57842 - "POST /mcp HTTP/1.1" 200 OK
INFO: 172.18.0.1:57842 - "GET /kb/get/knowledge/current-state/tests/hardtest-4.md?full=true&search=false HTTP/1.1" 200 OK
INFO: 172.18.0.1:57852 - "POST /mcp HTTP/1.1" 200 OK
Qdrant probe OK: 11 points (26ms)
Focused log counts during the test window:
http_500=0
http_404=1
error_level=0
warning_level=0
traceback=0
exception=0
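Counts like these can be reproduced with simple pattern matching over the captured log lines. The exact patterns used during the test are an assumption; the ones below approximate the six categories:

```python
import re

# Assumed patterns approximating the focused-count categories above.
PATTERNS = {
    "http_500": re.compile(r'HTTP/1\.1" 500 '),
    "http_404": re.compile(r'HTTP/1\.1" 404 '),
    "error_level": re.compile(r"\bERROR\b"),
    "warning_level": re.compile(r"\bWARNING\b"),
    "traceback": re.compile(r"Traceback"),
    "exception": re.compile(r"Exception"),
}

def focused_counts(log_lines):
    # Tally how many captured lines match each category.
    counts = {name: 0 for name in PATTERNS}
    for line in log_lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```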
The lone 404 observed:
INFO: 172.18.0.7:41816 - "GET /documents/knowledge/dev/architecture/nd-36-01-semantic-relationship-infrastructure-draft.md?full=true&search=false HTTP/1.1" 404 Not Found
Assessment:
- No hardtest-generated 500, exception, or warning was present.
- The single 404 points to an unrelated missing draft path, not to any hardtest-* document.
4.2 Qdrant Logs
Command:
ssh root@38.242.240.89 "docker logs incomex-qdrant --since '2026-04-09T05:24:42+02:00' 2>&1 | tail -100"
Representative output:
2026-04-09T03:30:23.219534Z INFO actix_web::middleware::logger: 172.18.0.3 "PUT /collections/documents/points?wait=true&ordering=weak HTTP/1.1" 200 639 "-" "-" 0.074744
2026-04-09T03:30:24.835833Z INFO actix_web::middleware::logger: 172.18.0.3 "POST /collections/documents/points/search HTTP/1.1" 200 2736 "-" "-" 0.004804
2026-04-09T03:30:30.171320Z INFO actix_web::middleware::logger: 172.18.0.3 "POST /collections/documents/points/search HTTP/1.1" 200 2976 "-" "-" 0.012230
2026-04-09T03:30:31.422245Z INFO actix_web::middleware::logger: 172.18.0.3 "POST /collections/documents/points/delete?wait=true&ordering=weak HTTP/1.1" 200 99 "-" "-" 0.025348
Error-focused command:
ssh root@38.242.240.89 "docker logs incomex-qdrant --since '2026-04-09T05:24:42+02:00' 2>&1 | grep -iE 'error|warn|timeout|panic|fail' | tail -100"
Output:
[no output]
Assessment:
- Qdrant handled search, upsert, and delete during the hardtest without visible error.
- Server-side vector operations stayed in the sub-75 ms range in the sampled log lines.
4.3 PostgreSQL Logs
Command:
ssh root@38.242.240.89 "docker logs postgres --since '2026-04-09T05:24:42+02:00' 2>&1 | grep -iE 'error|slow|timeout|deadlock' | tail -50"
Output:
[no output]
Assessment:
- No PostgreSQL evidence of slow query, timeout, error, or deadlock during the hardtest window.
4.4 Container and Host Resources
Command:
ssh root@38.242.240.89 "docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}'"
Output:
NAME CPU % MEM USAGE / LIMIT NET I/O
incomex-nuxt 0.00% 84.03MiB / 512MiB 89.2MB / 540MB
uptime-kuma 1.44% 159.1MiB / 11.68GiB 331MB / 20.1MB
incomex-agent-data 1.82% 1.097GiB / 2.5GiB 102GB / 527MB
postgres 0.03% 222.5MiB / 2GiB 1.86GB / 279GB
incomex-nginx 0.00% 32.05MiB / 256MiB 11.5GB / 11.9GB
incomex-directus 6.82% 192.2MiB / 1GiB 1.12GB / 1.42GB
incomex-qdrant 0.49% 114.7MiB / 1GiB 640MB / 839MB
Command:
ssh root@38.242.240.89 "free -h && echo '---' && df -h /"
Output:
total used free shared buff/cache available
Mem: 11Gi 3.1Gi 5.8Gi 214Mi 2.7Gi 8.6Gi
Swap: 2.0Gi 0B 2.0Gi
---
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 96G 38G 59G 40% /
Command:
ssh root@38.242.240.89 "ss -s"
Output:
Total: 249
TCP: 64 (estab 5, closed 49, orphaned 0, timewait 4)
Assessment:
- No evidence of CPU, RAM, disk, swap, or socket saturation appeared during the test.
- incomex-agent-data memory at 1.097GiB / 2.5GiB leaves meaningful headroom.
5. Part C — Đ31 Chain Post-Verification
5.1 Latest Cron Artifacts
Command:
ssh root@38.242.240.89 "ls -lt /opt/incomex/logs/integrity/cron-*.log | head -3"
Output:
-rw-r--r-- 1 root root 40118 Apr 9 04:49 /opt/incomex/logs/integrity/cron-20260409-044906.log
-rw-r--r-- 1 root root 1040 Apr 9 04:44 /opt/incomex/logs/integrity/cron-20260409-044426.log
-rw-r--r-- 1 root root 494 Apr 9 04:42 /opt/incomex/logs/integrity/cron-20260409-044209.log
This satisfies the user check for a fresh cron log within 7h.
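The same freshness criterion can be checked programmatically. This is a sketch equivalent to the ls -lt inspection above, keyed on file modification time:

```python
import glob
import os
import time

def newest_cron_log_is_fresh(pattern="/opt/incomex/logs/integrity/cron-*.log",
                             max_age_h=7):
    # True if the most recently modified matching log is under max_age_h old.
    paths = glob.glob(pattern)
    if not paths:
        return False
    newest = max(paths, key=os.path.getmtime)
    return (time.time() - os.path.getmtime(newest)) <= max_age_h * 3600
```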
5.2 Latest Cron Summary
Tail evidence from /opt/incomex/logs/integrity/cron-20260409-044906.log:
PASS: 10 | FAIL: 117 | ERROR: 1
Pass Rate: 7.9% (10/127)
Issues Created: 0 | Reopened: 117
WATCHDOG: alive
run_id: cron-20260409-044906
Focused chain grep:
ssh root@38.242.240.89 "grep -nE 'env-contract-check|logrotate-config-check|rsyslog-health-check|WATCHDOG|runner sống|Starting integrity scan|Missing required env' /opt/incomex/logs/integrity/cron-20260409-044906.log | tail -50"
Output:
4:env-contract-check: scanned 5 required vars
8:logrotate-config-check: dry-run complete
11:rsyslog-health-check: status=active, suspend_count_1h=651
14:WARN: rsyslog-health-check detected fault (exit=1). Issue reported. Runner continues.
161: MSR-D31-WATCHDOG [dieu31] WATCHDOG — runner sống
794: ⚡ MSR-D31-WATCHDOG: WATCHDOG — WATCHDOG — runner sống
797: Delta: WATCHDOG — runner alive
804: WATCHDOG: alive
Assessment:
- env-contract-check: PASS
- logrotate-config-check: PASS
- rsyslog-health-check: NOT clean, fault detected
- Runner: alive
- Watchdog: alive
Therefore the cron chain is not fully PASS end-to-end.
5.3 system_issues Distribution
The user-provided SQL used issue_type, but the current rsyslog health code records under issue_class='rsyslog_fault'. The direct query on issue_type returned 0 rows, which indicates a query mismatch, not proof that rsyslog faults never existed.
Field evidence:
ssh root@38.242.240.89 "docker exec postgres psql -U directus -d directus -Atc \"SELECT issue_class, status, count(*) FROM system_issues WHERE issue_class IN ('config_error','env_drift','logrotate_drift','rsyslog_fault') GROUP BY 1,2 ORDER BY 1,2;\""
Output:
config_error|resolved|1
env_drift|resolved|1
logrotate_drift|resolved|1
rsyslog_fault|resolved|4
Recent rsyslog rows:
ssh root@38.242.240.89 "docker exec postgres psql -U directus -d directus -Atc \"SELECT issue_class, status, code, last_seen_at, resolved_at, occurrence_count FROM system_issues WHERE issue_class='rsyslog_fault' ORDER BY last_seen_at DESC LIMIT 5;\""
Output:
rsyslog_fault|resolved|ISS-2780|2026-04-09 02:49:06+00|2026-04-09 02:49:33.812|1
rsyslog_fault|resolved|ISS-2779|2026-04-09 02:44:27+00|2026-04-09 02:49:33.812|1
rsyslog_fault|resolved|ISS-2778|2026-04-09 02:42:10+00||1
rsyslog_fault|resolved|ISS-2777|2026-04-09 02:41:25+00||1
Interpretation:
- The issue records exist and are currently marked resolved.
- Two resolved rows show an empty resolved_at, so the lifecycle metadata is not fully normalized.
Implementation evidence from the local repo:
Relevant behavior:
- rsyslog-health-check.sh counts journal suspend events over the last hour and reports "issue_class":"rsyslog_fault".
- cron-integrity.sh treats this check as warn-only and allows the runner to continue.
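The warn-only behavior amounts to a decision like the following. This is a sketch, not a transcription of rsyslog-health-check.sh; the zero-suspend fault threshold and the input shape are assumptions:

```python
def handle_rsyslog_check(service_status: str, suspend_count_1h: int):
    # A fault is flagged when the service is not active or any suspend
    # events occurred in the last hour (assumed threshold), but the
    # integrity runner logs a WARN and continues either way.
    fault = service_status != "active" or suspend_count_1h > 0
    return {"fault": fault, "runner_continues": True}
```

Under this model the observed log line (status=active, suspend_count_1h=651) is a fault that still leaves the runner alive, matching the WARN line in the cron output.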
6. Conclusion
6.1 Is agent-data CRUD stable?
Yes, with an important caveat.
- Measured operations: 90/90 PASS
- Observed fail rate: 0%
- The concurrent search hardtest produced no retries or failed calls
- Search p95 stayed at 4732 ms under concurrent load and 5638 ms in the sequential baseline
- Heavy search max was 5407 ms
- The end-to-end CRUD lifecycle passed 5/5
The caveat is one client-observed patch_document latency spike at 115323 ms. That spike is real, but current evidence does not tie it to Qdrant, PostgreSQL, memory pressure, CPU pressure, or backend error logs.
6.2 Is there any silent instability left?
No silent backend failure was found during the hardtest window.
Evidence:
- incomex-agent-data: no 500, ERROR, WARNING, Traceback, or exception
- postgres: no error|slow|timeout|deadlock
- qdrant: no error|warn|timeout|panic|fail
- Host and container resource headroom remained healthy
The only notable residual signals are:
- the unexplained patch_document latency spike
- an unrelated 404 on a missing draft knowledge path
- the latest Đ31 cron still warning on rsyslog history
6.3 Does the Đ31 chain PASS after the fixes?
Not fully.
What is confirmed PASS:
- a fresh cron artifact exists within 7h
- the env contract check ran successfully
- logrotate config dry-run completed
- runner is alive
- watchdog is alive
What is not clean:
- the latest cron log still reports rsyslog-health-check detected fault
- the latest integrity run still shows PASS: 10 | FAIL: 117 | ERROR: 1
So the correct conclusion is: Đ31 is revived, but the whole chain is not yet fully green.
7. Unknowns
- The root cause of the 115323 ms patch latency spike is not proven by current evidence.
- It is not yet proven whether the rsyslog warning in the latest cron log reflects a still-active problem or a stale one-hour lookback that had already self-recovered before the hardtest window.
- The system_issues lifecycle metadata for rsyslog_fault is partially inconsistent because some resolved rows have a blank resolved_at.
8. Appendix — Commands and Evidence
8.1 MCP Test Operations
Sequential search test:
20 direct calls to search_knowledge(<distinct query>)
Result: 20/20 PASS
Sequential get test:
10 direct calls to get_document(<distinct path>)
Result: 10/10 PASS
Concurrent search stress:
3 rounds x 10 parallel search_knowledge calls
Pause: 5s between rounds
Result: 30/30 PASS
CRUD lifecycle:
For N=1..5:
- upload_document(knowledge/current-state/tests/hardtest-N.md)
- get_document(...)
- patch_document(...)
- search_knowledge(unique patched marker)
- delete_document(...)
Result: 5/5 lifecycle PASS
Cleanup:
list_documents(path="knowledge/current-state/tests")
=> {"items":[],"count":0}
8.2 VPS Log and Resource Commands
ssh root@38.242.240.89 "docker logs incomex-agent-data --since '2026-04-09T05:24:42+02:00' 2>&1 | tail -200"
ssh root@38.242.240.89 "docker logs incomex-qdrant --since '2026-04-09T05:24:42+02:00' 2>&1 | tail -100"
ssh root@38.242.240.89 "docker logs incomex-qdrant --since '2026-04-09T05:24:42+02:00' 2>&1 | grep -iE 'error|warn|timeout|panic|fail' | tail -100"
ssh root@38.242.240.89 "docker logs postgres --since '2026-04-09T05:24:42+02:00' 2>&1 | grep -iE 'error|slow|timeout|deadlock' | tail -50"
ssh root@38.242.240.89 "docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}'"
ssh root@38.242.240.89 "free -h && echo '---' && df -h /"
ssh root@38.242.240.89 "ss -s"
8.3 Đ31 Commands
ssh root@38.242.240.89 "ls -lt /opt/incomex/logs/integrity/cron-*.log | head -3"
ssh root@38.242.240.89 "grep -nE 'env-contract-check|logrotate-config-check|rsyslog-health-check|WATCHDOG|runner sống|Starting integrity scan|Missing required env' /opt/incomex/logs/integrity/cron-20260409-044906.log | tail -50"
ssh root@38.242.240.89 "docker exec postgres psql -U directus -d directus -Atc \"SELECT issue_class, status, count(*) FROM system_issues WHERE issue_class IN ('config_error','env_drift','logrotate_drift','rsyslog_fault') GROUP BY 1,2 ORDER BY 1,2;\""
ssh root@38.242.240.89 "docker exec postgres psql -U directus -d directus -Atc \"SELECT issue_class, status, code, last_seen_at, resolved_at, occurrence_count FROM system_issues WHERE issue_class='rsyslog_fault' ORDER BY last_seen_at DESC LIMIT 5;\""