KB-35C1

S174 Hardtest Agent-Data CRUD + Log Stability

Revision 1

Date: 2026-04-09. Operator: Codex. Scope: hardtest the MCP CRUD/search path, inspect runtime logs during the test, and post-verify the Đ31 integrity chain after S174-FIX-04 and S174-FIX-05. Constraint: no fixes performed in this mission.

1. Executive Summary

Test window started at 2026-04-09 05:24:42 CEST on VPS (2026-04-09 10:24:42 ICT).

Overall result:

  • Total measured MCP operations: 90
  • PASS: 90
  • FAIL: 0
  • Observed fail rate: 0%
  • Search path looked stable under both sequential and concurrent load.
  • No backend evidence of 500, ERROR, WARNING, Traceback, PostgreSQL slow/deadlock, or Qdrant index/query failure during the test window.
  • Host/container resources stayed healthy: host RAM available 8.6Gi, swap 0B used, disk / at 40%, TCP established 5.
  • One anomaly remains unexplained: a single patch_document step took 115323 ms of client-observed wall-clock time, yet backend logs in the same window showed no DB/Qdrant/runtime failure.
  • The Đ31 chain is not fully clean yet: the latest cron log within 7h shows env-contract-check PASS, logrotate-config-check PASS, and WATCHDOG alive, but rsyslog-health-check still reported a fault and the overall run ended PASS: 10 | FAIL: 117 | ERROR: 1.

Bottom line:

  • agent-data CRUD/search is operationally stable enough for normal use based on this hardtest.
  • There is no evidence of silent backend failure during the test window.
  • Đ31 recovery is only partial from an operations perspective: the runner is alive again, but the cron chain is not fully green.

2. Test Method

The test used direct MCP calls against the knowledge base:

  • search_knowledge
  • get_document
  • upload_document
  • patch_document
  • delete_document

Latency measurement method:

  • search_knowledge: tool-reported usage.latency_ms
  • get_document / upload_document / patch_document / delete_document: local wall-clock timing around each call
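The wall-clock timing for the non-search tools can be sketched as a simple wrapper; this is a minimal illustration, not the harness actually used:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Wrap a call with local wall-clock timing; returns (result, elapsed_ms)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return result, elapsed_ms

# Stub standing in for a real MCP call such as get_document:
result, ms = timed_call(lambda: "ok")
```

Note that this measures client-observed latency, so network and serialization overhead are included, which matters when interpreting spikes like the one discussed below.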

Cleanup verification:

  • Created test files knowledge/current-state/tests/hardtest-1.md through hardtest-5.md
  • Deleted all five after lifecycle tests
  • Final list check returned an empty directory

3. Part A — Hardtest CRUD Results

3.1 Summary Table

| Group | Count | PASS | FAIL | Avg (ms) | p95 (ms) | p99 (ms) | Max (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A1 Sequential search_knowledge | 20 | 20 | 0 | 3482.9 | 5638 | 7300 | 7300 |
| A1 Sequential get_document | 10 | 10 | 0 | 14257.1 | 17779 | 17779 | 17779 |
| A2 Concurrent search_knowledge (all rounds) | 30 | 30 | 0 | 2078.7 | 4732 | 8698 | 8698 |
| A3 CRUD lifecycle total | 5 | 5 | 0 | 105654.2 | 178823 | 178823 | 178823 |
| A3 upload_document | 5 | 5 | 0 | 15554.8 | 19688 | 19688 | 19688 |
| A3 get_document | 5 | 5 | 0 | 14548.0 | 17102 | 17102 | 17102 |
| A3 patch_document | 5 | 5 | 0 | 37690.0 | 115323 | 115323 | 115323 |
| A3 search_knowledge verify | 5 | 5 | 0 | 2043.4 | 3196 | 3196 | 3196 |
| A3 delete_document | 5 | 5 | 0 | 9249.8 | 14277 | 14277 | 14277 |
| A4 Heavy search_knowledge | 5 | 5 | 0 | 4876.2 | 5407 | 5407 | 5407 |

3.2 A1 Sequential Baseline

Sequential search_knowledge latencies, ms:

504, 3128, 4452, 3515, 2810, 3109, 3395, 3526, 3135, 3429,
5638, 4797, 7300, 2433, 3263, 2304, 3721, 3633, 2849, 2718

Sequential get_document latencies, ms:

12161, 16758, 14756, 10211, 9987, 17779, 15745, 15278, 15742, 14154

Interpretation:

  • Baseline search stayed in low single-digit seconds.
  • get_document is materially slower than search_knowledge in this environment, but it was consistent and had no failures.

3.3 A2 Concurrent Stress

Each round fired 10 search_knowledge requests near-simultaneously, then paused 5s.
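The fire-near-simultaneously pattern can be sketched with a thread pool; this is an illustration of the round structure, not the actual harness, and the stub call stands in for search_knowledge:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_round(call, n=10, pause_s=5):
    """Fire n calls near-simultaneously and collect per-call latencies in ms."""
    def one(i):
        start = time.perf_counter()
        call(i)
        return round((time.perf_counter() - start) * 1000)
    with ThreadPoolExecutor(max_workers=n) as pool:
        latencies = list(pool.map(one, range(n)))
    time.sleep(pause_s)  # cool-down between rounds
    return latencies

# Stub standing in for search_knowledge:
lat = run_round(lambda i: time.sleep(0.01), n=3, pause_s=0)
```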

Round 1 latencies, ms:

590, 433, 4732, 2778, 2131, 2872, 2783, 3999, 3900, 2381

Round 2 latencies, ms:

751, 460, 644, 507, 8698, 2395, 945, 511, 1402, 4184

Round 3 latencies, ms:

573, 440, 4112, 393, 398, 464, 856, 4388, 2888, 752

Per-round summary:

| Round | Count | PASS | FAIL | Avg (ms) | p95 (ms) | Max (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 10 | 10 | 0 | 2659.9 | 4732 | 4732 |
| 2 | 10 | 10 | 0 | 2049.7 | 8698 | 8698 |
| 3 | 10 | 10 | 0 | 1526.4 | 4388 | 4388 |
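The per-round statistics are reproducible from the raw latencies with a nearest-rank percentile; the exact percentile method used by the harness is not stated in this report, but nearest-rank matches every value above:

```python
import math

def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: value at rank ceil(pct/100 * n) in sorted order."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

round2 = [751, 460, 644, 507, 8698, 2395, 945, 511, 1402, 4184]
avg = sum(round2) / len(round2)             # 2049.7
p95 = nearest_rank_percentile(round2, 95)   # 8698: with n=10, p95 lands on the max
```

With only 10 samples per round, p95 coincides with the maximum, which is why p95 and Max match in every row.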

Interpretation:

  • Concurrent search did not produce errors or retries.
  • Tail latency exists, with one 8698 ms outlier, but the overall average under concurrency was lower than the sequential baseline.

3.4 A3 CRUD Lifecycle

Each lifecycle did:

  1. upload_document
  2. get_document
  3. patch_document
  4. search_knowledge with unique marker
  5. delete_document
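The five steps above can be sketched as one loop; this assumes a hypothetical client object exposing the five MCP tools, and the marker scheme is illustrative:

```python
def run_lifecycle(client, n):
    """One CRUD lifecycle; method names mirror the MCP tools listed above."""
    path = f"knowledge/current-state/tests/hardtest-{n}.md"
    marker = f"HARDTEST-MARKER-{n}"
    client.upload_document(path, f"# hardtest {n}\n")
    client.get_document(path)                           # read-after-write
    client.patch_document(path, f"# hardtest {n}\n{marker}\n")
    hits = client.search_knowledge(marker)              # search-after-write on the marker
    client.delete_document(path)                        # cleanup
    return len(hits) > 0
```

Searching on a unique per-iteration marker is what lets a passing step 4 serve as evidence that the vector index saw the patch, not just the original upload.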

Per-lifecycle timings, ms:

| Iteration | Upload | Get | Patch | Search | Delete | Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 19688 | 14091 | 16609 | 3196 | 14277 | 93321 |
| 2 | 16316 | 11910 | 115323 | 1857 | 6007 | 178823 |
| 3 | 16052 | 16589 | 26306 | 1583 | 5204 | 85875 |
| 4 | 12803 | 13048 | 16466 | 1704 | 14022 | 80918 |
| 5 | 12915 | 17102 | 13746 | 1877 | 6739 | 89334 |

Evidence that vector update/read path completed correctly:

  • All 5 patch steps succeeded.
  • All 5 post-patch searches found the unique marker immediately.
  • Cleanup check returned zero remaining test files.

Cleanup evidence:

list_documents(path="knowledge/current-state/tests")
=> {"items":[],"count":0}

Interpretation:

  • Functional CRUD lifecycle passed 5/5.
  • The only hard anomaly in Part A is the second patch_document wall-clock spike at 115323 ms.
  • Because read-after-write and search-after-write still passed, this is not evidence of data loss or vector-sync failure by itself.

3.5 A4 Heavy Search

Heavy-query latencies, ms:

4990, 5161, 4217, 4606, 5407

Interpretation:

  • Complex search stayed roughly 4.2s to 5.4s.
  • No timeout or retry behavior was observed.

4. Part B — Runtime Logs and Resource Evidence

4.1 incomex-agent-data Logs

Command:

ssh root@38.242.240.89 "docker logs incomex-agent-data --since '2026-04-09T05:24:42+02:00' 2>&1 | tail -200"

Representative output:

Qdrant probe OK: 11 points (5ms)
PostgreSQL probe OK (0ms)
INFO:     172.18.0.1:57826 - "GET /health HTTP/1.1" 200 OK
INFO:     172.18.0.1:57842 - "POST /mcp HTTP/1.1" 200 OK
INFO:     172.18.0.1:57842 - "GET /kb/get/knowledge/current-state/tests/hardtest-4.md?full=true&search=false HTTP/1.1" 200 OK
INFO:     172.18.0.1:57852 - "POST /mcp HTTP/1.1" 200 OK
Qdrant probe OK: 11 points (26ms)

Focused log counts during the test window:

http_500=0
http_404=1
error_level=0
warning_level=0
traceback=0
exception=0
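The focused counts above can be produced by simple pattern matching over the log lines; a minimal sketch, using hypothetical sample lines shaped like the representative output (the path in the 404 line is illustrative):

```python
import re

# Hypothetical sample lines standing in for the docker logs output:
lines = [
    'INFO:     172.18.0.1:57826 - "GET /health HTTP/1.1" 200 OK',
    'INFO:     172.18.0.1:57842 - "POST /mcp HTTP/1.1" 200 OK',
    'INFO: 172.18.0.7:41816 - "GET /documents/some/missing/path.md HTTP/1.1" 404 Not Found',
]

counts = {
    "http_500": sum(1 for l in lines if re.search(r'" 500 ', l)),
    "http_404": sum(1 for l in lines if re.search(r'" 404 ', l)),
    "error_level": sum(1 for l in lines if l.startswith("ERROR")),
    "traceback": sum(1 for l in lines if "Traceback" in l),
}
```

Matching on `" 404 ` (status code right after the closing quote) avoids false hits when a digit sequence appears inside the request path.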

The lone 404 observed:

INFO: 172.18.0.7:41816 - "GET /documents/knowledge/dev/architecture/nd-36-01-semantic-relationship-infrastructure-draft.md?full=true&search=false HTTP/1.1" 404 Not Found

Assessment:

  • No hardtest-generated 500, exception, or warning was present.
  • The single 404 points to an unrelated missing draft path, not to any hardtest-* document.

4.2 Qdrant Logs

Command:

ssh root@38.242.240.89 "docker logs incomex-qdrant --since '2026-04-09T05:24:42+02:00' 2>&1 | tail -100"

Representative output:

2026-04-09T03:30:23.219534Z  INFO actix_web::middleware::logger: 172.18.0.3 "PUT /collections/documents/points?wait=true&ordering=weak HTTP/1.1" 200 639 "-" "-" 0.074744
2026-04-09T03:30:24.835833Z  INFO actix_web::middleware::logger: 172.18.0.3 "POST /collections/documents/points/search HTTP/1.1" 200 2736 "-" "-" 0.004804
2026-04-09T03:30:30.171320Z  INFO actix_web::middleware::logger: 172.18.0.3 "POST /collections/documents/points/search HTTP/1.1" 200 2976 "-" "-" 0.012230
2026-04-09T03:30:31.422245Z  INFO actix_web::middleware::logger: 172.18.0.3 "POST /collections/documents/points/delete?wait=true&ordering=weak HTTP/1.1" 200 99 "-" "-" 0.025348

Error-focused command:

ssh root@38.242.240.89 "docker logs incomex-qdrant --since '2026-04-09T05:24:42+02:00' 2>&1 | grep -iE 'error|warn|timeout|panic|fail' | tail -100"

Output:

[no output]

Assessment:

  • Qdrant handled search, upsert, and delete during the hardtest without visible error.
  • Server-side vector operations stayed in the sub-75ms range in sampled log lines.
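The sub-75ms claim comes straight off the log lines: actix_web's default access logger ends each line with the handling time in seconds, so the slowest sampled operation can be extracted like this (a sketch over the sample line above):

```python
import re

# Sample actix_web access-log line from the Qdrant output above:
line = ('2026-04-09T03:30:23.219534Z  INFO actix_web::middleware::logger: '
        '172.18.0.3 "PUT /collections/documents/points?wait=true&ordering=weak '
        'HTTP/1.1" 200 639 "-" "-" 0.074744')

# The trailing float is the request handling duration in seconds.
duration_ms = float(re.search(r'(\d+\.\d+)$', line).group(1)) * 1000  # ~74.7 ms
```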

4.3 PostgreSQL Logs

Command:

ssh root@38.242.240.89 "docker logs postgres --since '2026-04-09T05:24:42+02:00' 2>&1 | grep -iE 'error|slow|timeout|deadlock' | tail -50"

Output:

[no output]

Assessment:

  • No PostgreSQL evidence of slow query, timeout, error, or deadlock during the hardtest window.

4.4 Container and Host Resources

Command:

ssh root@38.242.240.89 "docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}'"

Output:

NAME                 CPU %   MEM USAGE / LIMIT      NET I/O
incomex-nuxt         0.00%   84.03MiB / 512MiB      89.2MB / 540MB
uptime-kuma          1.44%   159.1MiB / 11.68GiB    331MB / 20.1MB
incomex-agent-data   1.82%   1.097GiB / 2.5GiB      102GB / 527MB
postgres             0.03%   222.5MiB / 2GiB        1.86GB / 279GB
incomex-nginx        0.00%   32.05MiB / 256MiB      11.5GB / 11.9GB
incomex-directus     6.82%   192.2MiB / 1GiB        1.12GB / 1.42GB
incomex-qdrant       0.49%   114.7MiB / 1GiB        640MB / 839MB

Command:

ssh root@38.242.240.89 "free -h && echo '---' && df -h /"

Output:

               total        used        free      shared  buff/cache   available
Mem:            11Gi       3.1Gi       5.8Gi       214Mi       2.7Gi       8.6Gi
Swap:          2.0Gi          0B       2.0Gi
---
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        96G   38G   59G  40% /

Command:

ssh root@38.242.240.89 "ss -s"

Output:

Total: 249
TCP:   64 (estab 5, closed 49, orphaned 0, timewait 4)

Assessment:

  • No CPU, RAM, disk, swap, or socket saturation evidence appeared during the test.
  • incomex-agent-data memory at 1.097GiB / 2.5GiB leaves meaningful headroom.

5. Part C — Đ31 Chain Post-Verification

5.1 Latest Cron Artifacts

Command:

ssh root@38.242.240.89 "ls -lt /opt/incomex/logs/integrity/cron-*.log | head -3"

Output:

-rw-r--r-- 1 root root 40118 Apr  9 04:49 /opt/incomex/logs/integrity/cron-20260409-044906.log
-rw-r--r-- 1 root root  1040 Apr  9 04:44 /opt/incomex/logs/integrity/cron-20260409-044426.log
-rw-r--r-- 1 root root   494 Apr  9 04:42 /opt/incomex/logs/integrity/cron-20260409-044209.log

This satisfies the user's check for a fresh cron log within 7h.

5.2 Latest Cron Summary

Tail evidence from /opt/incomex/logs/integrity/cron-20260409-044906.log:

PASS: 10 | FAIL: 117 | ERROR: 1
Pass Rate: 7.9% (10/127)
Issues Created: 0 | Reopened: 117
WATCHDOG: alive
run_id: cron-20260409-044906
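The summary line parses mechanically, and the log's stated pass rate is consistent with a denominator of PASS + FAIL (the ERROR run is excluded); a sketch:

```python
import re

summary = "PASS: 10 | FAIL: 117 | ERROR: 1"
passed, failed, errored = map(
    int, re.match(r"PASS: (\d+) \| FAIL: (\d+) \| ERROR: (\d+)", summary).groups()
)

total = passed + failed                      # 127, matching the log's "(10/127)"
pass_rate = round(100 * passed / total, 1)   # 7.9
```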

Focused chain grep:

ssh root@38.242.240.89 "grep -nE 'env-contract-check|logrotate-config-check|rsyslog-health-check|WATCHDOG|runner sống|Starting integrity scan|Missing required env' /opt/incomex/logs/integrity/cron-20260409-044906.log | tail -50"

Output:

4:env-contract-check: scanned 5 required vars
8:logrotate-config-check: dry-run complete
11:rsyslog-health-check: status=active, suspend_count_1h=651
14:WARN: rsyslog-health-check detected fault (exit=1). Issue reported. Runner continues.
161:    MSR-D31-WATCHDOG [dieu31] WATCHDOG — runner sống
794:  ⚡ MSR-D31-WATCHDOG: WATCHDOG — WATCHDOG — runner sống
797:    Delta: WATCHDOG — runner alive
804:  WATCHDOG: alive

Assessment:

  • env-contract-check: PASS
  • logrotate-config-check: PASS
  • rsyslog-health-check: NOT clean, fault detected
  • Runner: alive
  • Watchdog: alive

Therefore the cron chain is not fully PASS end-to-end.

5.3 system_issues Distribution

The user-provided SQL filtered on issue_type, but the current rsyslog health code records faults under issue_class='rsyslog_fault'. The direct query on issue_type returned 0 rows because of that column mismatch, not because rsyslog faults never existed.

Field evidence:

ssh root@38.242.240.89 "docker exec postgres psql -U directus -d directus -Atc \"SELECT issue_class, status, count(*) FROM system_issues WHERE issue_class IN ('config_error','env_drift','logrotate_drift','rsyslog_fault') GROUP BY 1,2 ORDER BY 1,2;\""

Output:

config_error|resolved|1
env_drift|resolved|1
logrotate_drift|resolved|1
rsyslog_fault|resolved|4

Recent rsyslog rows:

ssh root@38.242.240.89 "docker exec postgres psql -U directus -d directus -Atc \"SELECT issue_class, status, code, last_seen_at, resolved_at, occurrence_count FROM system_issues WHERE issue_class='rsyslog_fault' ORDER BY last_seen_at DESC LIMIT 5;\""

Output:

rsyslog_fault|resolved|ISS-2780|2026-04-09 02:49:06+00|2026-04-09 02:49:33.812|1
rsyslog_fault|resolved|ISS-2779|2026-04-09 02:44:27+00|2026-04-09 02:49:33.812|1
rsyslog_fault|resolved|ISS-2778|2026-04-09 02:42:10+00||1
rsyslog_fault|resolved|ISS-2777|2026-04-09 02:41:25+00||1

Interpretation:

  • The issue records exist and are currently marked resolved.
  • Two resolved rows show empty resolved_at, so lifecycle metadata is not fully normalized.
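The resolved-but-blank rows can be flagged directly from the pipe-delimited `psql -Atc` output, where an empty field means the column is NULL; a minimal sketch over two of the rows shown above:

```python
# Pipe-delimited rows as emitted by psql -Atc; an empty field means NULL.
rows = [
    "rsyslog_fault|resolved|ISS-2780|2026-04-09 02:49:06+00|2026-04-09 02:49:33.812|1",
    "rsyslog_fault|resolved|ISS-2778|2026-04-09 02:42:10+00||1",
]

# Collect codes of rows marked resolved whose resolved_at field is blank.
inconsistent = [
    fields[2]
    for fields in (r.split("|") for r in rows)
    if fields[1] == "resolved" and fields[4] == ""
]
# inconsistent == ["ISS-2778"]
```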

Implementation evidence from the local repo:

Relevant behavior:

  • rsyslog-health-check.sh counts journal suspend events over the last hour and reports faults under issue_class='rsyslog_fault'.
  • cron-integrity.sh treats this check as warn-only and allows the runner to continue.

6. Conclusion

6.1 Is agent-data CRUD stable?

Yes, with an important caveat.

  • Measured operations: 90/90 PASS
  • Observed fail rate: 0%
  • Concurrent search hardtest produced no retries or failed calls
  • Search p95 stayed at 4732 ms under concurrent load and 5638 ms in sequential baseline
  • Heavy search max was 5407 ms
  • End-to-end CRUD lifecycle passed 5/5

The caveat is one client-observed patch_document latency spike at 115323 ms. That spike is real, but current evidence does not tie it to Qdrant, PostgreSQL, memory pressure, CPU pressure, or backend error logs.

6.2 Is there any silent instability left?

No silent backend failure was found during the hardtest window.

Evidence:

  • incomex-agent-data: no 500, ERROR, WARNING, Traceback, or exception
  • postgres: no error|slow|timeout|deadlock
  • qdrant: no error|warn|timeout|panic|fail
  • Host and container resource headroom remained healthy

The only notable residual signals are:

  • the unexplained patch_document latency spike
  • an unrelated 404 on a missing draft knowledge path
  • latest Đ31 cron still warning on rsyslog history

6.3 Does the Đ31 chain PASS after the fixes?

Not fully.

What is confirmed PASS:

  • fresh cron artifact exists within 7h
  • env contract check ran successfully
  • logrotate config dry-run completed
  • runner is alive
  • watchdog is alive

What is not clean:

  • latest cron log still reports rsyslog-health-check detected fault
  • latest integrity run still shows PASS: 10 | FAIL: 117 | ERROR: 1

So the correct conclusion is: Đ31 is revived, but the whole chain is not yet fully green.

7. Unknowns

  • Root cause of the 115323 ms patch latency spike is not proven by current evidence.
  • It is not yet proven whether the rsyslog warning in the latest cron log reflects a still-active problem or a stale one-hour lookback over events that had already self-recovered before the hardtest window.
  • The system_issues lifecycle metadata for rsyslog_fault is partially inconsistent because some resolved rows have blank resolved_at.

8. Appendix — Commands and Evidence

8.1 MCP Test Operations

Sequential search test:

20 direct calls to search_knowledge(<distinct query>)
Result: 20/20 PASS

Sequential get test:

10 direct calls to get_document(<distinct path>)
Result: 10/10 PASS

Concurrent search stress:

3 rounds x 10 parallel search_knowledge calls
Pause: 5s between rounds
Result: 30/30 PASS

CRUD lifecycle:

For N=1..5:
- upload_document(knowledge/current-state/tests/hardtest-N.md)
- get_document(...)
- patch_document(...)
- search_knowledge(unique patched marker)
- delete_document(...)
Result: 5/5 lifecycle PASS

Cleanup:

list_documents(path="knowledge/current-state/tests")
=> {"items":[],"count":0}

8.2 VPS Log and Resource Commands

ssh root@38.242.240.89 "docker logs incomex-agent-data --since '2026-04-09T05:24:42+02:00' 2>&1 | tail -200"
ssh root@38.242.240.89 "docker logs incomex-qdrant --since '2026-04-09T05:24:42+02:00' 2>&1 | tail -100"
ssh root@38.242.240.89 "docker logs incomex-qdrant --since '2026-04-09T05:24:42+02:00' 2>&1 | grep -iE 'error|warn|timeout|panic|fail' | tail -100"
ssh root@38.242.240.89 "docker logs postgres --since '2026-04-09T05:24:42+02:00' 2>&1 | grep -iE 'error|slow|timeout|deadlock' | tail -50"
ssh root@38.242.240.89 "docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}'"
ssh root@38.242.240.89 "free -h && echo '---' && df -h /"
ssh root@38.242.240.89 "ss -s"

8.3 Đ31 Commands

ssh root@38.242.240.89 "ls -lt /opt/incomex/logs/integrity/cron-*.log | head -3"
ssh root@38.242.240.89 "grep -nE 'env-contract-check|logrotate-config-check|rsyslog-health-check|WATCHDOG|runner sống|Starting integrity scan|Missing required env' /opt/incomex/logs/integrity/cron-20260409-044906.log | tail -50"
ssh root@38.242.240.89 "docker exec postgres psql -U directus -d directus -Atc \"SELECT issue_class, status, count(*) FROM system_issues WHERE issue_class IN ('config_error','env_drift','logrotate_drift','rsyslog_fault') GROUP BY 1,2 ORDER BY 1,2;\""
ssh root@38.242.240.89 "docker exec postgres psql -U directus -d directus -Atc \"SELECT issue_class, status, code, last_seen_at, resolved_at, occurrence_count FROM system_issues WHERE issue_class='rsyslog_fault' ORDER BY last_seen_at DESC LIMIT 5;\""