AgentData MCP Reliability Phase 1 — Route Observability + Timeout Alignment — 2026-05-18
0. Scope / governance
Phase: agentdata_mcp_reliability_phase_1_route_observability_and_timeout_alignment
Goal: improve stability and diagnosability of the GPT MCP route for AgentData/KB by aligning route timeouts and enabling sanitized route timing logs.
Production mutation: nginx config only, followed by nginx -t and nginx reload. No DB changes. No AgentData core rewrite. No data deletion. No service restart outside nginx reload.
Secrets policy: no auth headers, request body, API keys, or route secret values are included in this report. GPT route path is redacted.
1. Rules / documents read
- .claude/skills/incomex-rules.md: read locally; 36-item workflow acknowledged; no background agents used.
- search_knowledge("operating rules SSOT"): returned knowledge/dev/ssot/operating-rules.md OR v7.58 and knowledge/dev/ssot/vps/vps-operating-rules.md v1.0.
- search_knowledge("hiến pháp v4.0 constitution"): current constitution source returned knowledge/dev/laws/constitution.md, metadata title Hiến pháp Kiến trúc Hệ thống Incomex v4.6.3 BAN HÀNH.
- Investigation report read: knowledge/dev/laws/dieu44-trien-khai/ops/agentdata-mcp-timeout-investigation-2026-05-18.md.
- GPT review read: knowledge/dev/laws/dieu44-trien-khai/ops/agentdata-mcp-timeout-investigation-gpt-review-and-next-actions-2026-05-18.md.
2. Three declaration statements
- Permanent: route timeout and request timing are now infrastructure-level signals, not session-specific manual diagnosis.
- Can it go wrong: the sanitized access log format excludes auth/header/body/route-secret by construction; the nginx -t gate prevents an invalid config from being reloaded.
- 100% automated: synthetic probes produce p50/p95/max/status/error evidence; future route failures can be correlated with request_id and upstream timings.
3. Config inspected
Sanitized inspection before patch:
GPT MCP route:
proxy_pass http://agent_data_backend/mcp-gpt-full;
proxy_buffering off;
proxy_read_timeout 60s;
proxy_send_timeout 30s;
proxy_connect_timeout 10s;
access_log off;
Route count: two exact GPT MCP locations: /gpt-mcp/[REDACTED]/mcp and /gpt-mcp/[REDACTED]/mcp/
/api/ AgentData route: proxy_read_timeout 300s
Claude MCP route: already 300s read/send timeout from prior config family
4. Patch proposed/applied
Patch applied, minimal and reversible.
Files changed on VPS host bind mounts:
/opt/incomex/docker/nginx/conf.d/default.conf
/opt/incomex/docker/nginx/secrets/gpt-mcp-route.conf
Backups created:
/opt/incomex/docker/nginx/conf.d/default.conf.bak-agentdata-mcp-phase1-20260518T092354Z
/opt/incomex/docker/nginx/secrets/gpt-mcp-route.conf.bak-agentdata-mcp-phase1-20260518T092354Z
Sanitized effective config after patch:
log_format gpt_mcp_sanitized 'request_id=$request_id status=$status upstream_status=$upstream_status request_time=$request_time upstream_response_time=$upstream_response_time bytes_sent=$bytes_sent';
GPT MCP route:
proxy_pass http://agent_data_backend/mcp-gpt-full;
proxy_buffering off;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
proxy_connect_timeout 10s;
access_log /var/log/nginx/gpt-mcp-access.log gpt_mcp_sanitized;
Forbidden fields not logged:
auth headers: not present
request body: not present
route path/secret: not present
API key: not present
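The "not logged by construction" claim can be checked mechanically against the log_format string itself. The following is a minimal Python sketch, assuming an allow-list equal to the six variables in gpt_mcp_sanitized; the function name disallowed_log_vars and the constant ALLOWED_VARS are hypothetical and not part of the applied patch.

```python
import re

# Assumption: the allow-list is exactly the variables used by gpt_mcp_sanitized.
ALLOWED_VARS = {
    "request_id", "status", "upstream_status",
    "request_time", "upstream_response_time", "bytes_sent",
}


def disallowed_log_vars(log_format: str) -> set[str]:
    """Return nginx variables in log_format that are not on the allow-list."""
    # Matches both the $name and ${name} forms of an nginx variable.
    used = {a or b for a, b in re.findall(r"\$(?:\{(\w+)\}|(\w+))", log_format)}
    return used - ALLOWED_VARS
```

An empty result for the format above would support the claim; a variable such as $http_authorization or $request_uri would be reported as disallowed.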
5. nginx -t / reload result
Patch command result:
PATCH_APPLIED_TO_FILES=2
NGINX_T_EXIT=0
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
NGINX_RELOAD_EXIT=0
2026/05/18 09:23:54 [notice] 2314#2314: signal process started
Post-reload validation:
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Rollback was not needed.
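The backup, nginx -t gate, reload, and rollback sequence described above could be scripted roughly as follows. This is a sketch under assumptions, not the command that was actually run: apply_with_gate, its patch callable, and the returned status strings are hypothetical, and on this VPS the nginx commands would likely need to be executed inside the nginx container rather than on the host.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def apply_with_gate(
    targets: dict[Path, Path],            # live config path -> backup path
    patch: Callable[[], None],            # hypothetical callable that edits the live files
    runner: Callable[[Sequence[str]], int] = run,
) -> str:
    # 1. Back up every file before touching it.
    for live, backup in targets.items():
        shutil.copy2(live, backup)
    # 2. Apply the edit.
    patch()
    # 3. Gate: restore the backups if the config test fails.
    if runner(["nginx", "-t"]) != 0:
        for live, backup in targets.items():
            shutil.copy2(backup, live)
        return "ROLLED_BACK"
    # 4. Reload only after the test passed.
    if runner(["nginx", "-s", "reload"]) != 0:
        return "RELOAD_FAILED"
    return "APPLIED"
```

Passing the runner as a parameter keeps the gate logic testable without a live nginx; a failed reload is reported rather than rolled back because the running workers still hold the previous configuration in that case.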
6. Before probe results
Probe set: four probes (list_seed, get_target, list_fabric, batch_small), 5 calls each, against both /api/mcp and the GPT route.
BASELINE_PROBE_START
api_mcp list_seed count=5 errors=0 statuses=200 p50=0.037s p95=0.038s max=0.105s sizes=[3490]
api_mcp get_target count=5 errors=0 statuses=200 p50=0.053s p95=0.058s max=0.151s sizes=[10241]
api_mcp list_fabric count=5 errors=0 statuses=200 p50=0.085s p95=0.098s max=0.108s sizes=[11027]
api_mcp batch_small count=5 errors=0 statuses=200 p50=0.028s p95=0.042s max=0.052s sizes=[2340]
gpt_route list_seed count=5 errors=0 statuses=200 p50=0.022s p95=0.034s max=0.078s sizes=[3490]
gpt_route get_target count=5 errors=0 statuses=200 p50=0.029s p95=0.041s max=0.073s sizes=[10241]
gpt_route list_fabric count=5 errors=0 statuses=200 p50=0.045s p95=0.050s max=0.056s sizes=[11027]
gpt_route batch_small count=5 errors=0 statuses=200 p50=0.073s p95=0.118s max=0.149s sizes=[2340]
BASELINE_PROBE_END
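The report does not say how the probe script computed its percentiles. In each baseline row the reported p95 is below max even with only 5 samples, which suggests a lower-index rule (for example, the 4th of 5 sorted values). The sketch below assumes that floor-index rule and assumes that any non-200 status counts as an error; percentile and summarize are hypothetical names, and the sizes field is omitted.

```python
def percentile(sorted_vals: list[float], q: float) -> float:
    """Floor-index percentile: the element at int(q * (n - 1)) of a sorted sample."""
    return sorted_vals[int(q * (len(sorted_vals) - 1))]


def summarize(name: str, durations: list[float], statuses: list[int]) -> str:
    """Build one summary line in the same shape as the probe output above."""
    vals = sorted(durations)
    # Assumption: any status other than 200 is counted as an error.
    errors = sum(1 for s in statuses if s != 200)
    codes = ",".join(str(s) for s in sorted(set(statuses)))
    return (
        f"{name} count={len(vals)} errors={errors} statuses={codes} "
        f"p50={percentile(vals, 0.50):.3f}s "
        f"p95={percentile(vals, 0.95):.3f}s "
        f"max={vals[-1]:.3f}s"
    )
```

With 5 samples this rule makes p95 blind to the single slowest call, so max should be read alongside p95 when judging tail latency.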
7. After probe results
Same probe set after nginx -t + reload.
AFTER_PROBE_START
api_mcp list_seed count=5 errors=0 statuses=200 p50=0.023s p95=0.029s max=0.089s sizes=[3490]
api_mcp get_target count=5 errors=0 statuses=200 p50=0.023s p95=0.030s max=0.046s sizes=[10241]
api_mcp list_fabric count=5 errors=0 statuses=200 p50=0.032s p95=0.047s max=0.057s sizes=[11027]
api_mcp batch_small count=5 errors=0 statuses=200 p50=0.032s p95=0.038s max=0.071s sizes=[2340]
gpt_route list_seed count=5 errors=0 statuses=200 p50=0.028s p95=0.032s max=0.035s sizes=[3490]
gpt_route get_target count=5 errors=0 statuses=200 p50=0.027s p95=0.037s max=0.043s sizes=[10241]
gpt_route list_fabric count=5 errors=0 statuses=200 p50=0.071s p95=0.095s max=0.187s sizes=[11027]
gpt_route batch_small count=5 errors=0 statuses=200 p50=0.026s p95=0.030s max=0.053s sizes=[2340]
AFTER_PROBE_END
8. Timing observability verification
Sanitized GPT route access log is now active.
Sample tail:
request_id=278032cd38e163b451d5b1d60a0a08fc status=200 upstream_status=200 request_time=0.010 upstream_response_time=0.008 bytes_sent=10824
request_id=b34881f19523d693df65f18f71fdc607 status=200 upstream_status=200 request_time=0.090 upstream_response_time=0.062 bytes_sent=11610
request_id=f2249cf217380e4c4d5205f90ec47395 status=200 upstream_status=200 request_time=0.013 upstream_response_time=0.011 bytes_sent=2922
Sensitive pattern check:
SENSITIVE_PATTERN_HITS=0
Checked patterns included X-API-Key, Authorization, jsonrpc, tools/call, gpt-mcp/, and Bearer.
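A check like the one above can be reproduced with a small parser for the gpt_mcp_sanitized line shape plus a substring scan. This is a sketch, not the script that produced SENSITIVE_PATTERN_HITS; parse_line and sensitive_hits are hypothetical names, and the case-insensitive matching is an assumption.

```python
import re
from typing import Optional

# Field order follows the gpt_mcp_sanitized log_format. The upstream fields use
# a lazy match because nginx may log "-" or several comma-separated values there.
LINE_RE = re.compile(
    r"request_id=(?P<request_id>\S+) status=(?P<status>\d{3}) "
    r"upstream_status=(?P<upstream_status>.+?) "
    r"request_time=(?P<request_time>[\d.]+) "
    r"upstream_response_time=(?P<upstream_response_time>.+?) "
    r"bytes_sent=(?P<bytes_sent>\d+)"
)

# Patterns listed in the report's sensitive pattern check.
SENSITIVE = ("X-API-Key", "Authorization", "jsonrpc", "tools/call", "gpt-mcp/", "Bearer")


def parse_line(line: str) -> Optional[dict]:
    """Parse one sanitized log line into a dict of strings, or None on mismatch."""
    m = LINE_RE.fullmatch(line.strip())
    return m.groupdict() if m else None


def sensitive_hits(lines: list[str]) -> int:
    """Count lines containing any checked pattern (case-insensitive by assumption)."""
    lowered = [p.lower() for p in SENSITIVE]
    return sum(1 for ln in lines if any(p in ln.lower() for p in lowered))
```

Treating a line that fails to parse as suspicious is a reasonable extension, since an unexpected shape would mean the log format has drifted from the sanitized one.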
9. Whether latency improved
Route timing stayed healthy before and after. No 503/504 was reproduced in either baseline or after probes.
Observed changes:
- /api/mcp probes generally improved after reload: p95 was at or below 0.047s for all four probes, and the largest max was 0.089s (list_seed).
- GPT route remained healthy. batch_small improved from p95 0.118s to p95 0.030s. list_fabric p95 increased from 0.050s to 0.095s with max 0.187s, still far below the route timeouts (60s read before the patch, 300s after).
- Primary improvement is not raw latency; it is timeout headroom and route-level timing visibility.
10. Remaining risks
- GPT/OpenAI connector gateway latency can still happen outside VPS/nginx/AgentData; this phase now makes that distinguishable when nginx request_time is fast but client tool wall time is slow.
- Full document and batch_read(full=true) remain monolithic payload contracts; large reads can still be fragile through connector layers.
- Current log is route-level only; AgentData internal tool duration logging should still be enhanced later for end-to-end request_id correlation.
- Synthetic probes were ad hoc in this phase; a scheduled canary/alert still needs implementation.
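The first risk above implies a simple triage rule: compare the client-observed wall time with nginx request_time and upstream_response_time for the same request. A minimal sketch of that rule follows; locate_latency, its labels, and the 5-second slow_s threshold are hypothetical and would need tuning against real traffic.

```python
def locate_latency(
    client_wall_s: float,      # wall time observed by the calling tool
    request_time_s: float,     # nginx $request_time for the same request
    upstream_time_s: float,    # nginx $upstream_response_time
    slow_s: float = 5.0,       # assumption: illustrative "slow" threshold
) -> str:
    """Guess where a slow call spent its time, from route-level timings only."""
    if client_wall_s < slow_s:
        return "healthy"
    if upstream_time_s >= slow_s:
        return "upstream (AgentData)"
    if request_time_s >= slow_s:
        # Upstream was fast, so the time went to nginx or to sending the response.
        return "nginx or client transfer"
    return "outside VPS (connector/gateway)"
```

This only narrows the location to a layer; per-tool duration inside AgentData still needs the internal logging mentioned in the list above.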
11. Next recommendation
Proceed with chunk/cursor API design next, but do it as a separate phase after observing GPT route logs for real traffic.
Recommended next phase:
next_phase: agentdata_mcp_reliability_phase_2_chunk_cursor_and_response_guards
trigger: if GPT route logs show nginx/upstream fast but GPT tool wall time remains slow, or if full/batch reads still fail
scope:
- get_document_chunk(document_id, offset, limit)
- batch_read full=true response byte guard
- partial_available + cursor response shape
- structured AgentData tool duration logs with request_id
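To make the proposed scope concrete, here is one possible shape for get_document_chunk with a partial_available flag and a cursor. Nothing in this report fixes that contract, so everything below is an assumption: the Chunk fields, the character-based offsets, the default limit, and the in-memory DOCS store standing in for AgentData storage.

```python
from typing import Optional, TypedDict


class Chunk(TypedDict):
    document_id: str
    content: str
    partial_available: bool   # assumption: True when more content remains
    cursor: Optional[int]     # next offset to request, or None at the end


# Hypothetical in-memory store standing in for AgentData storage.
DOCS: dict[str, str] = {}


def get_document_chunk(document_id: str, offset: int = 0, limit: int = 4096) -> Chunk:
    """Return at most `limit` characters of a document starting at `offset`."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    text = DOCS[document_id]
    end = min(offset + limit, len(text))
    more = end < len(text)
    return {
        "document_id": document_id,
        "content": text[offset:end],
        "partial_available": more,
        "cursor": end if more else None,
    }
```

A caller would loop, passing each returned cursor as the next offset until cursor is None. A byte-based guard for batch_read full=true would need offsets measured on the encoded payload rather than on characters.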
Phase 1 status:
status: APPLIED_AND_VERIFIED
nginx_reload: success
route_works_after_patch: yes
request_timing_observable: yes
secrets_logged: no evidence found