GPT MCP Connector Root Cause Timeline Investigation — 2026-05-12
Date: 2026-05-12
Agent: Codex/CoreX
Scope: GPT connector / Agent Data MCP ClientResponseError investigation
Mode: read-mostly investigation; no VPS restart, no Qdrant/PG mutation, no OGV-2C rollback
Executive conclusion
root_cause=GPT_CONNECTOR_WRAPPER_SCHEMA_DRIFT_AND_LIST_RESPONSE_SIZE
The evidence does not support a VPS, Agent Data, Qdrant, PG, nginx upstream, or OGV-2C server-side incident. During the failure window, Agent Data kept returning 200 for /api/mcp, /api/health, /documents/..., and OpenAPI discovery. Container uptime is stable, no deploy/restart occurred in the window, and nginx/Agent Data logs show no 5xx pattern.
The failure is concentrated at the GPT connector/action wrapper layer:
- healthCheck is a GPT/OpenAPI-style REST action, not a current MCP tool. Current MCP tool discovery exposes snake_case tools and no healthCheck.
- listDocuments is unsafe in the GPT/action wrapper because it can produce oversized responses. The REST route expects prefix; if any wrapper maps prefix incorrectly to path, Agent Data ignores the filter and returns the full KB. Even with the correct prefix, the reviews prefix is large enough to produce a 167KB JSON body, and MCP wrapping was observed at 192639 bytes.
- searchKnowledge and getDocumentTruncated use narrower request/response paths, which explains why they can work while healthCheck and listDocuments fail.
The fix was not applied because the writable GPT connector registry/gateway is not accessible from this Codex environment. The safe root fix is to update or rebind the GPT connector schema/wrapper, not to restart or roll back server infrastructure.
Rules and sources read
- .claude/skills/incomex-rules.md
- search_knowledge("operating rules SSOT") returned knowledge/dev/ssot/operating-rules.md with version marker v7.58.
- search_knowledge("hiến pháp v4.0 constitution") returned knowledge/dev/laws/constitution.md with version marker v4.6.3.
- search_knowledge("MCP Agent Data connector healthCheck listDocuments ClientResponseError GPT") returned prior Agent Data MCP diagnostic documents:
  - knowledge/dev/laws/dieu44-trien-khai/notes/opus-diagnostic-agent-data-mcp-connection-gpt-client-side-2026-05-12.md
  - knowledge/current-state/reports/gpt-mcp-agent-data-healthcheck-2026-04-20.md
Timeline, absolute timestamps
Timezone notes:
- Agent Data and nginx Docker logs below are UTC.
- Vietnam local time (ICT, used in the first table column) = UTC+07.
- VPS local time = CEST (UTC+02); the git commit timestamps below carry this +0200 offset.
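The three clocks in the timezone notes above can be reconciled mechanically. A minimal sketch, assuming timestamps are available as ISO-8601 strings (the git "+0200" form rewritten as "+02:00"); the helper names to_ict and to_utc are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ICT = timezone(timedelta(hours=7))  # Vietnam local time (UTC+07)


def to_ict(utc_iso: str) -> str:
    """Convert a UTC ISO-8601 timestamp with a trailing 'Z' to ICT wall-clock text."""
    ts = datetime.fromisoformat(utc_iso.replace("Z", "+00:00"))
    return ts.astimezone(ICT).strftime("%Y-%m-%d %H:%M:%S")


def to_utc(local_iso: str) -> str:
    """Normalize an offset-aware timestamp (e.g. git's +02:00 on the VPS) to UTC."""
    ts = datetime.fromisoformat(local_iso)
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # nginx 192639-byte /api/mcp response
    print(to_ict("2026-05-12T03:21:57Z"))       # 2026-05-12 10:21:57
    # git commit eaf2140, recorded in VPS local time
    print(to_utc("2026-05-11T05:19:31+02:00"))  # 2026-05-11T03:19:31Z
```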
| Timestamp ICT | Timestamp UTC | Event | Evidence/log source | Conclusion |
|---|---|---|---|---|
| 2026-05-12 10:02:05 | 03:02:05Z | /kb/list?prefix=knowledge/dev/laws/dieu44-trien-khai/reports/ returned 200. | Agent Data log: GET /kb/list?prefix=...reports... 200 OK from connector network. | KB list route was alive before incident escalation. |
| 2026-05-12 10:05:22 | 03:05:22Z | /kb/list?prefix=.../notes/ returned 200. | Agent Data log: GET /kb/list?prefix=...notes... 200 OK. | Correct prefix filtering path worked. |
| 2026-05-12 10:06-10:16 | 03:06-03:16Z | Repeated /mcp and /health calls returned 200. | Agent Data/nginx logs show recurring POST /mcp 200 OK and GET /api/health 200. | Server was not down. |
| 2026-05-12 10:10:03 | 03:10:03Z | Client attempted OAuth discovery under /api/mcp/.well-known/oauth-authorization-server; received 404; immediately followed by /api/mcp 200 and 204/200 sequence. | nginx + Agent Data logs. | Connector/proxy performed discovery/reconnect behavior; 404 is on discovery path only, not /mcp failure. |
| 2026-05-12 10:17:08 | 03:17:08Z | Same OAuth discovery 404, followed by /api/mcp 200, 204, 200, 200, 200. | nginx + Agent Data logs. | Reconnect/schema negotiation continued to work at MCP route. |
| 2026-05-12 10:19:12-10:19:15 | 03:19:12-03:19:15Z | Three /api/mcp calls returned 200 with response sizes 862, 1042, 6222 bytes. | nginx logs from client IP 123.24.178.152. | Search/get-like MCP calls were succeeding during the window. |
| 2026-05-12 10:20:03-10:20:06 | 03:20:03-03:20:06Z | /api/mcp 200 and /api/health 200 from curl source. | nginx logs from 38.242.240.89. | Independent health smoke succeeded. |
| 2026-05-12 10:21:21 | 03:21:21Z | GET /api/documents/...opus-gate-review...?... returned 200, 894 bytes, user agent ChatGPT-User/1.0. | nginx log. | Direct get-document path worked from GPT/browser-style client during the incident window. |
| 2026-05-12 10:21:57 | 03:21:57Z | POST /api/mcp returned 200 but 192639 bytes; nginx warned upstream response buffered to temp file. | nginx log: upstream response is buffered to a temporary file, then POST /api/mcp 200 192639. | list_documents/large result behavior is confirmed. Server returned 200; client-side size/parsing limits can still fail. |
| 2026-05-12 10:22:49 | 03:22:49Z | Five unauthenticated local JSON-RPC probes returned 401. | Agent Data log from 127.0.0.1; this was Codex's negative probe without API key. | Not GPT incident evidence; confirms auth gate works. |
| 2026-05-12 10:22:50-10:22:54 | 03:22:50-03:22:54Z | Internal route-size probes showed prefix vs path behavior. | Codex internal probes; see the response-size and parameter evidence section. | Wrong REST parameter path returns full KB. |
| 2026-05-12 10:23:51 | 03:23:51Z | OAuth discovery 404 again, immediately followed by /api/mcp 200/204/200/200/200. | nginx + Agent Data logs. | Discovery 404 is non-fatal; /mcp route kept working. |
| 2026-05-12 10:25:01-10:25:03 | 03:25:01-03:25:03Z | /api/mcp 200 and /api/health 200 from curl source. | nginx logs. | Server still healthy. |
| 2026-05-12 10:28:53-10:28:56 | 03:28:53-03:28:56Z | Codex direct MCP search_knowledge calls returned 200 with 7362/7958/6589 bytes. | nginx + Agent Data logs from current Codex session. | Codex MCP connection and search path healthy. |
| 2026-05-12 10:32:25-10:32:34 | 03:32:25-03:32:34Z | Codex-side acceptance smoke: search_knowledge, get_document, list_documents(path=notes) all succeeded. | Codex MCP tool results; nginx/Agent Data /mcp 200 logs. | Agent Data MCP server usable from Codex side after incident report. |
Server and deploy state
Observed container state:
incomex-agent-data agent-data-local:latest Up 24 hours (healthy)
incomex-nginx nginx:alpine Up 11 days
postgres postgres:16 Up 3 weeks (healthy)
incomex-qdrant qdrant/qdrant:latest Up 7 weeks (healthy)
agent-data StartedAt=2026-05-11T03:17:28.63541742Z
agent-data Image=sha256:be5a82c4caee3eaed0bbcf5efff51dcf07243e9079b5f47ea77001b7dc67a731
Recent Agent Data git history:
eaf2140 | 2026-05-11 05:19:31 +0200 | P3D: harden vector search rerank
ff2fc25 | 2026-05-11 04:40:41 +0200 | P3D vector search: app-layer path/title boost rerank
a40b217 | 2026-05-07 07:23:52 +0200 | OGV-2C: write gate
Conclusion: no Agent Data deploy, restart, image change, or OGV-2C rollback-worthy event occurred in the 30-minute incident window (2026-05-12 03:02-03:32 UTC).
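The "no change in window" conclusion is a timestamp comparison: every deploy-relevant event must fall outside the incident window. A minimal sketch, assuming the window is 03:02-03:32 UTC on 2026-05-12 (the span of the timeline table) and using the events listed in this section; fractional seconds of StartedAt are dropped for portability, and events_in_window is a hypothetical helper.

```python
from datetime import datetime, timezone

# Assumed incident window: the span covered by the timeline table (UTC).
WINDOW_START = datetime(2026, 5, 12, 3, 2, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2026, 5, 12, 3, 32, 59, tzinfo=timezone.utc)

# Deploy-relevant events from this section, as offset-aware ISO strings.
# StartedAt fractional seconds (.63541742) are dropped here.
EVENTS = {
    "agent-data StartedAt": "2026-05-11T03:17:28+00:00",
    "commit eaf2140": "2026-05-11T05:19:31+02:00",
    "commit ff2fc25": "2026-05-11T04:40:41+02:00",
    "commit a40b217 (OGV-2C)": "2026-05-07T07:23:52+02:00",
}


def events_in_window(events, start, end):
    """Return the names of events whose timestamp falls inside [start, end]."""
    hits = []
    for name, iso in events.items():
        ts = datetime.fromisoformat(iso)  # aware datetimes compare across offsets
        if start <= ts <= end:
            hits.append(name)
    return hits


if __name__ == "__main__":
    print(events_in_window(EVENTS, WINDOW_START, WINDOW_END))  # []
```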
Tool schema and wrapper evidence
Current MCP tool source exposes:
agent_data/server.py:
MCP_TOOLS includes:
- search_knowledge(query, limit)
- list_documents(path)
- get_document(document_id)
- get_document_for_rewrite(document_id)
No healthCheck MCP tool is defined.
_dispatch_mcp_tool:
    if tool_name == "list_documents":
        return await list_kb_documents(prefix=args.get("path", "docs"))
Current REST/OpenAPI source exposes GPT/action-style operations:
docs/api/openapi.yaml:
POST /chat operationId: searchKnowledge
GET /kb/list operationId: listDocuments, query parameter: prefix
GET /health operationId: healthCheck
Live public OpenAPI check:
https://vps.incomexsaigoncorp.vn/api/openapi.json 200 application/json
listDocuments params [('prefix', 'query')]
https://vps.incomexsaigoncorp.vn/api/health 200 application/json
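The "listDocuments params [('prefix', 'query')]" line in the live check above is the kind of output produced by walking the OpenAPI document for a single operationId. A minimal sketch of that walk, assuming the spec is already loaded as a dict (for example with json.loads on the /api/openapi.json body); it does not resolve $ref parameters, and operation_params is a hypothetical name.

```python
import json

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operation_params(spec: dict, operation_id: str) -> list:
    """Return (name, location) pairs for the operation with this operationId.

    Path-level parameters are listed before operation-level ones; overrides
    are not de-duplicated and $ref parameters are not resolved in this sketch.
    """
    for path_item in spec.get("paths", {}).values():
        shared = path_item.get("parameters", [])
        for method, op in path_item.items():
            if method not in HTTP_METHODS or op.get("operationId") != operation_id:
                continue
            params = shared + op.get("parameters", [])
            return [(p["name"], p["in"]) for p in params if "name" in p]
    raise KeyError(f"operationId not found: {operation_id}")


if __name__ == "__main__":
    # Illustrative fragment shaped like the live spec; not the real file.
    spec = json.loads("""
    {"paths": {
      "/kb/list": {"get": {"operationId": "listDocuments",
                           "parameters": [{"name": "prefix", "in": "query"}]}},
      "/health": {"get": {"operationId": "healthCheck"}}
    }}""")
    print("listDocuments params", operation_params(spec, "listDocuments"))
```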
Interpretation:
- The Codex MCP connector uses mcp__agent_data__.search_knowledge, list_documents(path), and get_document(...).
- GPT/action reports use the camelCase names searchKnowledge, getDocumentTruncated, listDocuments(prefix), and healthCheck.
- A GPT wrapper that calls healthCheck as an MCP tool is calling a stale tool that does not exist in the current MCP schema.
- A GPT wrapper that calls REST /kb/list with path instead of prefix loses filtering.
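The two surfaces differ in the key name for the list filter, which is exactly where a wrapper can silently lose filtering. A sketch of an explicit per-surface registry that a wrapper could consult before forwarding, assuming a hypothetical registry shape; the key names (path for MCP list_documents, prefix for REST /kb/list) come from the evidence above, and everything else is illustrative.

```python
# Hypothetical wrapper registry: GPT-facing name -> (surface, target, accepted keys).
# Key names mirror the evidence: MCP list_documents takes "path",
# REST GET /kb/list takes "prefix". healthCheck is mapped to HTTP health,
# not to an MCP tool.
ROUTES = {
    "listDocuments": ("rest", "GET /api/kb/list", {"prefix"}),
    "healthCheck": ("rest", "GET /api/health", set()),
    "list_documents": ("mcp", "list_documents", {"path"}),
    "search_knowledge": ("mcp", "search_knowledge", {"query", "limit"}),
    "get_document": ("mcp", "get_document", {"document_id"}),
}


class WrapperMappingError(ValueError):
    """Raised instead of forwarding a request the target would misinterpret."""


def map_call(name: str, args: dict) -> tuple:
    """Validate a GPT-facing call against the registry before forwarding it.

    Names missing from the registry, and keys the target does not accept
    (for example 'path' sent to REST /kb/list, which the route ignores and
    then returns the full KB), raise instead of being forwarded.
    """
    if name not in ROUTES:
        raise WrapperMappingError(f"unknown or stale operation: {name}")
    surface, target, accepted = ROUTES[name]
    extra = set(args) - accepted
    if extra:
        raise WrapperMappingError(f"{name}: unsupported keys {sorted(extra)} for {target}")
    return surface, target, dict(args)
```

With this shape, map_call("listDocuments", {"path": ...}) fails fast instead of producing a 709367-byte listing.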
Response-size and parameter evidence
Read-only internal route probes against Agent Data:
/kb/list?prefix=knowledge/dev/laws/dieu44-trien-khai/reviews 200 167057 application/json
/kb/list?path=knowledge/dev/laws/dieu44-trien-khai/reviews 200 709367 application/json
/kb/list?prefix=knowledge/dev/laws/dieu44-trien-khai/notes 200 688 application/json
/kb/list?path=knowledge/dev/laws/dieu44-trien-khai/notes 200 709367 application/json
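The four probes above request the same folder under both query keys and compare body sizes. A read-only sketch of that comparison, with the HTTP fetch injected as a callable; the base URL and the authentication header (the unauthenticated probes returned 401) are assumptions left to the caller. The rule "path body larger than prefix body means the path key was not applied" is a heuristic, not a server guarantee.

```python
from typing import Callable, Dict
from urllib.parse import urlencode

# fetch(url) -> response body bytes, supplied by the caller (for example an
# authenticated urllib wrapper). Injected so the comparison stays testable.
Fetch = Callable[[str], bytes]


def probe_list_sizes(fetch: Fetch, base: str, folder: str) -> Dict[str, int]:
    """Return response sizes in bytes for /kb/list filtered by 'prefix' vs 'path'."""
    sizes = {}
    for key in ("prefix", "path"):
        url = f"{base}/kb/list?{urlencode({key: folder})}"
        sizes[key] = len(fetch(url))
    return sizes


def path_filter_ignored(sizes: Dict[str, int]) -> bool:
    """Heuristic: a larger 'path' body means that key did not filter the listing."""
    return sizes["path"] > sizes["prefix"]
```

Against the notes folder, the measured sizes above (688 vs 709367 bytes) would make path_filter_ignored return True.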
Nginx observed the same class of risk through MCP:
2026-05-12T03:21:57Z warn upstream response is buffered to temp file while reading upstream, request: "POST /api/mcp"
2026-05-12T03:21:57Z "POST /api/mcp HTTP/1.1" 200 192639
Prior KB evidence from 2026-04-20:
listDocuments(prefix="knowledge/") -> ResponseTooLargeError
searchKnowledge and getDocumentTruncated succeeded in the same report
Conclusion: listDocuments has an established oversized-response failure mode. If the wrapper mapping is wrong, even a narrow prefix like notes returns the full KB. If the mapping is correct but the prefix is broad, the response can still exceed GPT connector limits.
GPT gateway log access
Direct GPT connector / MCP gateway logs were not accessible from this Codex workspace. Local Codex logs show rmcp request/response for this Codex session and successful Agent Data calls, but they do not contain the GPT-side ClientResponseError stack, request id, trace id, or wrapper parser failure.
Therefore:
- exact GPT ClientResponseError stack: NOT_ACCESSIBLE
- GPT connector cached schema before/after refresh: NOT_ACCESSIBLE
- GPT gateway response parser/body limit log: NOT_ACCESSIBLE
This is the remaining uncertainty. The root cause above is still supported by server/proxy/source evidence and by prior GPT Action KB evidence.
Hypothesis matrix
| Hypothesis | Result | Evidence |
|---|---|---|
| H1. GPT connector uses stale schema after schema/tool registry refresh. | PARTIAL_PASS | GPT-facing names are camelCase REST actions; current MCP exposes snake_case only and no healthCheck. OAuth discovery/reconnect events appeared at 03:10, 03:17, 03:23 UTC, but GPT schema cache logs are not accessible. |
| H2. GPT wrapper calls healthCheck/listDocuments that no longer match current MCP schema. | PASS | Current MCP source has no healthCheck; list_documents takes path, while REST OpenAPI listDocuments takes prefix. |
| H3. listDocuments wrapper maps prefix to REST path, returning full KB. | PASS_RISK_CONFIRMED | /kb/list?path=...notes returned 709367 bytes while correct prefix returned 688 bytes. Exact GPT wrapper request body is not accessible, so the mapping mistake is confirmed as a live hazard, not directly captured in GPT logs. |
| H4. GPT session/token expired or refresh failed after idle. | NOT_PRIMARY | Server logs show authenticated /api/mcp 200 and /documents 200 in the window. Only 401s observed were Codex's unauthenticated local negative probes from 127.0.0.1. |
| H5. MCP transport/proxy stale connection pool. | NOT_PROVEN | Discovery/reconnect events occurred, but every follow-up /api/mcp completed 200/204. No upstream 5xx/timeout evidence. |
| H6. Response size/timeout causes ClientResponseError. | PASS | Historical ResponseTooLargeError; current /api/mcp 192639 bytes with nginx temp buffering; REST wrong-param full KB is 709367 bytes. |
| H7. Tool registry/config changed in last 30 minutes. | FAIL_NOT_FOUND | No deploy/restart/git change in window. Current source/docs show a standing dual-surface schema drift, not a new server change. |
| H8. OGV-2C a40b217 affected more than the create path. | FAIL | OGV-2C landed on 2026-05-07; Agent Data write, read, search, list, and health calls succeeded after it; no server errors point to the write gate. |
| H9. Server-side 500/502. | FAIL | nginx and Agent Data logs show only 200/204 responses plus the expected 401 probes; no 500/502/504 pattern in the incident window. |
Fix status
No server-side fix was applied in this pack:
- no code change
- no deploy
- no restart
- no Qdrant mutation
- no PG mutation
- no OGV-2C rollback
- no connector registry edit, because GPT connector wrapper registry is not available from this Codex environment
The root fix must be applied in the GPT connector/action wrapper registry:
- Remove the stale MCP healthCheck wrapper, or explicitly map GPT healthCheck to REST GET /api/health. Do not expose it as an MCP tool unless Agent Data adds an official MCP health tool.
- For listDocuments, use exactly one supported path:
  - MCP: call list_documents with path.
  - REST/OpenAPI: call /api/kb/list with query parameter prefix.
- Add default pagination or limit to listDocuments; do not allow full-KB returns by default.
- Add a hard response-size guard with a clear error such as LIST_DOCUMENTS_RESPONSE_TOO_LARGE_USE_NARROWER_PREFIX, instead of surfacing an opaque ClientResponseError.
- Add a connector schema version/hash and fail fast when GPT cached wrapper names do not match the current route/tool schema.
- Log per call: wrapper name, upstream route, mapped query/body keys, status, response bytes, and parse/body-limit failure reason. Do not log tokens or raw document bodies.
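The default limit and the size guard from the fix list above could sit in the wrapper around the upstream call. A minimal sketch, assuming a hypothetical 100,000-byte budget (the real GPT connector body limit was NOT_ACCESSIBLE in this investigation) and a hypothetical default limit of 50; note that the live OpenAPI currently documents only prefix for listDocuments, so sending limit presumes the pagination fix has landed. The error code is the one proposed above.

```python
import json

# Placeholder budget: the actual GPT connector limit is unknown, tune as needed.
MAX_RESPONSE_BYTES = 100_000
# Hypothetical default page size; the current REST route documents only 'prefix'.
DEFAULT_LIMIT = 50


class ListDocumentsTooLarge(Exception):
    code = "LIST_DOCUMENTS_RESPONSE_TOO_LARGE_USE_NARROWER_PREFIX"


def with_default_limit(args: dict) -> dict:
    """Add a limit when the caller did not supply one, so full-KB returns are not the default."""
    out = dict(args)
    out.setdefault("limit", DEFAULT_LIMIT)
    return out


def guard_body(body: bytes, prefix: str) -> dict:
    """Parse an upstream list body, failing with an explicit code if it exceeds the budget."""
    if len(body) > MAX_RESPONSE_BYTES:
        raise ListDocumentsTooLarge(
            f"{ListDocumentsTooLarge.code}: {len(body)} bytes for prefix={prefix!r}"
        )
    return json.loads(body)
```

With this placeholder budget, the notes listing (688 bytes) would pass, while the reviews listing (167057 bytes) and the full-KB listing (709367 bytes) would be rejected with the explicit code rather than an opaque ClientResponseError.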
Acceptance tests
GPT-side required 3-round acceptance was not executable from this Codex environment because the GPT connector gateway/wrapper registry is not exposed as a callable surface here. The report must not falsely claim GPT-side PASS.
Codex-side smoke after investigation:
search_knowledge("OGV-2C write gate a40b217") -> PASS, qdrant_hits=5, top result opus-gate-review-ogv-2c-case-closure-2026-05-07.md
get_document("knowledge/dev/laws/dieu44-trien-khai/reviews/opus-gate-review-ogv-2c-case-closure-2026-05-07.md") -> PASS, revision=1, truncated=true
list_documents(path="knowledge/dev/laws/dieu44-trien-khai/notes") -> PASS, count=2
HTTP /api/health -> PASS, 200 application/json
Required GPT-side post-fix test remains:
Round 1-3, spaced a few minutes:
1. searchKnowledge("OGV-2C write gate a40b217")
2. getDocumentTruncated("knowledge/dev/laws/dieu44-trien-khai/reviews/opus-gate-review-ogv-2c-case-closure-2026-05-07.md")
3. listDocuments(prefix="knowledge/dev/laws/dieu44-trien-khai/notes")
4. listDocuments(prefix="knowledge/dev/laws/dieu44-trien-khai/reviews", limit=10) if wrapper supports limit
5. health endpoint/tool officially mapped to HTTP /api/health, or remove stale healthCheck MCP wrapper
PASS criteria after connector fix:
0 ClientResponseError in 3 rounds
logs show listDocuments notes uses prefix/path correctly and does not return full KB
logs show reviews request is limited/paginated or rejected with explicit size error
logs show schema version/hash used by GPT connector
healthCheck either maps to HTTP /api/health or is removed as stale MCP wrapper
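The PASS criteria above can be checked mechanically once per-call logs exist. A sketch that evaluates log records against them, assuming a hypothetical record schema (round, wrapper, route, args, bytes, error, schema_hash) that matches the per-call log fields proposed in the fix section; the full-KB size is the 709367 bytes measured in this report.

```python
from typing import Iterable, List

FULL_KB_BYTES = 709_367  # unfiltered /kb/list size measured in this report
SIZE_ERROR = "LIST_DOCUMENTS_RESPONSE_TOO_LARGE_USE_NARROWER_PREFIX"


def acceptance_failures(records: Iterable[dict], rounds: int = 3) -> List[str]:
    """Return the PASS-criteria violations found in per-call log records (hypothetical schema)."""
    records = list(records)
    failures = []
    if {r["round"] for r in records} != set(range(1, rounds + 1)):
        failures.append(f"expected rounds 1..{rounds}")
    for r in records:
        tag = f"round {r['round']} {r['wrapper']}"
        args = r.get("args", {})
        if r.get("error") == "ClientResponseError":
            failures.append(f"{tag}: ClientResponseError")
        if not r.get("schema_hash"):
            failures.append(f"{tag}: missing schema version/hash")
        if r["wrapper"] == "listDocuments":
            if r.get("bytes", 0) >= FULL_KB_BYTES:
                failures.append(f"{tag}: returned full KB")
            if (args.get("prefix", "").endswith("/reviews")
                    and "limit" not in args and r.get("error") != SIZE_ERROR):
                failures.append(f"{tag}: reviews listing neither limited nor rejected")
        if r["wrapper"] == "healthCheck" and r.get("route") != "GET /api/health":
            failures.append(f"{tag}: healthCheck not mapped to HTTP /api/health")
    return failures
```

An empty list means all five criteria held for the recorded calls; any entry names the round and wrapper that failed.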
Prevention
Permanent prevention should be implemented at connector/schema level:
- publish one canonical connector manifest, generated from the live server schema;
- include schema version/hash in every connector session;
- reject stale wrappers when operation names or input keys drift;
- add contract tests for listDocuments mapping: prefix must not be sent as path on REST, and MCP path must map to server prefix;
- add default limit/pagination to list routes;
- add response-size tests for notes, reviews, and root knowledge/;
- add log fields for upstream route, mapped query keys, status, bytes, and parser failure class;
- keep healthCheck as HTTP health unless an official MCP health tool is added.
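The manifest, version/hash, and stale-wrapper items above can share one mechanism: hash a canonical encoding of the connector manifest and diff operations and input keys when the hash changes. A minimal sketch, assuming a hypothetical manifest shape {"operations": {name: [input keys]}}; lists are order-sensitive under this hash, so input keys should be emitted sorted when the manifest is generated.

```python
import hashlib
import json


def schema_hash(manifest: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, no whitespace) of the connector manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def schema_drift(cached: dict, live: dict) -> list:
    """List operation names and input keys that differ between two manifests."""
    drift = []
    old, new = cached["operations"], live["operations"]
    for name in sorted(set(old) | set(new)):
        if name not in new:
            drift.append(f"removed operation: {name}")
        elif name not in old:
            drift.append(f"added operation: {name}")
        elif set(old[name]) != set(new[name]):
            drift.append(f"{name}: keys {sorted(old[name])} -> {sorted(new[name])}")
    return drift


def check_session(cached_hash: str, live_manifest: dict) -> None:
    """Fail fast when the connector's cached schema hash differs from the live manifest."""
    live = schema_hash(live_manifest)
    if cached_hash != live:
        raise RuntimeError(f"STALE_CONNECTOR_SCHEMA cached={cached_hash[:12]} live={live[:12]}")
```

A session carrying the hash of a manifest where listDocuments still took path would be rejected, and schema_drift would report the listDocuments key change as the reason.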
Final status
phase_status=PARTIAL_BLOCKED_EXTERNAL_GPT_CONNECTOR
root_cause=GPT_CONNECTOR_WRAPPER_SCHEMA_DRIFT_AND_LIST_RESPONSE_SIZE
confidence=medium_high
server_side_incident=false
ogv_2c_rollback_needed=false
fix_applied=false
no_mutation_performed=true_except_report_upload
recommended_next_action=GPT_CONNECTOR_SCHEMA_REBIND_AND_LISTDOCUMENTS_PAGINATION_FIX