Phase 2F-C Rev2 — MCP Production Patch Design (2026-05-14)
Date: 2026-05-14
Status: Design only. Supersedes Rev1 (phase-2f-c-mcp-production-patch-design-2026-05-13.md).
Author: Claude Code xhigh (Phase 2F-C Rev2)
Reviewers incorporated: GPT council, Opus, Codex.
Scope: Production patches D1..D4 for AgentData MCP. No code, no commit, no restart, no nginx reload, no ChatGPT reconnection.
Part 1 — New process
Production design changes for AgentData MCP follow this order, no shortcuts:
- Investigate + design — Claude Code (xhigh effort) reads code, schema, logs, runs read-only EXPLAIN/queries, writes a markdown design report to knowledge/current-state/reports/.
- Review — GPT, Opus, and Codex review the design. Each may APPROVE, APPROVE WITH CHANGES, or REJECT.
- Revise — Claude Code addresses all change requests in a new revision (e.g. Rev2) and re-uploads to the same KB folder.
- Approve — once at least one of {GPT, Opus, Codex} marks "approve" without remaining blockers, implementation is unblocked.
- Implement — code change in its own branch/PR. Two-Hat process.
- Test — functional, latency, transport, stress tests per the design's pass criteria. Capture results in current-state/reports/.
- Deploy + verify — VPS deploy, smoke test. Only then re-test external clients (ChatGPT, Claude.ai).
No design report approved => no code merged. This document is the Rev2 of D1..D4 designs awaiting approval.
Part 2 — D1: list_documents final design (Rev2)
2.1 Scope discipline
D1 ships alone, not bundled with D2a. D1 changes only the list_documents data path; it does not change transport, schema, write path, or any other tool.
2.2 Preflight findings (read-only inspection of VPS PG, 2026-05-14)
| Check | Result | Implication |
|---|---|---|
| `kb_documents` total rows | 5 040 | — |
| Alive rows (`deleted_at IS NULL`) | 2 907 | — |
| Rows missing `document_id` | 1 (empty key + empty document_id, ~4 KB body) | Must be filtered: `WHERE (data->>'document_id') IS NOT NULL AND length(data->>'document_id') > 0` |
| Rows with non-numeric `revision` JSON | 0 | Safe to cast `(data->>'revision')::int` with `COALESCE(..., 0)` |
| key↔document_id mismatch (`replace(key,'__','/') = document_id`) | 0 | `_fs_key` mapping is consistent |
| `document_id` containing `_` | 1 184 | LIKE escaping of `_` is mandatory |
| `document_id` containing `%` | 0 | Defensive escape still required |
| `document_id` containing `\` | 0 | Defensive escape still required |
Conclusion: data is clean enough to push down. The single junk row + 1 184 docs with _ justify both the IS NOT NULL filter and the LIKE escape.
2.3 Final SQL (parameterized)
```sql
SELECT
    data->>'document_id'                                AS document_id,
    data->>'parent_id'                                  AS parent_id,
    COALESCE(data->'metadata'->>'title', '')            AS title,
    COALESCE(data->'metadata'->'tags', '[]'::jsonb)     AS tags,
    COALESCE(NULLIF(data->>'revision','')::int, 0)      AS revision
FROM kb_documents
WHERE (data->>'deleted_at') IS NULL
  AND (data->>'document_id') IS NOT NULL
  AND length(data->>'document_id') > 0
  AND (
        %(prefix)s = ''
        OR (data->>'document_id') LIKE %(prefix_like)s ESCAPE '\'
      )
ORDER BY (data->>'document_id')
LIMIT %(limit)s OFFSET %(offset)s;
```
prefix_like is built in Python:
```python
def _like_prefix(prefix: str) -> str:
    return (prefix
            .replace('\\', '\\\\')  # escape backslash FIRST
            .replace('%', '\\%')
            .replace('_', '\\_')
            ) + '%'
```
`ESCAPE '\'` is set explicitly so PG treats `\%` and `\_` as literals. `NULLIF(..., '')::int` is defensive: even though current data is clean, future inserts that stash an empty string would otherwise raise. `(data->>'document_id') IS NOT NULL` is redundant given current data, but documents intent and survives the one anomalous row.
2.4 total_count semantics
Rev1 left this open. Decision: drop total_count from the response.
- Rationale: dropping it removes a separate `COUNT(*) ... LIKE` query (5–20 ms wall, same shared buffers as the main query) and matches REST convention (`next_offset` + `items` + `truncated`).
- Backward compatibility: the current MCP description doesn't promise an exact count. The field `count` is renamed-in-place to mean "number of items returned" — same name, narrower semantics. Document this in the tool description.
- If a client genuinely needs the full count, expose a `?count=true` flag in a later phase; not now.
2.5 Collation & index strategy
Option chosen: partial expression index on `((data->>'document_id') COLLATE "C") text_pattern_ops`, plus query ORDER BY/WHERE using the same `COLLATE "C"`. Rationale:
- Default UTF-8 collation makes `LIKE 'prefix%'` a filter, not a range scan, because `'knowledge/' <= 'knowledge/x' < 'knowledge0'` is false under UTF-8 ICU rules (verified — `'knowledge/x' < 'knowledge0'` = false in the default collation, true in `COLLATE "C"`).
- `text_pattern_ops` alone fixes LIKE-prefix range scans, but the query's ORDER BY must use a comparison compatible with the index for `Index Scan ... no Sort`. Adding `COLLATE "C"` to both sides keeps the plan a single Index Scan with no sort.
Final migration:
```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_kb_documents_doc_id_c_pattern
ON kb_documents (((data->>'document_id') COLLATE "C") text_pattern_ops)
WHERE (data->>'deleted_at') IS NULL
  AND (data->>'document_id') IS NOT NULL
  AND length(data->>'document_id') > 0;
```
Query updated accordingly:
```sql
WHERE (data->>'deleted_at') IS NULL
  AND (data->>'document_id') IS NOT NULL
  AND length(data->>'document_id') > 0
  AND ( %(prefix)s = '' OR (data->>'document_id') COLLATE "C" LIKE %(prefix_like)s ESCAPE '\' )
ORDER BY (data->>'document_id') COLLATE "C"
LIMIT %(limit)s OFFSET %(offset)s;
```
Trade-off documented:
- Sort order becomes byte-wise (ASCII order). Today's behaviour is effectively byte-wise too: Python's default string sort compares Unicode code points, which coincides with byte order for these ASCII document ids. So no semantic change. Add a regression test that returns the same first-50 list before and after migration to confirm.
- Existing `idx_kb_documents_doc_id` stays as a safety net — do not drop until 7 days of clean operation.
2.6 Migration plan
CREATE INDEX CONCURRENTLY rules:
- Cannot be inside a transaction. Run as a single statement via `psql` (or a one-off admin script), not via an Alembic/transactional migration.
- Acquires a `SHARE UPDATE EXCLUSIVE` lock; reads and writes proceed normally.
- On failure (e.g. duplicate key, killed connection), leaves an `INVALID` index. Cleanup: `DROP INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern;` then re-run.
- Build time estimate at 2 907 alive rows: <5 s. No measurable load.
Rollback:

```sql
DROP INDEX CONCURRENTLY IF EXISTS idx_kb_documents_doc_id_c_pattern;
```

Also non-blocking. If we also revert the Python code, the planner falls back to `idx_kb_documents_doc_id` (current behaviour).
2.7 Pagination policy
- Default `limit` = 50 (unchanged).
- Max `limit` = 100 (unchanged; already enforced at server.py:2033).
- Default `offset` = 0.
- Max `offset` = 10 000 (new cap). Above this, return `422 INVALID_ARGUMENT` with message "offset too large — use a narrower prefix or wait for keyset pagination". Rationale: OFFSET cost grows linearly; capping it prevents O(n^2) DoS.
- Keyset cursor is not in D1. Defer to D1-follow-up (2F-D1b) only if real-world prefix usage trips the offset cap. Today's largest prefix (`knowledge/` = 2 570 rows) fits in 26 limit-100 pages, well under the cap.
2.8 Backward compatibility
- MCP arg `path` is still accepted and mapped to `prefix` (today's dispatcher already does this — keep).
- MCP args `limit`, `offset` are already in the tool schema (server.py:2756) — keep.
- Response shape: identical keys; `count` now means returned-items. Old clients reading `count` for "page size" stay correct; clients reading `count` for "total" need to switch to `next_offset == null` as the "no more pages" signal.
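A client that previously relied on `count` as a total can page to exhaustion with `next_offset` instead. A minimal sketch, assuming a hypothetical `call_tool()` helper that returns the parsed response dict:

```python
# Hypothetical client-side pagination loop over list_documents.
# call_tool(name, args) -> dict is an assumed transport helper; the stop
# signal is next_offset == None, per the D1 response contract.
def list_all(call_tool, prefix: str, limit: int = 100) -> list[dict]:
    items: list[dict] = []
    offset: int | None = 0
    while offset is not None:
        page = call_tool("list_documents",
                         {"prefix": prefix, "limit": limit, "offset": offset})
        items.extend(page["items"])
        offset = page["next_offset"]   # None => no more pages
    return items
```

With the offset cap at 10 000 this loop covers any prefix up to 10 100 rows at limit=100, comfortably above today's largest prefix.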
2.9 Risks register
- R1. LIKE escape order matters: backslash first, then `%`, then `_`. Wrong order corrupts the escape. Tested in unit tests.
- R2. The partial index predicate `WHERE deleted_at IS NULL` requires the same predicate in the query for the planner to pick the index. Already in the SQL above.
- R3. `(data->>'document_id') COLLATE "C"` is an expression; an expression index must match the query's expression exactly. The migration and the query both use the same form.
- R4. `stream_docs` is still used by `reindex_kb_documents`, `cleanup_orphan_vectors`, `_compute_data_integrity`, `audit_sync`. D1 does not touch those. Document a follow-up to switch them to a server-side cursor (`DECLARE ... FETCH FORWARD` or a psycopg2 named cursor) so a future 30k-row table doesn't kill them. Not blocking.
- R5. The `LIST_DOCUMENTS_MAX_RESPONSE_BYTES` guard (server.py:2085) stays in place. With LIMIT 100 + the lean projection, the response is well under the cap.
2.10 Test matrix for D1
| # | Test | Expectation |
|---|---|---|
| T1 | prefix='', limit=50 | first 50 alive docs, byte-sorted; next_offset=50 |
| T2 | prefix='knowledge/', limit=50 | first 50 knowledge/* docs; next_offset=50; no leak from knowledgeX |
| T3 | prefix='knowledge_/' (literal underscore expected) | matches ONLY docs starting knowledge_/, NOT knowledge/ (escape works) |
| T4 | prefix='100%' (literal %) | matches only literal 100%...; no wildcard expansion |
| T5 | prefix='no/such/path/' | returns {items:[], count:0, truncated:false, next_offset:null} |
| T6 | soft-deleted doc | excluded from any prefix list |
| T7 | offset=0/24/49/50/99/100/9999 | correct pagination |
| T8 | offset=10001 | 422 INVALID_ARGUMENT |
| T9 | limit=0 / limit=101 | 422 INVALID_ARGUMENT |
| T10 | EXPLAIN ANALYZE on prefix='knowledge/' | plan begins with Limit ... -> Index Scan using idx_kb_documents_doc_id_c_pattern, no Sort, no Seq Scan, no Rows Removed by Filter > 100 |
| T11 | concurrency 5 burst, 60 s | p95 < 250 ms |
| T12 | concurrency 10 burst, 60 s | p95 < 400 ms |
| T13 | backward compat | MCP args.path still works identical to args.prefix |
| T14 | order-stability | same call twice returns same first-50 ordering |
| T15 | anomalous row exclusion | the one empty-key/empty-doc_id row is not returned for any prefix |
2.11 Pass criteria for D1
- All tests T1..T15 green.
- EXPLAIN plan for any of `prefix=''`, `prefix='knowledge/'`, `prefix='knowledge/current-state/reports/'` returns `Index Scan` on `idx_kb_documents_doc_id_c_pattern` with `Buffers: shared hit + read < 50` and `Execution Time < 15 ms`.
- p95 latency:
  - single client: < 100 ms
  - concurrency 5: < 250 ms
  - concurrency 10: < 400 ms
- Zero 5xx in 5 minutes of mixed traffic at concurrency 10.
- No regression in other tools' p95 (sanity: search_knowledge, get_document, patch_document baselines unchanged).
2.12 D1 implementation surface
| Layer | Files | Change type |
|---|---|---|
| DB | one SQL statement (`CREATE INDEX CONCURRENTLY ...`) applied via SRE | additive |
| Python — store | `agent_data/pg_store.py`: add `list_kb_docs(prefix, limit, offset) -> list[dict]` (does NOT replace `stream_docs`) | new function |
| Python — server | `agent_data/server.py`: rewrite `_list_kb_documents_page` to call `pg_store.list_kb_docs(...)`, build the response, keep the `LIST_DOCUMENTS_MAX_RESPONSE_BYTES` guard | logic-only |
| Docs | MCP tool description for `list_documents`: clarify `count` semantics; mention `next_offset` | doc-only |
| Tests | new pytest module under `tests/` covering T1..T15 | new |
Rollback: revert PR + DROP INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern;. Application falls back to stream_docs path automatically.
2.13 Expected impact
| Scenario | Before | After D1 only | After D1 + index |
|---|---|---|---|
| Single, prefix='knowledge/' | ~900 ms | ~90 ms | <30 ms |
| Concurrency 5 p95 | 4 470 ms | ~250 ms | <150 ms |
| Concurrency 10 p95 | (untested, would degrade) | ~400 ms | <250 ms |
Part 3 — D2a: transport / content negotiation (Rev2)
3.1 Honest framing (Opus / GPT correction)
D2a is a production-compliance and compatibility patch. It is NOT proven to be the root cause of the observed ChatGPT MCP App patch UI hang. The fact that list_documents works via the same endpoint over JSON-RPC argues that the JSON-only response is not a hard handshake blocker for all clients. The hang may equally well be:
- A list-latency issue (the 4.47 s p95 from D1) timing out the GPT App's per-tool budget.
- A GPT-side bug in the patch-tool UI rendering.
- A separate gateway/router stuck-session pattern documented elsewhere.
D2a is the right thing to do because the MCP spec (rev 2025-03-26 Streamable HTTP) requires honouring Accept: text/event-stream, and several clients (notably ChatGPT MCP App in some configurations, future Claude.ai connector versions, MCP Inspector) prefer or strictly require SSE. Ship D2a for compliance, then retest ChatGPT in a fresh conversation. If the hang persists, that is its own diagnosis loop and is not a Rev2 deliverable.
3.2 Accept negotiation rules (D2a)
Parse Accept per RFC 9110 S12.5.1 (treat absent or */* as "anything"). Compute two boolean signals from the parsed media-type list:
- `wants_sse` = there exists an item with type `text/event-stream` and effective q > 0
- `wants_json` = there exists an item with type `application/json` or `*/*` and effective q > 0
Decision table:
| wants_sse | wants_json | Outcome |
|---|---|---|
| false | true (or default) | JSON response (today's behaviour) |
| false | false (only some other type, e.g. application/xml) | 406 Not Acceptable, JSON-RPC error envelope with `code: -32700` and message "Acceptable media types: application/json, text/event-stream" |
| true | false | SSE response |
| true | true | prefer JSON when the JSON q >= SSE q; prefer SSE when SSE q is strictly greater. Tie-break to JSON to preserve current curl/FastMCP behaviour. |
q-value absence implies q=1.0. Malformed Accept (parse error) -> treat as */*, return JSON. Never 500 on a bad Accept.
This rule:
- Keeps `Accept: application/json` (today's curl path) byte-identical.
- Keeps `Accept: application/json, text/event-stream` (Claude.ai connector mode) -> JSON (no behaviour change for any existing tested client).
- Activates SSE for `Accept: text/event-stream` (ChatGPT MCP App preferred mode) and for `Accept: text/event-stream;q=1, application/json;q=0.5`.
- Returns 406 only for truly incompatible clients — almost no real client.
3.3 SSE frame format (D2a)
Chosen format:
```
event: message
data: <JSON-RPC envelope>

```
(One blank line = end-of-event. The connection closes after the single event for a request/response operation.)
Why `event: message`:
- The MCP Streamable HTTP spec (rev 2025-03-26) recommends `message` as the event name for tool/initialise/notification responses.
- It is required by some EventSource implementations that filter by event name.
- It adds ~10 bytes per response — negligible.
Headers:
| Header | Value | Reason |
|---|---|---|
| Content-Type | text/event-stream; charset=utf-8 | spec |
| Cache-Control | no-cache, no-transform | prevent intermediaries from buffering |
| X-Accel-Buffering | no | nginx hint, defensive (we already have proxy_buffering off on /gpt-mcp/...) |
| Connection | omit (HTTP/1.1 default is keep-alive; clients close per request) | — |
Body framing rule: serialize the JSON-RPC envelope with json.dumps(payload, default=str, ensure_ascii=False), then build:
"event: message\n" + "data: " + serialized + "\n\n"
For non-tools/call methods that today return 204 No Content (i.e. notifications/*):
- Keep `204 No Content` regardless of Accept. SSE without an envelope is meaningless.
For tools/call errors: produce the in-stream JSON-RPC envelope with result.isError=true and the error text inside result.content[0].text, identical to today's JSON path. Never return HTTP 500. Bad-request parse errors -> JSON-RPC envelope with error.code: -32700.
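The framing rule, together with the R9 mitigation (build the envelope fully, then frame in one shot), can be sketched as follows. `sse_frame` is a hypothetical name for illustration:

```python
# Hedged sketch of the one-shot SSE frame build: serialize the complete
# JSON-RPC envelope first, so any serialization error can still be caught
# in the JSON layer before a single frame byte is emitted (risk R9).
import json

def sse_frame(payload: dict) -> str:
    serialized = json.dumps(payload, default=str, ensure_ascii=False)
    return "event: message\n" + "data: " + serialized + "\n\n"
```

The single returned string can then be handed to a one-chunk `StreamingResponse(iter([body]), ...)`, so the client either sees the whole frame or none of it.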
3.4 GET /mcp (Streamable HTTP server-initiated channel)
D2a decision: GET /mcp returns the existing server info document as JSON (today's behaviour). If Accept: text/event-stream is sent on a GET /mcp, return 405 Method Not Allowed with Allow: POST and a JSON-RPC error body explaining "GET stream not supported in D2a; use POST".
Full Streamable HTTP with server-pushed events (resumable GET stream) is D2b, out of scope for D2a.
3.5 Notification handling
notifications/initialized and other notifications/*:
- Spec: notifications have no `id`. The server must accept them and return either `204 No Content` or no body.
- D2a: keep the current `204 No Content`. Do not switch to `202 Accepted` (the spec note prefers 204).
- Validation: if a JSON-RPC request with no `id` and a `method` starting with `notifications/` arrives, return 204 immediately, no envelope, no SSE.
3.6 Nginx
- The `/gpt-mcp/{token}/mcp` location already has `proxy_buffering off`, `proxy_cache off`, `chunked_transfer_encoding on`. No change required for D2a.
- Optional improvement (defer to D2a deploy time): raise `proxy_read_timeout` to 120s (currently 60s) to allow future longer SSE sessions. Not blocking.
- The `/api/` location currently does NOT have `proxy_buffering off`. If we ever route MCP through `/api/`, an SSE response would be buffered. Today's MCP traffic does not use `/api/mcp`, so no fix required for D2a. Document as a follow-up if `/api/mcp` is ever published.
3.7 Risks (D2a)
- R6. Decision table edge cases: some HTTP clients send `Accept: *` (one char). The parser must coerce it to `*/*` to avoid 406. Add a unit test.
- R7. `event: message` may render differently across SSE clients; if a future client expects unnamed events, the body `data: ...\n\n` still parses because EventSource defaults the event name to `message`. The choice is forward-compatible.
- R8. Streaming-related Python: `StreamingResponse(iter([body]), ...)` in FastAPI flushes one chunk and closes. Confirm uvicorn does NOT add chunked-encoding length headers wrongly (it doesn't; it sets `Transfer-Encoding: chunked` automatically).
- R9. Mid-stream errors: if `_dispatch_mcp_tool` raises after the first `data:` line is partially sent, the client sees a half-frame. Mitigation: build the envelope dict fully (catch in the JSON layer like today), then convert to the SSE frame string in one shot.
- R10. D2a may not unblock the ChatGPT MCP App (Opus correction). Production compliance is its own win. Re-test after D1+D2a; if the hang persists, open a separate diagnosis ticket. Do not advertise D2a as the ChatGPT fix.
3.8 Test matrix for D2a
| # | Request | Expected response |
|---|---|---|
| U1 | curl -H 'Accept: application/json' POST /mcp (tools/list) | Content-Type: application/json; charset=utf-8, single JSON body |
| U2 | curl -H 'Accept: application/json, text/event-stream' POST /mcp (Claude.ai connector mode) | application/json (tie-break) |
| U3 | curl -H 'Accept: application/json;q=0.5, text/event-stream;q=1' POST /mcp | text/event-stream, one event: message + data: frame |
| U4 | curl --no-buffer -H 'Accept: text/event-stream' POST /mcp (tools/call) | text/event-stream, body event: message\ndata: <json>\n\n, connection closes |
| U5 | curl -H 'Accept: application/xml' POST /mcp | 406 with JSON-RPC error envelope |
| U6 | curl -H 'Accept: *' POST /mcp | JSON (treat as */*) |
| U7 | curl POST /mcp (no Accept header) | JSON (default) |
| U8 | curl POST /mcp with malformed Accept: ;;; | JSON (fallback) |
| U9 | POST notifications/initialized (no id) | 204 No Content, no body, regardless of Accept |
| U10 | curl POST /mcp with malformed body (invalid JSON) | JSON-RPC error.code: -32700, HTTP 200, content-type JSON |
| U11 | GET /mcp with Accept: text/event-stream | 405, Allow: POST |
| U12 | GET /mcp with Accept: application/json | current JSON info response |
| U13 | FastMCP client (initialize + tools/list + one tools/call) | succeeds against /mcp AND /mcp-gpt AND /mcp-readonly |
| U14 | nginx pass-through: curl --no-buffer ... POST /gpt-mcp/<token>/mcp with Accept: text/event-stream | first-byte time <50 ms, full body delivered |
| U15 | After D1+D2a deploy: ChatGPT MCP App fresh conversation, attempt search + list + patch on knowledge/test/... | observed but explicitly not part of the D2a pass criteria (separate diagnosis if it fails) |
3.9 Pass criteria for D2a
- All U1..U14 pass. U15 is observed but not gating.
- Zero regression on existing JSON-only clients (Codex, internal callers, current Claude.ai connector).
- p95 latency for tools/list and a small tools/call <= JSON baseline + 5 ms (SSE adds one string concat).
- No 5xx, no truncated frames.
3.10 D2a implementation surface
| Layer | Files | Change type |
|---|---|---|
| Python — server | `agent_data/server.py`: add `_parse_accept(hdr)` returning (wants_json, wants_sse, prefer); add `_mcp_respond(payload, accept_header)` returning either JSONResponse or StreamingResponse; apply at the 4 return sites in `mcp_jsonrpc` and the 4 return sites in `_mcp_filtered_handler` | new helpers + 8 callsite changes |
| Tests | unit + integration covering U1..U14 | new |
| Docs | none (MCP tool schema unchanged) | — |
| Nginx | none required | — |
Rollback: revert PR. Behaviour returns to JSON-only.
Part 4 — D3: move_document policy (Rev2)
4.1 Mismatch summary
- Runtime (`agent_data/server.py:1624-1636`): `raise _error(501, NOT_IMPLEMENTED, ...)` unconditionally.
- MCP schema (`MCP_TOOLS`): advertises `move_document` with a full inputSchema.
- Allowlists: `move_document` is in `MCP_GPT_FULL_TOOLS` (not exposed publicly via nginx); NOT in `MCP_GPT_TOOLS` or `MCP_READONLY_TOOLS`.
- Dispatcher (`_dispatch_mcp_tool`): includes the `move_document` branch (still routes to the 501-raising endpoint).
4.2 Policy options
- A. Remove from `MCP_TOOLS` (advertised schema), from `_dispatch_mcp_tool`, and from `MCP_GPT_FULL_TOOLS`. Keep `POST /documents/{x}/move` returning 501.
- B. Remove only from `MCP_GPT_FULL_TOOLS` (so `/mcp-gpt-full` no longer routes it); keep it advertised in the internal `/mcp`.
- C. Keep advertised everywhere; prefix the description with `[DEPRECATED]`.
4.3 Choice and rationale
Choose A (full removal from MCP schema + dispatcher; keep HTTP endpoint returning 501).
- The MCP tools list is consumed by every LLM client (Claude, Codex, ChatGPT, Cursor) on every session start. Advertising a tool that always errors wastes context budget and produces audit-log noise (`mcp_*_reject` warnings on every attempt).
- Schema/runtime correctness is a one-shot benefit and reads well in audits.
- The `POST /documents/{x}/move` HTTP endpoint stays in place and still returns the structured 501. Any internal script that calls it directly continues to receive the same error code/message — no breakage for known internal callers.
4.4 Backward compatibility risk
- Risk for the internal `/mcp`: a Claude / Codex script that has cached a tools/list response containing `move_document` would attempt `tools/call name=move_document` and now receive JSON-RPC `-32601 Tool not allowed` instead of `result.content[0].text = "...NOT_IMPLEMENTED..."`. Both are clearly errors; agents already handle "not allowed" semantics. Bumping `connectorSchemaVersion` and `_mcp_schema_hash()` (auto) signals the change.
- Risk for `/mcp-gpt`: zero. Already excluded.
- Risk for `/mcp-gpt-full`: zero public exposure today.
4.5 Tests for D3
| # | Test | Expectation |
|---|---|---|
| V1 | POST /mcp tools/list | response does NOT include move_document |
| V2 | POST /mcp-gpt-full tools/list | response does NOT include move_document |
| V3 | POST /mcp tools/call name=move_document | error.code: -32601, message contains Unknown tool: move_document |
| V4 | POST /mcp-gpt-full tools/call name=move_document | error.code: -32601, message contains not allowed |
| V5 | POST /documents/x/move directly | HTTP 501 NOT_IMPLEMENTED (unchanged) |
| V6 | _mcp_schema_hash() value | changes; CONNECTOR_SCHEMA_VERSION bumped |
4.6 Implementation surface
| Layer | Files | Change |
|---|---|---|
| Python | `agent_data/server.py`: delete the move_document block from `MCP_TOOLS` (lines around 2722 + the JSON object); remove `"move_document"` from the `MCP_GPT_FULL_TOOLS` frozenset; delete the move_document branch in `_dispatch_mcp_tool`; bump `CONNECTOR_SCHEMA_VERSION` to e.g. gpt-agent-data-2026-05-14.1 | edits in 4 places |
| Docs | none | — |
| Tests | V1..V6 | new |
Rollback: revert PR. Schema returns to current state. Dead code below the 501-raise is unchanged.
4.7 Pass criteria for D3
- All V1..V6 pass.
- One PR, one revert path, no DB impact, no nginx impact.
Part 5 — D4: write double-work — canary gate only
5.1 What is NOT shipped in this revision
D4 (removing inline _sync_vector_entry / _delete_vector_entry from create/update/patch/delete handlers) is not designed for implementation in Rev2. Per GPT/Codex feedback, prior to designing the removal we must prove the listener behaviour and avoid creating a duplicated indexing-loss / loop hazard.
5.2 Required evidence (canary)
Before the D4 implementation design lands, capture the following from a controlled experiment in a low-traffic window. The canary uses the knowledge/test/mcp-stress/ namespace and never touches production data.
| Probe | How | Pass criterion |
|---|---|---|
| C1. OpenAI embed calls per upload_document | log vector_store.embed_calls counter delta before/after a single upload (small doc, 1 chunk) | exactly 2 today (1 inline, 1 listener) -> must be 1 after D4 |
| C2. Qdrant upserts per upload_document | inspect Qdrant collection counts before/after; assert one chunk:0 point id only | exactly 1 unique point_id (deterministic via uuid5) — current behaviour already idempotent because of uuid5, but it costs 2 writes |
| C3. PG-NOTIFY listener is alive and consuming | tail container log: pg-vector-sync lines appear within 2 s of write | every test write has a matching listener log line |
| C4. vector_status loop check | run 10 uploads, sample data->>'vector_status'; ensure no pending -> ready -> pending flapping | no flapping; eventual ready for all |
| C5. Listener resilience | restart pg_vector_listener thread (or simulate disconnect via DB-side pg_terminate_backend); verify reconnect + catch-up | listener reconnects within 5 s; /kb/reindex-missing recovers any gap |
| C6. Lag observation | record MAX(updated_at) for vector_status='pending' at 30 s intervals during a 5-min stress test (10 writes/s) | p95 lag < 5 s; p99 < 15 s |
| C7. Skip-rule alignment | upload to operations/tasks/comments/... and registries/... paths | trigger skips (per fn_kb_notify_vector_sync rules); no embed, no Qdrant write — must remain true |
5.3 Risks that the canary must eliminate
- K1. Listener silently drops notifications under PG restart -> vectors fall behind, search misses new docs.
- K2. `vector_status: "pending"` written by `_sync_vector_entry` (inline) might race with the listener writing `"ready"` -> potential UPDATE loop firing NOTIFY again. Confirm the trigger filter (the trigger fires only when `data IS DISTINCT FROM` the old value, and a `vector_status` change qualifies).
- K3. Two consecutive `_sync_vector_entry` calls (one inline, one listener) produce the same uuid5 point ids; the Qdrant upsert is idempotent — confirm the Qdrant client behaves identically with the same payload (it does); but the listener carries the authoritative `vector_status` write — confirm last-writer-wins behaviour.
- K4. Soft-delete race: the app writes `deleted_at`; the listener might receive the prior INSERT notify (queued) and re-embed the about-to-be-deleted doc. The trigger has a guard (`v_new_deleted IS NOT NULL` -> emit DELETE op), but verify under concurrency.
5.4 Consistency contract (proposed for after canary)
If canary passes:
- `get_document` after write: strongly consistent (PG is SSOT, returns the latest immediately).
- `search_knowledge` after write: eventual consistency, target <=2 s p95 lag, 15 s p99 lag.
- `list_documents`: strongly consistent (driven by PG).
- `vector_status` state machine:
  - `pending` set by the write handler (already in code for create; will be added to update/patch in D4)
  - `ready` set by the listener after the Qdrant upsert
  - `error` set by the listener if embed/upsert fails (already in code)
  - `skipped` set if content is empty (already in code)
- Health metric: `vector_sync_lag_seconds = NOW() - MAX(updated_at) FILTER (WHERE vector_status='pending')`. Alert at > 30 s.
- Recovery: cron-driven `/kb/reindex-missing` every 15 min; on-demand admin trigger.
5.5 D4 implementation gate
D4 design + PR may proceed when:
- Canary C1..C7 are recorded in a follow-up KB report.
- K1..K4 risks are explicitly accepted by GPT/Opus/Codex.
- Lag metric + alert are wired in (or D4 PR includes them).
Until then, D4 is parked.
Part 6 — Revised implementation sequence
```
Rev2 approved ---> 2F-D1 (list_documents SQL + index) ---> smoke/load test ---> approve
                     |
                     v
                   2F-D2a (transport JSON/SSE) ---> smoke + ChatGPT retry ---> diagnose if hang persists
                     |
                     v
                   2F-D3 (move_document schema cleanup) ---> schema-diff test ---> done
                     |
                     v
                   2F-D4 canary ---> if green, design D4 implementation in Rev3
```
Each step:
2F-D1 — list_documents SQL pushdown + partial pattern index
- Files: `agent_data/pg_store.py` (new `list_kb_docs`), `agent_data/server.py` (rewrite `_list_kb_documents_page`).
- Migration: `CREATE INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern ...` (one SQL statement, applied out-of-band by SRE; non-transactional).
- Tests: T1..T15 per S2.10.
- Rollback: revert PR + `DROP INDEX CONCURRENTLY ...`.
- Pass criteria: S2.11.
- Expected impact: list_documents p95 4.47 s -> <250 ms at concurrency 5; <400 ms at concurrency 10.
2F-D2a — transport content negotiation
- Files: `agent_data/server.py` (`_parse_accept`, `_mcp_respond`, 8 callsite replacements in `mcp_jsonrpc` + `_mcp_filtered_handler`).
- Migration: none.
- Tests: U1..U14 per S3.8.
- Rollback: revert PR.
- Pass criteria: S3.9.
- Expected impact: spec-compliant transport. May or may not unblock ChatGPT MCP App; retest in fresh conversation.
2F-D3 — move_document schema cleanup
- Files: `agent_data/server.py` (4 small edits — schema list, allowlist, dispatcher branch, version bump).
- Migration: none.
- Tests: V1..V6 per S4.5.
- Rollback: revert PR.
- Pass criteria: S4.7.
- Expected impact: correctness only.
2F-D4 canary (NOT implementation)
- Files: none modified (canary uses existing endpoints).
- Migration: none.
- Tests: C1..C7 captured in a follow-up report.
- Rollback: n/a (read-only).
- Pass criteria: S5.3 risks K1..K4 demonstrably bounded; lag metric proposal accepted.
- Expected outcome: Rev3 designs the actual async write change OR documents why staying sync is correct.
Part 7 — Production gate (Rev2)
7.1 Functional (after D1+D2a+D3 — D4 not yet)
| Tool | Pass |
|---|---|
| search_knowledge | >=1 hit on 'operating rules SSOT', p95 < 2 s |
| list_documents (prefix='', knowledge/, knowledge/test/, escaped) | byte-correct results; pagination; no junk row |
| get_document | found = full payload; missing = {error} |
| get_document_for_rewrite | full body returned |
| batch_read (3 paths) | 3 items, truncated by default |
| upload_document | sync writes; appears in get_document immediately; in search eventually |
| update_document | revision+1; readback immediate |
| patch_document | one-replace succeeds; ambiguous returns 409 |
| delete_document | soft-delete; excluded from list; vector removed eventually |
| move_document | NOT present in tools/list on any profile (D3) |
| ingest_document | unchanged; only via internal /mcp-gpt-full, not nginx-exposed |
7.2 Latency targets (p95, after D1+D2a+D3 with index)
| Path | Target |
|---|---|
| initialize | < 50 ms |
| tools/list | < 50 ms |
| get_document | < 200 ms |
| batch_read (3 paths) | < 300 ms |
| list_documents concurrency 5 | < 250 ms |
| list_documents concurrency 10 | < 400 ms |
| upload_document | < 1 200 ms (unchanged until D4) |
| update_document | < 1 200 ms (unchanged until D4) |
| patch_document | < 1 200 ms (unchanged until D4) |
| search_knowledge | < 2 000 ms (OpenAI bound, unchanged) |
7.3 Transport pass
- `Accept: application/json` -> JSON, byte-identical to today.
- `Accept: application/json, text/event-stream` -> JSON (tie-break, today's clients unchanged).
- `Accept: application/json;q=0.5, text/event-stream` -> SSE.
- `Accept: text/event-stream` -> SSE with one `event: message\ndata: <json>\n\n` frame.
- `Accept: application/xml` -> 406 JSON-RPC error envelope.
- `notifications/*` -> `204 No Content` regardless of Accept.
- `GET /mcp` with SSE Accept -> 405.
- FastMCP client full handshake green against `/mcp`, `/mcp-gpt`, `/mcp-readonly`.
- ChatGPT MCP App fresh conversation: observed; not pass-gating. If broken, file a follow-up.
7.4 Stress
- Mixed-tool concurrency 5 and 10 for 60 s.
- Zero 5xx; zero truncated frames; zero Qdrant `wait=True` timeouts on any read path.
- Container CPU < 50 % avg; RSS stable; no thread leak.
7.5 Safety
- No secret in logs (verify after each PR — grep for env values in last 10 min of container log).
- `/mcp-gpt-full` not nginx-exposed.
- `delete_document` audited at the nginx layer (re-enable access_log on the `/gpt-mcp/...` location; recommend a low-cardinality format).
- Rotate the `/gpt-mcp/{token}` URL after D1+D2a+D3 ship (separate, tracked task; not in this design).
- All four phases have a one-revert rollback documented.
Appendix A — Preflight facts (read-only, 2026-05-14)
- `kb_documents`: 5 040 total, 2 907 alive, 27 MB.
- 1 anomalous row: empty key, empty `document_id`, 4.4 KB body -> filtered in D1.
- 1 184 alive docs contain `_` in their `document_id` -> LIKE escaping mandatory.
- 0 alive docs contain `%` or `\` in `document_id` -> escapes are defensive but cheap.
- 0 alive docs have non-numeric `revision` JSON -> cast safe.
- 0 alive docs have `replace(key,'__','/') != document_id` -> `_fs_key` mapping consistent.
- PG-NOTIFY listener `pg_vector_listener` is already running in production ("PG->Qdrant vector sync listener started" in container logs).
- nginx `/gpt-mcp/{token}/mcp` already has `proxy_buffering off`, `chunked_transfer_encoding on`, `proxy_cache off` — SSE-ready.
- 6 PG triggers on `kb_documents`; `trg_kb_vector_sync` fires `pg_notify('kb_vector_sync', json)` on INSERT/UPDATE/DELETE.
- `move_document` raises 501 unconditionally at function entry (server.py:1624-1636); dead code below.
Appendix B — Open follow-ups (post-Rev2)
- Decide if `count` semantics in the `list_documents` response need an explicit alias (`returned_count` already present alongside).
- Bump `proxy_read_timeout` on `/gpt-mcp/{token}` to 120 s when D2a deploys (optional).
- Add a structured access log on `/gpt-mcp/{token}` for audit (recommend yes).
- Rotate the `/gpt-mcp/{token}` URL after D1+D2a+D3 ship (separate ticket).
- Run D4 canary C1..C7 and publish results in a new KB report; gates D4 design.
- Switch non-`list_documents` callers of `stream_docs` (`reindex_kb_documents`, `cleanup_orphan_vectors`, `_compute_data_integrity`, `audit_sync`) to a server-side cursor when row count >> 10k. Not blocking today.
- If the ChatGPT MCP App still hangs after D1+D2a, open a separate diagnosis ticket (don't blame D2a in the report).
Summary
| Phase | Goal | Status |
|---|---|---|
| 2F-D1 | list_documents SQL + index | designed (Rev2); awaits approval |
| 2F-D2a | transport JSON+SSE | designed (Rev2); awaits approval; NOT proven to fix ChatGPT |
| 2F-D3 | move_document schema cleanup | designed (Rev2); awaits approval |
| 2F-D4 canary | prove listener safety | required before any D4 implementation design |
| 2F-D4 implementation | async vector via listener | parked until canary green |
Implementation order: D1 -> D2a -> D3 -> D4 canary -> (later) D4 design.
End of Rev2. No code modified. No commit. No restart. No nginx reload. No ChatGPT reconnection.