KB-1EF2

Phase 2F-C Rev2 — MCP Production Patch Design (2026-05-14)


Phase 2F-C Rev2 — MCP Production Patch Design (revised after GPT + Opus + Codex review)

Date: 2026-05-14
Status: Design only. Supersedes Rev1 (phase-2f-c-mcp-production-patch-design-2026-05-13.md).
Author: Claude Code xhigh (Phase 2F-C Rev2)
Reviewers incorporated: GPT council, Opus, Codex.
Scope: Production patches D1..D4 for AgentData MCP. No code, no commit, no restart, no nginx reload, no ChatGPT reconnection.


Part 1 — New process

Production design changes for AgentData MCP follow this order, no shortcuts:

  1. Investigate + design — Claude Code (xhigh effort) reads code, schema, logs, runs read-only EXPLAIN/queries, writes a markdown design report to knowledge/current-state/reports/.
  2. Review — GPT, Opus, and Codex review the design. Each may APPROVE, APPROVE WITH CHANGES, or REJECT.
  3. Revise — Claude Code addresses all change requests in a new revision (e.g. Rev2) and re-uploads to the same KB folder.
  4. Approve — once at least one of {GPT, Opus, Codex} marks "approve" without remaining blockers, implementation is unblocked.
  5. Implement — code change in its own branch/PR. Two-Hat process.
  6. Test — functional, latency, transport, stress tests per the design's pass criteria. Capture results in current-state/reports/.
  7. Deploy + verify — VPS deploy, smoke test. Only then re-test external clients (ChatGPT, Claude.ai).

No design report approved => no code merged. This document is Rev2 of the D1..D4 designs, awaiting approval.


Part 2 — D1: list_documents final design (Rev2)

2.1 Scope discipline

D1 ships alone, not bundled with D2a. D1 changes only the list_documents data path; it does not change transport, schema, write path, or any other tool.

2.2 Preflight findings (read-only inspection of VPS PG, 2026-05-14)

| Check | Result | Implication |
|---|---|---|
| kb_documents total rows | 5 040 | |
| Alive rows (deleted_at IS NULL) | 2 907 | |
| Rows missing document_id | 1 (empty key + empty document_id, ~4 KB body) | Must be filtered: WHERE (data->>'document_id') IS NOT NULL AND length(data->>'document_id') > 0 |
| Rows with non-numeric revision JSON | 0 | Safe to cast (data->>'revision')::int with COALESCE(..., 0) |
| key ↔ document_id mismatch (replace(key,'__','/') = document_id) | 0 | _fs_key mapping is consistent |
| document_id containing _ | 1 184 | LIKE escaping of _ is mandatory |
| document_id containing % | 0 | Defensive escape still required |
| document_id containing \ | 0 | Defensive escape still required |

Conclusion: data is clean enough to push down. The single junk row + 1 184 docs with _ justify both the IS NOT NULL filter and the LIKE escape.

2.3 Final SQL (parameterized)

SELECT
  data->>'document_id'                           AS document_id,
  data->>'parent_id'                             AS parent_id,
  COALESCE(data->'metadata'->>'title', '')       AS title,
  COALESCE(data->'metadata'->'tags', '[]'::jsonb) AS tags,
  COALESCE(NULLIF(data->>'revision','')::int, 0) AS revision
FROM kb_documents
WHERE (data->>'deleted_at') IS NULL
  AND (data->>'document_id') IS NOT NULL
  AND length(data->>'document_id') > 0
  AND (
       %(prefix)s = ''
       OR (data->>'document_id') LIKE %(prefix_like)s ESCAPE '\'
  )
ORDER BY (data->>'document_id')
LIMIT %(limit)s OFFSET %(offset)s;

prefix_like is built in Python:

def _like_prefix(prefix: str) -> str:
    return (prefix
            .replace('\\', '\\\\')   # escape backslash FIRST
            .replace('%',  '\\%')
            .replace('_',  '\\_')
            ) + '%'
  • ESCAPE '\' is set explicitly so PG treats \% and \_ as literals.
  • NULLIF(...,'')::int is defensive: even though current data is clean, future inserts that stash an empty string would otherwise raise.
  • data->>'document_id' IS NOT NULL is redundant given current data, but documents intent and survives the one anomalous row.
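These escape rules fail silently if the replace order is wrong, so a few assertions pin them down. A sketch of the R1 unit test; the function body is copied from the helper above:

```python
def _like_prefix(prefix: str) -> str:
    # Backslash must be escaped FIRST, otherwise the backslashes that the
    # %- and _-escapes introduce would themselves be double-escaped.
    return (prefix
            .replace('\\', '\\\\')
            .replace('%',  '\\%')
            .replace('_',  '\\_')
            ) + '%'

# Literal underscore becomes \_ so LIKE cannot treat it as a single-char wildcard.
assert _like_prefix('knowledge_/') == 'knowledge\\_/%'
# Literal percent is escaped; the trailing % stays the only wildcard.
assert _like_prefix('100%') == '100\\%%'
# A backslash in the input doubles before the other escapes run.
assert _like_prefix('a\\b') == 'a\\\\b%'
# Empty prefix degenerates to match-everything.
assert _like_prefix('') == '%'
```

Swapping the first replace to last would turn `knowledge_/` into a pattern whose escape backslash is itself re-escaped, which is exactly what the assertions catch.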

2.4 total_count semantics

Rev1 left this open. Decision: drop total_count from the response.

  • Rationale: dropping it removes a separate COUNT(*) ... LIKE query (5–20 ms wall, same shared buffers as the main query) and matches REST convention (next_offset + items + truncated).
  • Backward compatibility: the current MCP description doesn't promise an exact count. The field count keeps its name but its meaning narrows to "number of items returned". Document this in the tool description.
  • If a client genuinely needs the full count, expose a ?count=true flag in a later phase; not now.
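With total_count dropped, a page response carries only these keys. An illustrative sketch, with made-up field values; the key set matches T5 in the test matrix below:

```python
# Illustrative list_documents page after D1. `count` = items returned on
# THIS page (narrowed semantics); `next_offset` is None on the last page.
page = {
    "items": [
        {
            "document_id": "knowledge/current-state/reports/example.md",  # hypothetical doc
            "parent_id": "knowledge/current-state/reports",
            "title": "Example report",
            "tags": ["phase-2f"],
            "revision": 3,
        },
    ],
    "count": 1,
    "truncated": False,
    "next_offset": None,  # None => no more pages; clients stop paginating here
}

assert page["count"] == len(page["items"])
assert "total_count" not in page
```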

2.5 Collation & index strategy

Option chosen: partial expression index on ((data->>'document_id') COLLATE "C") text_pattern_ops, plus query ORDER BY/WHERE using the same COLLATE "C". Rationale:

  • Default UTF-8 collation makes LIKE 'prefix%' a filter, not a range scan, because 'knowledge/' <= 'knowledge/x' < 'knowledge0' is false under UTF-8 ICU rules (verified — 'knowledge/x' < 'knowledge0' = false in default, true in COLLATE "C").
  • text_pattern_ops alone fixes LIKE-prefix range scans, but the query's ORDER BY must use a comparison compatible with the index for Index Scan ... no Sort. Adding COLLATE "C" to both sides keeps the plan a single Index Scan with no sort.

Final migration:

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_kb_documents_doc_id_c_pattern
  ON kb_documents (((data->>'document_id') COLLATE "C") text_pattern_ops)
  WHERE (data->>'deleted_at') IS NULL
    AND (data->>'document_id') IS NOT NULL
    AND length(data->>'document_id') > 0;

Query updated accordingly:

WHERE (data->>'deleted_at') IS NULL
  AND (data->>'document_id') IS NOT NULL
  AND length(data->>'document_id') > 0
  AND ( %(prefix)s = '' OR (data->>'document_id') COLLATE "C" LIKE %(prefix_like)s ESCAPE '\' )
ORDER BY (data->>'document_id') COLLATE "C"
LIMIT %(limit)s OFFSET %(offset)s;

Trade-off documented:

  • Sort order becomes byte-wise (ASCII order). Today's behaviour is already effectively byte-wise: Python's default string sort compares Unicode code points, which matches byte order for these ASCII document ids. So no semantic change. Add a regression test that returns the same first-50 list before and after migration to confirm.
  • Existing idx_kb_documents_doc_id stays as a safety net — do not drop until 7 days of clean operation.

2.6 Migration plan

CREATE INDEX CONCURRENTLY rules:

  • Cannot be inside a transaction. Run as a single statement via psql (or a one-off admin script), not via Alembic/transactional migration.
  • Acquires SHARE UPDATE EXCLUSIVE lock; reads and writes proceed normally.
  • On failure (e.g. duplicate key, killed connection), leaves an INVALID index. Cleanup: DROP INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern; then re-run.
  • Build time estimate at 2 907 alive rows: <5 s. No measurable load.

Rollback:

DROP INDEX CONCURRENTLY IF EXISTS idx_kb_documents_doc_id_c_pattern;

Also non-blocking. If we also reverted the Python code, the planner falls back to idx_kb_documents_doc_id (current behaviour).

2.7 Pagination policy

  • Default limit = 50 (unchanged).
  • Max limit = 100 (unchanged; already enforced server.py:2033).
  • Default offset = 0.
  • Max offset = 10 000 (new cap). Above this, return 422 INVALID_ARGUMENT with message "offset too large — use a narrower prefix or wait for keyset pagination". Rationale: OFFSET cost grows linearly with its value, so unbounded deep paging is a quadratic-work DoS vector; the cap bounds it.
  • Keyset cursor is not in D1. Defer to D1-follow-up (2F-D1b) only if real-world prefix usage trips the offset cap. Today's largest prefix (knowledge/ = 2 570 rows) fits in 26 limit-100 pages, well under the cap.
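The caps above amount to a small validation step before any SQL runs. A sketch, where the `InvalidArgument` class name is illustrative shorthand for the 422 INVALID_ARGUMENT envelope:

```python
DEFAULT_LIMIT = 50
MAX_LIMIT = 100       # already enforced today
MAX_OFFSET = 10_000   # new cap

class InvalidArgument(ValueError):
    """Illustrative stand-in for the 422 INVALID_ARGUMENT error envelope."""

def validate_page(limit: int = DEFAULT_LIMIT, offset: int = 0) -> tuple[int, int]:
    # Reject out-of-range values before they reach the LIMIT/OFFSET clause.
    if not 1 <= limit <= MAX_LIMIT:
        raise InvalidArgument(f"limit must be between 1 and {MAX_LIMIT}")
    if not 0 <= offset <= MAX_OFFSET:
        raise InvalidArgument(
            "offset too large — use a narrower prefix or wait for keyset pagination")
    return limit, offset
```

This makes T8 and T9 in the matrix below pure unit tests with no DB round-trip.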

2.8 Backward compatibility

  • MCP arg path is still accepted and mapped to prefix (today's dispatcher already does this — keep).
  • MCP args limit, offset already in tool schema (server.py:2756) — keep.
  • Response shape: identical keys; count now means returned-items. Old clients reading count for "page size" stay correct; clients reading count for "total" need to switch to next_offset == null as the "no more pages" signal.

2.9 Risks register

  • R1. LIKE escape order matters: backslash first, then %, then _. Wrong order corrupts the escape. Tested in unit tests.
  • R2. Partial index WHERE deleted_at IS NULL requires the same predicate in the query for the planner to pick it. Already in the SQL above.
  • R3. (data->>'document_id') COLLATE "C" is an expression; an expression index must match the query's expression exactly. The migration and the query both use the same form.
  • R4. stream_docs is still used by reindex_kb_documents, cleanup_orphan_vectors, _compute_data_integrity, audit_sync. D1 does not touch those. Document a follow-up to switch them to a server-side cursor (SELECT ... FOR FETCH FORWARD or psycopg2 named cursor) so a future 30k-row table doesn't kill them. Not blocking.
  • R5. LIST_DOCUMENTS_MAX_RESPONSE_BYTES guard (server.py:2085) stays in place. With LIMIT 100 + lean projection, response is well under the cap.

2.10 Test matrix for D1

| # | Test | Expectation |
|---|---|---|
| T1 | prefix='', limit=50 | first 50 alive docs, byte-sorted; next_offset=50 |
| T2 | prefix='knowledge/', limit=50 | first 50 knowledge/* docs; next_offset=50; no leak from knowledgeX |
| T3 | prefix='knowledge_/' (literal underscore expected) | matches ONLY docs starting knowledge_/, NOT knowledge/ (escape works) |
| T4 | prefix='100%' (literal %) | matches only literal 100%...; no wildcard expansion |
| T5 | prefix='no/such/path/' | returns {items:[], count:0, truncated:false, next_offset:null} |
| T6 | soft-deleted doc | excluded from any prefix list |
| T7 | offset=0/24/49/50/99/100/9999 | correct pagination |
| T8 | offset=10001 | 422 INVALID_ARGUMENT |
| T9 | limit=0 / limit=101 | 422 INVALID_ARGUMENT |
| T10 | EXPLAIN ANALYZE on prefix='knowledge/' | plan begins with Limit ... -> Index Scan using idx_kb_documents_doc_id_c_pattern; no Sort, no Seq Scan, no Rows Removed by Filter > 100 |
| T11 | concurrency 5 burst, 60 s | p95 < 250 ms |
| T12 | concurrency 10 burst, 60 s | p95 < 400 ms |
| T13 | backward compat | MCP args.path still works identical to args.prefix |
| T14 | order-stability | same call twice returns same first-50 ordering |
| T15 | anomalous row exclusion | the one empty-key/empty-doc_id row is not returned for any prefix |

2.11 Pass criteria for D1

  • All tests T1..T15 green.
  • EXPLAIN plan for any of prefix='', prefix='knowledge/', prefix='knowledge/current-state/reports/' returns Index Scan on idx_kb_documents_doc_id_c_pattern with Buffers shared hit + read < 50 and Execution Time < 15 ms.
  • p95 latency:
    • single client: < 100 ms
    • concurrency 5: < 250 ms
    • concurrency 10: < 400 ms
  • Zero 5xx in 5 minutes of mixed traffic at concurrency 10.
  • No regression in other tools' p95 (sanity: search_knowledge, get_document, patch_document baseline unchanged).

2.12 D1 implementation surface

| Layer | Files | Change type |
|---|---|---|
| DB | one SQL statement (CREATE INDEX CONCURRENTLY ...) applied via SRE | additive |
| Python — store | agent_data/pg_store.py: add list_kb_docs(prefix, limit, offset) -> list[dict] (does NOT replace stream_docs) | new function |
| Python — server | agent_data/server.py: rewrite _list_kb_documents_page to call pg_store.list_kb_docs(...), build response, keep LIST_DOCUMENTS_MAX_RESPONSE_BYTES guard | logic-only |
| Docs | MCP tool description for list_documents: clarify count semantics; mention next_offset | doc-only |
| Tests | new pytest module under tests/ covering T1..T15 | new |

Rollback: revert PR + DROP INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern;. Application falls back to stream_docs path automatically.

2.13 Expected impact

| Scenario | Before | After D1 only | After D1 + index |
|---|---|---|---|
| Single, prefix='knowledge/' | ~900 ms | ~90 ms | <30 ms |
| Concurrency 5 p95 | 4 470 ms | ~250 ms | <150 ms |
| Concurrency 10 p95 | (untested, would degrade) | ~400 ms | <250 ms |

Part 3 — D2a: transport / content negotiation (Rev2)

3.1 Honest framing (Opus / GPT correction)

D2a is a production-compliance and compatibility patch. It is NOT proven to be the root cause of the observed ChatGPT MCP App patch UI hang. The fact that list_documents works via the same endpoint over JSON-RPC argues that the JSON-only response is not a hard handshake blocker for all clients. The hang may equally well be:

  • A list-latency issue (the 4.47 s p95 from D1) timing out the GPT App's per-tool budget.
  • A GPT-side bug in the patch-tool UI rendering.
  • A separate gateway/router stuck-session pattern documented elsewhere.

D2a is the right thing to do because the MCP spec (rev 2025-03-26 Streamable HTTP) requires honouring Accept: text/event-stream, and several clients (notably ChatGPT MCP App in some configurations, future Claude.ai connector versions, MCP Inspector) prefer or strictly require SSE. Ship D2a for compliance, then retest ChatGPT in a fresh conversation. If the hang persists, that is its own diagnosis loop and is not a Rev2 deliverable.

3.2 Accept negotiation rules (D2a)

Parse Accept per RFC 9110 S12.5.1 (treat absent or */* as "anything"). Compute two boolean signals from the parsed media-type list:

wants_sse  = there exists an item with type "text/event-stream" and effective q > 0
wants_json = there exists an item with type "application/json" or "*/*" and effective q > 0

Decision table:

| wants_sse | wants_json | Outcome |
|---|---|---|
| false | true (or default) | JSON response (today's behaviour) |
| false | false (only some other type, e.g. application/xml) | 406 Not Acceptable; JSON-RPC error envelope with code: -32700 and message: "Acceptable media types: application/json, text/event-stream" |
| true | false | SSE response |
| true | true | prefer JSON when the JSON q >= the SSE q; prefer SSE when the SSE q is strictly greater (tie-break to JSON to preserve current curl/FastMCP behaviour) |

q-value absence implies q=1.0. Malformed Accept (parse error) -> treat as */*, return JSON. Never 500 on a bad Accept.

This rule:

  • Keeps Accept: application/json (today's curl path) byte-identical.
  • Keeps Accept: application/json, text/event-stream (Claude.ai connector mode) -> JSON (no behaviour change for any existing tested client).
  • Activates SSE for Accept: text/event-stream (ChatGPT MCP App preferred mode) and for Accept: text/event-stream;q=1, application/json;q=0.5.
  • Returns 406 only for truly incompatible clients — almost no real client.

3.3 SSE frame format (D2a)

Chosen format:

event: message
data: <JSON-RPC envelope>
\n

(One blank line = end-of-event. Connection closes after the single event for a request/response operation.)

Why event: message:

  • MCP Streamable HTTP spec (rev 2025-03-26) recommends message as the event name for tool/initialise/notification responses.
  • Required by some implementations of EventSource that filter by event name.
  • Adds ~10 bytes per response — negligible.

Headers:

| Header | Value | Reason |
|---|---|---|
| Content-Type | text/event-stream; charset=utf-8 | spec |
| Cache-Control | no-cache, no-transform | prevent intermediaries from buffering |
| X-Accel-Buffering | no | nginx hint, defensive (we already have proxy_buffering off on /gpt-mcp/...) |
| Connection | (omit) | HTTP/1.1 default is keep-alive; clients close per request |

Body framing rule: serialize the JSON-RPC envelope with json.dumps(payload, default=str, ensure_ascii=False), then build:

"event: message\n" + "data: " + serialized + "\n\n"

For non-tools/call methods that today return 204 No Content (i.e. notifications/*):

  • Keep 204 No Content regardless of Accept. SSE without an envelope is meaningless.

For tools/call errors: produce the in-stream JSON-RPC envelope with result.isError=true and the error text inside result.content[0].text, identical to today's JSON path. Never return HTTP 500. Bad-request parse errors -> JSON-RPC envelope with error.code: -32700.
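The framing and error rules combine into a builder that serializes the envelope in full before emitting any bytes, which also covers the half-frame risk (R9). A sketch; `_sse_frame` is an assumed helper name:

```python
import json

def _sse_frame(payload: dict) -> bytes:
    """Render one JSON-RPC envelope as a single SSE event.

    json.dumps escapes control characters inside strings, so the serialized
    envelope can never contain a raw newline that would break the data: line.
    """
    serialized = json.dumps(payload, default=str, ensure_ascii=False)
    return ("event: message\n" + "data: " + serialized + "\n\n").encode("utf-8")
```

Building the whole frame as one string (and one chunk) means a dispatch error surfaces before the first byte is written, never mid-stream.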

3.4 GET /mcp (Streamable HTTP server-initiated channel)

D2a decision: GET /mcp returns the existing server info document as JSON (today's behaviour). If Accept: text/event-stream is sent on a GET /mcp, return 405 Method Not Allowed with Allow: POST and a JSON-RPC error body explaining "GET stream not supported in D2a; use POST".

Full Streamable HTTP with server-pushed events (resumable GET stream) is D2b, out of scope for D2a.

3.5 Notification handling

notifications/initialized and other notifications/*:

  • Spec: notifications have no id. Server must accept and return either 204 No Content or no body.
  • D2a: keep current 204 No Content. Do not switch to 202 Accepted (the spec note prefers 204).
  • Validation: if a JSON-RPC request with no id and a method starting with notifications/ arrives, return 204 immediately, no envelope, no SSE.
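The validation rule reduces to one predicate; a sketch with an illustrative helper name:

```python
def _is_notification(msg: dict) -> bool:
    """True when the message gets an immediate 204 No Content:
    no JSON-RPC id AND a notifications/* method.
    No envelope, no SSE, regardless of Accept."""
    return "id" not in msg and str(msg.get("method", "")).startswith("notifications/")
```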

3.6 Nginx

  • /gpt-mcp/{token}/mcp location — already has proxy_buffering off, proxy_cache off, chunked_transfer_encoding on. No change required for D2a.
  • Optional improvement (defer to D2a deploy time): add proxy_read_timeout 120s; (currently 60s) to allow future longer SSE sessions. Not blocking.
  • /api/ location — currently does NOT have proxy_buffering off. If we ever route MCP through /api/, an SSE response would be buffered. Today's MCP traffic does not use /api/mcp, so no fix required for D2a. Document as a follow-up if /api/mcp is ever published.

3.7 Risks (D2a)

  • R6. Decision table edge cases: some HTTP clients send Accept: * (one-char). Parser must coerce to */* to avoid 406. Add a unit test.
  • R7. event: message may render differently across SSE clients; if a future client expects unnamed events, the body data: ...\n\n still parses because EventSource defaults the event name to message. Choice is forward-compatible.
  • R8. Streaming-related Python: StreamingResponse(iter([body]), ...) in FastAPI flushes one chunk and closes. Confirm uvicorn does NOT add chunked-encoding length headers wrongly (it doesn't; it sets Transfer-Encoding: chunked automatically).
  • R9. Mid-stream errors: if _dispatch_mcp_tool raises after the first data: line is partially sent, the client sees a half-frame. Mitigation: build the envelope dict fully (catch in the JSON layer like today), then convert to the SSE frame string in one shot.
  • R10. D2a may not unblock ChatGPT MCP App (Opus correction). Production-compliance is its own win. Re-test after D1+D2a; if hang persists, open a separate diagnosis ticket. Do not advertise D2a as the ChatGPT fix.

3.8 Test matrix for D2a

| # | Request | Expected response |
|---|---|---|
| U1 | curl -H 'Accept: application/json' POST /mcp (tools/list) | Content-Type: application/json; charset=utf-8, single JSON body |
| U2 | curl -H 'Accept: application/json, text/event-stream' POST /mcp (Claude.ai connector mode) | application/json (tie-break) |
| U3 | curl -H 'Accept: application/json;q=0.5, text/event-stream;q=1' POST /mcp | text/event-stream, one event: message + data: frame |
| U4 | curl --no-buffer -H 'Accept: text/event-stream' POST /mcp (tools/call) | text/event-stream, body event: message\ndata: <json>\n\n, connection closes |
| U5 | curl -H 'Accept: application/xml' POST /mcp | 406 with JSON-RPC error envelope |
| U6 | curl -H 'Accept: *' POST /mcp | JSON (treat as */*) |
| U7 | curl POST /mcp (no Accept header) | JSON (default) |
| U8 | curl POST /mcp with malformed Accept: ;;; | JSON (fallback) |
| U9 | POST notifications/initialized (no id) | 204 No Content, no body, regardless of Accept |
| U10 | curl POST /mcp with malformed body (invalid JSON) | JSON-RPC error.code: -32700, HTTP 200, content-type JSON |
| U11 | GET /mcp with Accept: text/event-stream | 405, Allow: POST |
| U12 | GET /mcp with Accept: application/json | current JSON info response |
| U13 | FastMCP client (initialize + tools/list + one tools/call) | succeeds against /mcp AND /mcp-gpt AND /mcp-readonly |
| U14 | nginx pass-through: curl --no-buffer ... POST /gpt-mcp/<token>/mcp with Accept: text/event-stream | first-byte time <50 ms, full body delivered |
| U15 | After D1+D2a deploy: ChatGPT MCP App fresh conversation, attempt search + list + patch on knowledge/test/... | observed but explicitly not part of the D2a pass criteria (separate diagnosis if it fails) |

3.9 Pass criteria for D2a

  • All U1..U14 pass. U15 is observed but not gating.
  • Zero regression on existing JSON-only clients (Codex, internal callers, current Claude.ai connector).
  • p95 latency for tools/list and a small tools/call <= JSON baseline + 5 ms (SSE adds one string concat).
  • No 5xx, no truncated frames.

3.10 D2a implementation surface

| Layer | Files | Change type |
|---|---|---|
| Python — server | agent_data/server.py: add _parse_accept(hdr) returning (wants_json, wants_sse, prefer); add _mcp_respond(payload, accept_header) returning either JSONResponse or StreamingResponse; apply at the 4 return sites in mcp_jsonrpc and the 4 return sites in _mcp_filtered_handler | new helpers + 8 callsite changes |
| Tests | unit + integration covering U1..U14 | new |
| Docs | none (MCP tool schema unchanged) | |
| Nginx | none required | |

Rollback: revert PR. Behaviour returns to JSON-only.


Part 4 — D3: move_document policy (Rev2)

4.1 Mismatch summary

  • Runtime (agent_data/server.py:1624-1636): raise _error(501, NOT_IMPLEMENTED, ...) unconditionally.
  • MCP schema (MCP_TOOLS): advertises move_document with full inputSchema.
  • Allowlists: move_document is in MCP_GPT_FULL_TOOLS (not exposed publicly via nginx); NOT in MCP_GPT_TOOLS or MCP_READONLY_TOOLS.
  • Dispatcher (_dispatch_mcp_tool): includes the move_document branch (still routes to the 501-raising endpoint).

4.2 Policy options

  • A. Remove from MCP_TOOLS (advertised schema) and from _dispatch_mcp_tool and from MCP_GPT_FULL_TOOLS. Keep POST /documents/{x}/move returning 501.
  • B. Remove only from MCP_GPT_FULL_TOOLS (so /mcp-gpt-full no longer routes it); keep advertised in /mcp internal.
  • C. Keep advertised everywhere; prefix description with [DEPRECATED].

4.3 Choice and rationale

Choose A (full removal from MCP schema + dispatcher; keep HTTP endpoint returning 501).

  • The MCP tools list is consumed by every LLM client (Claude, Codex, ChatGPT, Cursor) on every session start. Advertising a tool that always errors wastes context budget and produces audit-log noise (mcp_*_reject warnings on every attempt).
  • Schema/runtime correctness is a one-shot benefit and reads well in audits.
  • The POST /documents/{x}/move HTTP endpoint stays in place and still returns the structured 501. Any internal script that calls it directly continues to receive the same error code/message — no breakage for known internal callers.

4.4 Backward compatibility risk

  • Risk for /mcp internal: a Claude / Codex script that has cached a tools/list response containing move_document would attempt tools/call name=move_document and now receive JSON-RPC -32601 Tool not allowed instead of result.content[0].text = "...NOT_IMPLEMENTED...". Both are clearly errors; agents already handle "not allowed" semantics. Bumping connectorSchemaVersion and _mcp_schema_hash() (auto) signals the change.
  • Risk for /mcp-gpt: zero. Already excluded.
  • Risk for /mcp-gpt-full: zero public exposure today.
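Concretely, a client that cached the old tools/list sees the error move from the result payload to the protocol envelope. An illustrative sketch (id value made up; message text per V3 below):

```python
# Before D3: move_document returned a *successful* JSON-RPC response whose
# result carried isError=true and the NOT_IMPLEMENTED text.
# After D3: the tool is simply unknown at the protocol level.
after_d3 = {
    "jsonrpc": "2.0",
    "id": 7,  # echoes the request id
    "error": {"code": -32601, "message": "Unknown tool: move_document"},
}

assert after_d3["error"]["code"] == -32601
assert "result" not in after_d3  # error and result are mutually exclusive
```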

4.5 Tests for D3

| # | Test | Expectation |
|---|---|---|
| V1 | POST /mcp tools/list | response does NOT include move_document |
| V2 | POST /mcp-gpt-full tools/list | response does NOT include move_document |
| V3 | POST /mcp tools/call name=move_document | error.code: -32601, message contains Unknown tool: move_document |
| V4 | POST /mcp-gpt-full tools/call name=move_document | error.code: -32601, message contains not allowed |
| V5 | POST /documents/x/move directly | HTTP 501 NOT_IMPLEMENTED (unchanged) |
| V6 | _mcp_schema_hash() | value changes; CONNECTOR_SCHEMA_VERSION bumped |

4.6 Implementation surface

| Layer | Files | Change |
|---|---|---|
| Python | agent_data/server.py: delete the move_document block from MCP_TOOLS (lines around 2722 + the JSON object); remove "move_document" from the MCP_GPT_FULL_TOOLS frozenset; delete the move_document branch in _dispatch_mcp_tool; bump CONNECTOR_SCHEMA_VERSION to e.g. gpt-agent-data-2026-05-14.1 | edits in 4 places |
| Docs | none | |
| Tests | V1..V6 | new |

Rollback: revert PR. Schema returns to current state. Dead code below the 501-raise is unchanged.

4.7 Pass criteria for D3

  • All V1..V6 pass.
  • One PR, one revert path, no DB impact, no nginx impact.

Part 5 — D4: write double-work — canary gate only

5.1 What is NOT shipped in this revision

D4 (removing inline _sync_vector_entry / _delete_vector_entry from create/update/patch/delete handlers) is not designed for implementation in Rev2. Per GPT/Codex feedback, prior to designing the removal we must prove the listener behaviour and avoid creating a duplicated indexing-loss / loop hazard.

5.2 Required evidence (canary)

Before the D4 implementation design lands, capture the following from a controlled experiment in a low-traffic window. The canary uses the knowledge/test/mcp-stress/ namespace, never production data.

| Probe | How | Pass criterion |
|---|---|---|
| C1. OpenAI embed calls per upload_document | log vector_store.embed_calls counter delta before/after a single upload (small doc, 1 chunk) | exactly 2 today (1 inline, 1 listener) -> must be 1 after D4 |
| C2. Qdrant upserts per upload_document | inspect Qdrant collection counts before/after; assert one chunk:0 point id only | exactly 1 unique point_id (deterministic via uuid5) — current behaviour already idempotent because of uuid5, but it costs 2 writes |
| C3. PG-NOTIFY listener is alive and consuming | tail container log: pg-vector-sync lines appear within 2 s of write | every test write has a matching listener log line |
| C4. vector_status loop check | run 10 uploads, sample data->>'vector_status' | no pending -> ready -> pending flapping; eventual ready for all |
| C5. Listener resilience | restart pg_vector_listener thread (or simulate disconnect via DB-side pg_terminate_backend); verify reconnect + catchup | listener reconnects within 5 s; /kb/reindex-missing recovers any gap |
| C6. Lag observation | record MAX(updated_at) for vector_status='pending' at 30 s intervals during a 5-min stress test (10 writes/s) | p95 lag < 5 s; p99 < 15 s |
| C7. Skip-rule alignment | upload to operations/tasks/comments/... and registries/... paths | trigger skips (per fn_kb_notify_vector_sync rules); no embed, no Qdrant write — must remain true |

5.3 Risks that the canary must eliminate

  • K1. Listener silently drops notifications under PG restart -> vectors fall behind, search misses new docs.
  • K2. vector_status: "pending" written by _sync_vector_entry (inline) might race with the listener writing "ready" -> potential UPDATE loop firing NOTIFY again. Confirm the trigger filter: the trigger fires only when NEW.data IS DISTINCT FROM OLD.data, and a vector_status change qualifies, so the loop must be ruled out explicitly.
  • K3. Two consecutive _sync_vector_entry calls (one inline, one listener) produce the same uuid5 point ids; Qdrant upsert is idempotent — confirm Qdrant client behaves identically with same payload (it does); but listener carries authoritative vector_status write — confirm last-writer-wins behaviour.
  • K4. Soft-delete race: app writes deleted_at; listener might receive the prior INSERT notify (queued) and re-embed the about-to-be-deleted doc. Trigger has a guard (v_new_deleted IS NOT NULL -> emit DELETE op), but verify under concurrency.
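K3 rests on uuid5 determinism; a quick sketch (the namespace and key format here are illustrative, not the production scheme):

```python
import uuid

# uuid5 is a pure function of (namespace, name): same inputs always yield
# the same UUID, so the inline write and the listener write upsert the SAME
# Qdrant point id — the double write is wasteful but not corrupting.
ns = uuid.NAMESPACE_URL  # illustrative; production derives its own namespace
p1 = uuid.uuid5(ns, "knowledge/test/mcp-stress/doc.md:chunk:0")
p2 = uuid.uuid5(ns, "knowledge/test/mcp-stress/doc.md:chunk:0")
assert p1 == p2
```

What determinism does NOT cover is the vector_status payload riding along with each upsert, which is why K3 still asks for last-writer-wins confirmation.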

5.4 Consistency contract (proposed for after canary)

If canary passes:

  • get_document after write: strongly consistent (PG is SSOT, returns latest immediately).
  • search_knowledge after write: eventual consistency, target <=2 s p95 lag, 15 s p99 lag.
  • list_documents: strongly consistent (driven by PG).
  • vector_status state machine:
    • pending set by write handler (already in code for create; will be added to update/patch in D4)
    • ready set by listener after Qdrant upsert
    • error set by listener if embed/upsert fails (already in code)
    • skipped set if content empty (already in code)
  • Health metric vector_sync_lag_seconds = NOW() - MAX(updated_at) FILTER (WHERE vector_status='pending'). Alert at > 30 s.
  • Recovery: cron-driven /kb/reindex-missing every 15 min; on-demand admin trigger.

5.5 D4 implementation gate

D4 design + PR may proceed when:

  1. Canary C1..C7 are recorded in a follow-up KB report.
  2. K1..K4 risks are explicitly accepted by GPT/Opus/Codex.
  3. Lag metric + alert are wired in (or D4 PR includes them).

Until then, D4 is parked.


Part 6 — Revised implementation sequence

Rev2 approved
  ---> 2F-D1 (list_documents SQL + index)     ---> smoke/load test          ---> approve
  ---> 2F-D2a (transport JSON/SSE)            ---> smoke + ChatGPT retry    ---> diagnose if hang persists
  ---> 2F-D3 (move_document schema cleanup)   ---> schema-diff test         ---> done
  ---> 2F-D4 canary                           ---> if green, design D4 implementation in Rev3

(Steps run strictly in the order listed, top to bottom.)

Each step:

2F-D1 — list_documents SQL pushdown + partial pattern index

  • Files: agent_data/pg_store.py (new list_kb_docs), agent_data/server.py (rewrite _list_kb_documents_page).
  • Migration: CREATE INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern ... (one SQL statement, applied out-of-band by SRE; non-transactional).
  • Tests: T1..T15 per S2.10.
  • Rollback: revert PR + DROP INDEX CONCURRENTLY ....
  • Pass criteria: S2.11.
  • Expected impact: list_documents p95 4.47 s -> <250 ms at concurrency 5; <400 ms at concurrency 10.

2F-D2a — transport content negotiation

  • Files: agent_data/server.py (_parse_accept, _mcp_respond, 8 callsite replacements in mcp_jsonrpc + _mcp_filtered_handler).
  • Migration: none.
  • Tests: U1..U14 per S3.8.
  • Rollback: revert PR.
  • Pass criteria: S3.9.
  • Expected impact: spec-compliant transport. May or may not unblock ChatGPT MCP App; retest in fresh conversation.

2F-D3 — move_document schema cleanup

  • Files: agent_data/server.py (4 small edits — schema list, allowlist, dispatcher branch, version bump).
  • Migration: none.
  • Tests: V1..V6 per S4.5.
  • Rollback: revert PR.
  • Pass criteria: S4.7.
  • Expected impact: correctness only.

2F-D4 canary (NOT implementation)

  • Files: none modified (canary uses existing endpoints).
  • Migration: none.
  • Tests: C1..C7 captured in a follow-up report.
  • Rollback: n/a (read-only).
  • Pass criteria: S5.3 risks K1..K4 demonstrably bounded; lag metric proposal accepted.
  • Expected outcome: Rev3 designs the actual async write change OR documents why staying sync is correct.

Part 7 — Production gate (Rev2)

7.1 Functional (after D1+D2a+D3 — D4 not yet)

| Tool | Pass |
|---|---|
| search_knowledge | >=1 hit on 'operating rules SSOT', p95 < 2 s |
| list_documents (prefix='', knowledge/, knowledge/test/, escaped) | byte-correct results; pagination; no junk row |
| get_document | found = full payload; missing = {error} |
| get_document_for_rewrite | full body returned |
| batch_read (3 paths) | 3 items, truncated by default |
| upload_document | sync writes; appears in get_document immediately; in search eventually |
| update_document | revision+1; readback immediate |
| patch_document | one-replace succeeds; ambiguous returns 409 |
| delete_document | soft-delete; excluded from list; vector removed eventually |
| move_document | NOT present in tools/list on any profile (D3) |
| ingest_document | unchanged; only via internal /mcp-gpt-full, not nginx-exposed |

7.2 Latency targets (p95, after D1+D2a+D3 with index)

| Path | Target |
|---|---|
| initialize | < 50 ms |
| tools/list | < 50 ms |
| get_document | < 200 ms |
| batch_read (3 paths) | < 300 ms |
| list_documents concurrency 5 | < 250 ms |
| list_documents concurrency 10 | < 400 ms |
| upload_document | < 1 200 ms (unchanged until D4) |
| update_document | < 1 200 ms (unchanged until D4) |
| patch_document | < 1 200 ms (unchanged until D4) |
| search_knowledge | < 2 000 ms (OpenAI bound, unchanged) |

7.3 Transport pass

  • Accept: application/json -> JSON, byte-identical to today.
  • Accept: application/json, text/event-stream -> JSON (tie-break, today's clients unchanged).
  • Accept: application/json;q=0.5, text/event-stream -> SSE.
  • Accept: text/event-stream -> SSE with one event: message\ndata: <json>\n\n frame.
  • Accept: application/xml -> 406 JSON-RPC error envelope.
  • notifications/* -> 204 No Content regardless of Accept.
  • GET /mcp with SSE Accept -> 405.
  • FastMCP client full handshake green against /mcp, /mcp-gpt, /mcp-readonly.
  • ChatGPT MCP App fresh conversation: observed; not pass-gating. If broken, file a follow-up.

7.4 Stress

  • Mixed-tool concurrency 5 and 10 for 60 s.
  • Zero 5xx; zero truncated frames; zero Qdrant wait=True timeouts on any read path.
  • Container CPU < 50 % avg; RSS stable; no thread leak.

7.5 Safety

  • No secret in logs (verify after each PR — grep for env values in last 10 min of container log).
  • /mcp-gpt-full not nginx-exposed.
  • delete_document audited at nginx layer (re-enable access_log on the /gpt-mcp/... location; recommend low-cardinality format).
  • Rotate the /gpt-mcp/{token} URL after D1+D2a+D3 ship (separate, tracked task; not in this design).
  • All four phases have one-revert rollback documented.

Appendix A — Preflight facts (read-only, 2026-05-14)

  • kb_documents: 5 040 total, 2 907 alive, 27 MB.
  • 1 anomalous row: empty key, empty document_id, 4.4 KB body -> filtered in D1.
  • 1 184 alive docs contain _ in their document_id -> LIKE escaping mandatory.
  • 0 alive docs contain % or \ in document_id -> escapes are defensive but cheap.
  • 0 alive docs have non-numeric revision JSON -> cast safe.
  • 0 alive docs have replace(key,'__','/') != document_id -> _fs_key mapping consistent.
  • PG-NOTIFY listener pg_vector_listener is already running in production (PG->Qdrant vector sync listener started in container logs).
  • nginx /gpt-mcp/{token}/mcp already has proxy_buffering off, chunked_transfer_encoding on, proxy_cache off — SSE-ready.
  • 6 PG triggers on kb_documents; trg_kb_vector_sync fires pg_notify('kb_vector_sync', json) on INSERT/UPDATE/DELETE.
  • move_document raises 501 unconditionally at function entry (server.py:1624-1636); dead code below.

Appendix B — Open follow-ups (post-Rev2)

  1. Decide if count semantics in list_documents response need explicit alias (returned_count already present alongside).
  2. Bump proxy_read_timeout on /gpt-mcp/{token} to 120 s when D2a deploys (optional).
  3. Add structured access log on /gpt-mcp/{token} for audit (recommend yes).
  4. Rotate /gpt-mcp/{token} URL after D1+D2a+D3 ship (separate ticket).
  5. Run D4 canary C1..C7 and publish results in a new KB report; gates D4 design.
  6. Switch non-list_documents callers of stream_docs (reindex_kb_documents, cleanup_orphan_vectors, _compute_data_integrity, audit_sync) to a server-side cursor when row count >> 10k. Not blocking today.
  7. If ChatGPT MCP App still hangs after D1+D2a, open a separate diagnosis ticket (don't blame D2a in the report).

Summary

| Phase | Goal | Status |
|---|---|---|
| 2F-D1 | list_documents SQL + index | designed (Rev2); awaits approval |
| 2F-D2a | transport JSON+SSE | designed (Rev2); awaits approval; NOT proven to fix ChatGPT |
| 2F-D3 | move_document schema cleanup | designed (Rev2); awaits approval |
| 2F-D4 canary | prove listener safety | required before any D4 implementation design |
| 2F-D4 implementation | async vector via listener | parked until canary green |

Implementation order: D1 -> D2a -> D3 -> D4 canary -> (later) D4 design.

End of Rev2. No code modified. No commit. No restart. No nginx reload. No ChatGPT reconnection.