KB-1EF2

Phase 2F-C Rev2 — MCP Production Patch Design (2026-05-14)


Phase 2F-C Rev2 — MCP Production Patch Design (revised after GPT + Opus + Codex review)

Date: 2026-05-14
Status: Design only. Supersedes Rev1 (phase-2f-c-mcp-production-patch-design-2026-05-13.md).
Author: Claude Code xhigh (Phase 2F-C Rev2)
Reviewers incorporated: GPT council, Opus, Codex.
Scope: Production patches D1..D4 for AgentData MCP. No code, no commit, no restart, no nginx reload, no ChatGPT reconnection.


Part 1 — New process

Production design changes for AgentData MCP follow this order, no shortcuts:

  1. Investigate + design — Claude Code (xhigh effort) reads code, schema, logs, runs read-only EXPLAIN/queries, writes a markdown design report to knowledge/current-state/reports/.
  2. Review — GPT, Opus, and Codex review the design. Each may APPROVE, APPROVE WITH CHANGES, or REJECT.
  3. Revise — Claude Code addresses all change requests in a new revision (e.g. Rev2) and re-uploads to the same KB folder.
  4. Approve — once at least one of {GPT, Opus, Codex} marks "approve" without remaining blockers, implementation is unblocked.
  5. Implement — code change in its own branch/PR. Two-Hat process.
  6. Test — functional, latency, transport, stress tests per the design's pass criteria. Capture results in current-state/reports/.
  7. Deploy + verify — VPS deploy, smoke test. Only then re-test external clients (ChatGPT, Claude.ai).

No design report approved => no code merged. This document is Rev2 of the D1..D4 designs, awaiting approval.


Part 2 — D1: list_documents final design (Rev2)

2.1 Scope discipline

D1 ships alone, not bundled with D2a. D1 changes only the list_documents data path; it does not change transport, schema, write path, or any other tool.

2.2 Preflight findings (read-only inspection of VPS PG, 2026-05-14)

| Check | Result | Implication |
|---|---|---|
| kb_documents total rows | 5 040 | |
| Alive rows (deleted_at IS NULL) | 2 907 | |
| Rows missing document_id | 1 (empty key + empty document_id, ~4 KB body) | Must be filtered: WHERE (data->>'document_id') IS NOT NULL AND length(data->>'document_id') > 0 |
| Rows with non-numeric revision JSON | 0 | Safe to cast (data->>'revision')::int with COALESCE(..., 0) |
| key ↔ document_id mismatch (replace(key,'__','/') = document_id) | 0 | _fs_key mapping is consistent |
| document_id containing _ | 1 184 | LIKE escaping of _ is mandatory |
| document_id containing % | 0 | Defensive escape still required |
| document_id containing \ | 0 | Defensive escape still required |

Conclusion: data is clean enough to push down. The single junk row + 1 184 docs with _ justify both the IS NOT NULL filter and the LIKE escape.

2.3 Final SQL (parameterized)

SELECT
  data->>'document_id'                           AS document_id,
  data->>'parent_id'                             AS parent_id,
  COALESCE(data->'metadata'->>'title', '')       AS title,
  COALESCE(data->'metadata'->'tags', '[]'::jsonb) AS tags,
  COALESCE(NULLIF(data->>'revision','')::int, 0) AS revision
FROM kb_documents
WHERE (data->>'deleted_at') IS NULL
  AND (data->>'document_id') IS NOT NULL
  AND length(data->>'document_id') > 0
  AND (
       %(prefix)s = ''
       OR (data->>'document_id') LIKE %(prefix_like)s ESCAPE '\'
  )
ORDER BY (data->>'document_id')
LIMIT %(limit)s OFFSET %(offset)s;

prefix_like is built in Python:

def _like_prefix(prefix: str) -> str:
    return (prefix
            .replace('\\', '\\\\')   # escape backslash FIRST
            .replace('%',  '\\%')
            .replace('_',  '\\_')
            ) + '%'
  • ESCAPE '\' is set explicitly so PG treats \% and \_ as literals.
  • NULLIF(...,'')::int is defensive: even though current data is clean, future inserts that stash an empty string would otherwise raise.
  • data->>'document_id' IS NOT NULL is redundant given current data, but documents intent and survives the one anomalous row.
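These escape rules fail silently if the replace order is wrong, so a few assertions pin them down. A sketch of the R1 unit test; the function body is copied from the helper above:

```python
def _like_prefix(prefix: str) -> str:
    # Backslash must be escaped FIRST, otherwise the backslashes that the
    # %- and _-escapes introduce would themselves be double-escaped.
    return (prefix
            .replace('\\', '\\\\')
            .replace('%',  '\\%')
            .replace('_',  '\\_')
            ) + '%'

# Literal underscore becomes \_ so LIKE cannot treat it as a single-char wildcard.
assert _like_prefix('knowledge_/') == 'knowledge\\_/%'
# Literal percent is escaped; the trailing % stays the only wildcard.
assert _like_prefix('100%') == '100\\%%'
# A backslash in the input doubles before the other escapes run.
assert _like_prefix('a\\b') == 'a\\\\b%'
# Empty prefix degenerates to match-everything.
assert _like_prefix('') == '%'
```

Swapping the first replace to last would turn `knowledge_/` into a pattern whose escape backslash is itself re-escaped, which is exactly what the assertions catch.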

2.4 total_count semantics

Rev1 left this open. Decision: drop total_count from the response.

  • Rationale: dropping it removes a separate COUNT(*) ... LIKE query (5–20 ms wall, same shared buffers as the main query) and matches REST convention (next_offset + items + truncated).
  • Backward compatibility: the current MCP description doesn't promise an exact count. The field count keeps its name but its meaning narrows to "number of items returned". Document this in the tool description.
  • If a client genuinely needs the full count, expose a ?count=true flag in a later phase; not now.
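With total_count dropped, a page response carries only these keys. An illustrative sketch, with made-up field values; the key set matches T5 in the test matrix below:

```python
# Illustrative list_documents page after D1. `count` = items returned on
# THIS page (narrowed semantics); `next_offset` is None on the last page.
page = {
    "items": [
        {
            "document_id": "knowledge/current-state/reports/example.md",  # hypothetical doc
            "parent_id": "knowledge/current-state/reports",
            "title": "Example report",
            "tags": ["phase-2f"],
            "revision": 3,
        },
    ],
    "count": 1,
    "truncated": False,
    "next_offset": None,  # None => no more pages; clients stop paginating here
}

assert page["count"] == len(page["items"])
assert "total_count" not in page
```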

2.5 Collation & index strategy

Option chosen: partial expression index on ((data->>'document_id') COLLATE "C") text_pattern_ops, plus query ORDER BY/WHERE using the same COLLATE "C". Rationale:

  • Default UTF-8 collation makes LIKE 'prefix%' a filter, not a range scan, because 'knowledge/' <= 'knowledge/x' < 'knowledge0' is false under UTF-8 ICU rules (verified — 'knowledge/x' < 'knowledge0' = false in default, true in COLLATE "C").
  • text_pattern_ops alone fixes LIKE-prefix range scans, but the query's ORDER BY must use a comparison compatible with the index for Index Scan ... no Sort. Adding COLLATE "C" to both sides keeps the plan a single Index Scan with no sort.

Final migration:

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_kb_documents_doc_id_c_pattern
  ON kb_documents (((data->>'document_id') COLLATE "C") text_pattern_ops)
  WHERE (data->>'deleted_at') IS NULL
    AND (data->>'document_id') IS NOT NULL
    AND length(data->>'document_id') > 0;

Query updated accordingly:

WHERE (data->>'deleted_at') IS NULL
  AND (data->>'document_id') IS NOT NULL
  AND length(data->>'document_id') > 0
  AND ( %(prefix)s = '' OR (data->>'document_id') COLLATE "C" LIKE %(prefix_like)s ESCAPE '\' )
ORDER BY (data->>'document_id') COLLATE "C"
LIMIT %(limit)s OFFSET %(offset)s;

Trade-off documented:

  • Sort order becomes byte-wise (ASCII order). Today's behaviour is already effectively byte-wise: Python's default string sort compares Unicode code points, which matches byte order for these ASCII document ids. So no semantic change. Add a regression test that returns the same first-50 list before and after migration to confirm.
  • Existing idx_kb_documents_doc_id stays as a safety net — do not drop until 7 days of clean operation.

2.6 Migration plan

CREATE INDEX CONCURRENTLY rules:

  • Cannot be inside a transaction. Run as a single statement via psql (or a one-off admin script), not via Alembic/transactional migration.
  • Acquires SHARE UPDATE EXCLUSIVE lock; reads and writes proceed normally.
  • On failure (e.g. duplicate key, killed connection), leaves an INVALID index. Cleanup: DROP INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern; then re-run.
  • Build time estimate at 2 907 alive rows: <5 s. No measurable load.

Rollback:

DROP INDEX CONCURRENTLY IF EXISTS idx_kb_documents_doc_id_c_pattern;

Also non-blocking. If we also reverted the Python code, the planner falls back to idx_kb_documents_doc_id (current behaviour).

2.7 Pagination policy

  • Default limit = 50 (unchanged).
  • Max limit = 100 (unchanged; already enforced server.py:2033).
  • Default offset = 0.
  • Max offset = 10 000 (new cap). Above this, return 422 INVALID_ARGUMENT with message "offset too large — use a narrower prefix or wait for keyset pagination". Rationale: OFFSET cost grows linearly with its value, so unbounded deep paging is a quadratic-work DoS vector; the cap bounds it.
  • Keyset cursor is not in D1. Defer to D1-follow-up (2F-D1b) only if real-world prefix usage trips the offset cap. Today's largest prefix (knowledge/ = 2 570 rows) fits in 26 limit-100 pages, well under the cap.
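The caps above amount to a small validation step before any SQL runs. A sketch, where the `InvalidArgument` class name is illustrative shorthand for the 422 INVALID_ARGUMENT envelope:

```python
DEFAULT_LIMIT = 50
MAX_LIMIT = 100       # already enforced today
MAX_OFFSET = 10_000   # new cap

class InvalidArgument(ValueError):
    """Illustrative stand-in for the 422 INVALID_ARGUMENT error envelope."""

def validate_page(limit: int = DEFAULT_LIMIT, offset: int = 0) -> tuple[int, int]:
    # Reject out-of-range values before they reach the LIMIT/OFFSET clause.
    if not 1 <= limit <= MAX_LIMIT:
        raise InvalidArgument(f"limit must be between 1 and {MAX_LIMIT}")
    if not 0 <= offset <= MAX_OFFSET:
        raise InvalidArgument(
            "offset too large — use a narrower prefix or wait for keyset pagination")
    return limit, offset
```

This makes T8 and T9 in the matrix below pure unit tests with no DB round-trip.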

2.8 Backward compatibility

  • MCP arg path is still accepted and mapped to prefix (today's dispatcher already does this — keep).
  • MCP args limit, offset already in tool schema (server.py:2756) — keep.
  • Response shape: identical keys; count now means returned-items. Old clients reading count for "page size" stay correct; clients reading count for "total" need to switch to next_offset == null as the "no more pages" signal.

2.9 Risks register

  • R1. LIKE escape order matters: backslash first, then %, then _. Wrong order corrupts the escape. Tested in unit tests.
  • R2. Partial index WHERE deleted_at IS NULL requires the same predicate in the query for the planner to pick it. Already in the SQL above.
  • R3. (data->>'document_id') COLLATE "C" is an expression; an expression index must match the query's expression exactly. The migration and the query both use the same form.
  • R4. stream_docs is still used by reindex_kb_documents, cleanup_orphan_vectors, _compute_data_integrity, audit_sync. D1 does not touch those. Document a follow-up to switch them to a server-side cursor (SELECT ... FOR FETCH FORWARD or psycopg2 named cursor) so a future 30k-row table doesn't kill them. Not blocking.
  • R5. LIST_DOCUMENTS_MAX_RESPONSE_BYTES guard (server.py:2085) stays in place. With LIMIT 100 + lean projection, response is well under the cap.

2.10 Test matrix for D1

| # | Test | Expectation |
|---|---|---|
| T1 | prefix='', limit=50 | first 50 alive docs, byte-sorted; next_offset=50 |
| T2 | prefix='knowledge/', limit=50 | first 50 knowledge/* docs; next_offset=50; no leak from knowledgeX |
| T3 | prefix='knowledge_/' (literal underscore expected) | matches ONLY docs starting knowledge_/, NOT knowledge/ (escape works) |
| T4 | prefix='100%' (literal %) | matches only literal 100%...; no wildcard expansion |
| T5 | prefix='no/such/path/' | returns {items:[], count:0, truncated:false, next_offset:null} |
| T6 | soft-deleted doc | excluded from any prefix list |
| T7 | offset=0/24/49/50/99/100/9999 | correct pagination |
| T8 | offset=10001 | 422 INVALID_ARGUMENT |
| T9 | limit=0 / limit=101 | 422 INVALID_ARGUMENT |
| T10 | EXPLAIN ANALYZE on prefix='knowledge/' | plan begins with Limit ... -> Index Scan using idx_kb_documents_doc_id_c_pattern; no Sort, no Seq Scan, no Rows Removed by Filter > 100 |
| T11 | concurrency 5 burst, 60 s | p95 < 250 ms |
| T12 | concurrency 10 burst, 60 s | p95 < 400 ms |
| T13 | backward compat | MCP args.path still works identical to args.prefix |
| T14 | order-stability | same call twice returns same first-50 ordering |
| T15 | anomalous row exclusion | the one empty-key/empty-doc_id row is not returned for any prefix |

2.11 Pass criteria for D1

  • All tests T1..T15 green.
  • EXPLAIN plan for any of prefix='', prefix='knowledge/', prefix='knowledge/current-state/reports/' returns Index Scan on idx_kb_documents_doc_id_c_pattern with Buffers shared hit + read < 50 and Execution Time < 15 ms.
  • p95 latency:
    • single client: < 100 ms
    • concurrency 5: < 250 ms
    • concurrency 10: < 400 ms
  • Zero 5xx in 5 minutes of mixed traffic at concurrency 10.
  • No regression in other tools' p95 (sanity: search_knowledge, get_document, patch_document baseline unchanged).

2.12 D1 implementation surface

| Layer | Files | Change type |
|---|---|---|
| DB | one SQL statement (CREATE INDEX CONCURRENTLY ...) applied via SRE | additive |
| Python — store | agent_data/pg_store.py: add list_kb_docs(prefix, limit, offset) -> list[dict] (does NOT replace stream_docs) | new function |
| Python — server | agent_data/server.py: rewrite _list_kb_documents_page to call pg_store.list_kb_docs(...), build response, keep LIST_DOCUMENTS_MAX_RESPONSE_BYTES guard | logic-only |
| Docs | MCP tool description for list_documents: clarify count semantics; mention next_offset | doc-only |
| Tests | new pytest module under tests/ covering T1..T15 | new |

Rollback: revert PR + DROP INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern;. Application falls back to stream_docs path automatically.

2.13 Expected impact

| Scenario | Before | After D1 only | After D1 + index |
|---|---|---|---|
| Single, prefix='knowledge/' | ~900 ms | ~90 ms | <30 ms |
| Concurrency 5 p95 | 4 470 ms | ~250 ms | <150 ms |
| Concurrency 10 p95 | (untested, would degrade) | ~400 ms | <250 ms |

Part 3 — D2a: transport / content negotiation (Rev2)

3.1 Honest framing (Opus / GPT correction)

D2a is a production-compliance and compatibility patch. It is NOT proven to be the root cause of the observed ChatGPT MCP App patch UI hang. The fact that list_documents works via the same endpoint over JSON-RPC argues that the JSON-only response is not a hard handshake blocker for all clients. The hang may equally well be:

  • A list-latency issue (the 4.47 s p95 from D1) timing out the GPT App's per-tool budget.
  • A GPT-side bug in the patch-tool UI rendering.
  • A separate gateway/router stuck-session pattern documented elsewhere.

D2a is the right thing to do because the MCP spec (rev 2025-03-26 Streamable HTTP) requires honouring Accept: text/event-stream, and several clients (notably ChatGPT MCP App in some configurations, future Claude.ai connector versions, MCP Inspector) prefer or strictly require SSE. Ship D2a for compliance, then retest ChatGPT in a fresh conversation. If the hang persists, that is its own diagnosis loop and is not a Rev2 deliverable.

3.2 Accept negotiation rules (D2a)

Parse Accept per RFC 9110 S12.5.1 (treat absent or */* as "anything"). Compute two boolean signals from the parsed media-type list:

wants_sse  = there exists an item with type "text/event-stream" and effective q > 0
wants_json = there exists an item with type "application/json" or "*/*" and effective q > 0

Decision table:

| wants_sse | wants_json | Outcome |
|---|---|---|
| false | true (or default) | JSON response (today's behaviour) |
| false | false (only some other type, e.g. application/xml) | 406 Not Acceptable; JSON-RPC error envelope with code: -32700 and message: "Acceptable media types: application/json, text/event-stream" |
| true | false | SSE response |
| true | true | prefer JSON when the JSON q >= the SSE q; prefer SSE when the SSE q is strictly greater (tie-break to JSON to preserve current curl/FastMCP behaviour) |

q-value absence implies q=1.0. Malformed Accept (parse error) -> treat as */*, return JSON. Never 500 on a bad Accept.

This rule:

  • Keeps Accept: application/json (today's curl path) byte-identical.
  • Keeps Accept: application/json, text/event-stream (Claude.ai connector mode) -> JSON (no behaviour change for any existing tested client).
  • Activates SSE for Accept: text/event-stream (ChatGPT MCP App preferred mode) and for Accept: text/event-stream;q=1, application/json;q=0.5.
  • Returns 406 only for truly incompatible clients — almost no real client.

3.3 SSE frame format (D2a)

Chosen format:

event: message
data: <JSON-RPC envelope>
\n

(One blank line = end-of-event. Connection closes after the single event for a request/response operation.)

Why event: message:

  • MCP Streamable HTTP spec (rev 2025-03-26) recommends message as the event name for tool/initialise/notification responses.
  • Required by some implementations of EventSource that filter by event name.
  • Adds ~10 bytes per response — negligible.

Headers:

| Header | Value | Reason |
|---|---|---|
| Content-Type | text/event-stream; charset=utf-8 | spec |
| Cache-Control | no-cache, no-transform | prevent intermediaries from buffering |
| X-Accel-Buffering | no | nginx hint, defensive (we already have proxy_buffering off on /gpt-mcp/...) |
| Connection | (omit) | HTTP/1.1 default is keep-alive; clients close per request |

Body framing rule: serialize the JSON-RPC envelope with json.dumps(payload, default=str, ensure_ascii=False), then build:

"event: message\n" + "data: " + serialized + "\n\n"

For non-tools/call methods that today return 204 No Content (i.e. notifications/*):

  • Keep 204 No Content regardless of Accept. SSE without an envelope is meaningless.

For tools/call errors: produce the in-stream JSON-RPC envelope with result.isError=true and the error text inside result.content[0].text, identical to today's JSON path. Never return HTTP 500. Bad-request parse errors -> JSON-RPC envelope with error.code: -32700.
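The framing and error rules combine into a builder that serializes the envelope in full before emitting any bytes, which also covers the half-frame risk (R9). A sketch; `_sse_frame` is an assumed helper name:

```python
import json

def _sse_frame(payload: dict) -> bytes:
    """Render one JSON-RPC envelope as a single SSE event.

    json.dumps escapes control characters inside strings, so the serialized
    envelope can never contain a raw newline that would break the data: line.
    """
    serialized = json.dumps(payload, default=str, ensure_ascii=False)
    return ("event: message\n" + "data: " + serialized + "\n\n").encode("utf-8")
```

Building the whole frame as one string (and one chunk) means a dispatch error surfaces before the first byte is written, never mid-stream.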

3.4 GET /mcp (Streamable HTTP server-initiated channel)

D2a decision: GET /mcp returns the existing server info document as JSON (today's behaviour). If Accept: text/event-stream is sent on a GET /mcp, return 405 Method Not Allowed with Allow: POST and a JSON-RPC error body explaining "GET stream not supported in D2a; use POST".

Full Streamable HTTP with server-pushed events (resumable GET stream) is D2b, out of scope for D2a.

3.5 Notification handling

notifications/initialized and other notifications/*:

  • Spec: notifications have no id. Server must accept and return either 204 No Content or no body.
  • D2a: keep current 204 No Content. Do not switch to 202 Accepted (the spec note prefers 204).
  • Validation: if a JSON-RPC request with no id and a method starting with notifications/ arrives, return 204 immediately, no envelope, no SSE.
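The validation rule reduces to one predicate; a sketch with an illustrative helper name:

```python
def _is_notification(msg: dict) -> bool:
    """True when the message gets an immediate 204 No Content:
    no JSON-RPC id AND a notifications/* method.
    No envelope, no SSE, regardless of Accept."""
    return "id" not in msg and str(msg.get("method", "")).startswith("notifications/")
```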

3.6 Nginx

  • /gpt-mcp/{token}/mcp location — already has proxy_buffering off, proxy_cache off, chunked_transfer_encoding on. No change required for D2a.
  • Optional improvement (defer to D2a deploy time): add proxy_read_timeout 120s; (currently 60s) to allow future longer SSE sessions. Not blocking.
  • /api/ location — currently does NOT have proxy_buffering off. If we ever route MCP through /api/, an SSE response would be buffered. Today's MCP traffic does not use /api/mcp, so no fix required for D2a. Document as a follow-up if /api/mcp is ever published.

3.7 Risks (D2a)

  • R6. Decision table edge cases: some HTTP clients send Accept: * (one-char). Parser must coerce to */* to avoid 406. Add a unit test.
  • R7. event: message may render differently across SSE clients; if a future client expects unnamed events, the body data: ...\n\n still parses because EventSource defaults the event name to message. Choice is forward-compatible.
  • R8. Streaming-related Python: StreamingResponse(iter([body]), ...) in FastAPI flushes one chunk and closes. Confirm uvicorn does NOT add chunked-encoding length headers wrongly (it doesn't; it sets Transfer-Encoding: chunked automatically).
  • R9. Mid-stream errors: if _dispatch_mcp_tool raises after the first data: line is partially sent, the client sees a half-frame. Mitigation: build the envelope dict fully (catch in the JSON layer like today), then convert to the SSE frame string in one shot.
  • R10. D2a may not unblock ChatGPT MCP App (Opus correction). Production-compliance is its own win. Re-test after D1+D2a; if hang persists, open a separate diagnosis ticket. Do not advertise D2a as the ChatGPT fix.

3.8 Test matrix for D2a

| # | Request | Expected response |
|---|---|---|
| U1 | curl -H 'Accept: application/json' POST /mcp (tools/list) | Content-Type: application/json; charset=utf-8, single JSON body |
| U2 | curl -H 'Accept: application/json, text/event-stream' POST /mcp (Claude.ai connector mode) | application/json (tie-break) |
| U3 | curl -H 'Accept: application/json;q=0.5, text/event-stream;q=1' POST /mcp | text/event-stream, one event: message + data: frame |
| U4 | curl --no-buffer -H 'Accept: text/event-stream' POST /mcp (tools/call) | text/event-stream, body event: message\ndata: <json>\n\n, connection closes |
| U5 | curl -H 'Accept: application/xml' POST /mcp | 406 with JSON-RPC error envelope |
| U6 | curl -H 'Accept: *' POST /mcp | JSON (treat as */*) |
| U7 | curl POST /mcp (no Accept header) | JSON (default) |
| U8 | curl POST /mcp with malformed Accept: ;;; | JSON (fallback) |
| U9 | POST notifications/initialized (no id) | 204 No Content, no body, regardless of Accept |
| U10 | curl POST /mcp with malformed body (invalid JSON) | JSON-RPC error.code: -32700, HTTP 200, content-type JSON |
| U11 | GET /mcp with Accept: text/event-stream | 405, Allow: POST |
| U12 | GET /mcp with Accept: application/json | current JSON info response |
| U13 | FastMCP client (initialize + tools/list + one tools/call) | succeeds against /mcp AND /mcp-gpt AND /mcp-readonly |
| U14 | nginx pass-through: curl --no-buffer ... POST /gpt-mcp/<token>/mcp with Accept: text/event-stream | first-byte time <50 ms, full body delivered |
| U15 | After D1+D2a deploy: ChatGPT MCP App fresh conversation, attempt search + list + patch on knowledge/test/... | observed but explicitly not part of the D2a pass criteria (separate diagnosis if it fails) |

3.9 Pass criteria for D2a

  • All U1..U14 pass. U15 is observed but not gating.
  • Zero regression on existing JSON-only clients (Codex, internal callers, current Claude.ai connector).
  • p95 latency for tools/list and a small tools/call <= JSON baseline + 5 ms (SSE adds one string concat).
  • No 5xx, no truncated frames.

3.10 D2a implementation surface

| Layer | Files | Change type |
|---|---|---|
| Python — server | agent_data/server.py: add _parse_accept(hdr) returning (wants_json, wants_sse, prefer); add _mcp_respond(payload, accept_header) returning either JSONResponse or StreamingResponse; apply at the 4 return sites in mcp_jsonrpc and the 4 return sites in _mcp_filtered_handler | new helpers + 8 callsite changes |
| Tests | unit + integration covering U1..U14 | new |
| Docs | none (MCP tool schema unchanged) | |
| Nginx | none required | |

Rollback: revert PR. Behaviour returns to JSON-only.


Part 4 — D3: move_document policy (Rev2)

4.1 Mismatch summary

  • Runtime (agent_data/server.py:1624-1636): raise _error(501, NOT_IMPLEMENTED, ...) unconditionally.
  • MCP schema (MCP_TOOLS): advertises move_document with full inputSchema.
  • Allowlists: move_document is in MCP_GPT_FULL_TOOLS (not exposed publicly via nginx); NOT in MCP_GPT_TOOLS or MCP_READONLY_TOOLS.
  • Dispatcher (_dispatch_mcp_tool): includes the move_document branch (still routes to the 501-raising endpoint).

4.2 Policy options

  • A. Remove from MCP_TOOLS (advertised schema) and from _dispatch_mcp_tool and from MCP_GPT_FULL_TOOLS. Keep POST /documents/{x}/move returning 501.
  • B. Remove only from MCP_GPT_FULL_TOOLS (so /mcp-gpt-full no longer routes it); keep advertised in /mcp internal.
  • C. Keep advertised everywhere; prefix description with [DEPRECATED].

4.3 Choice and rationale

Choose A (full removal from MCP schema + dispatcher; keep HTTP endpoint returning 501).

  • The MCP tools list is consumed by every LLM client (Claude, Codex, ChatGPT, Cursor) on every session start. Advertising a tool that always errors wastes context budget and produces audit-log noise (mcp_*_reject warnings on every attempt).
  • Schema/runtime correctness is a one-shot benefit and reads well in audits.
  • The POST /documents/{x}/move HTTP endpoint stays in place and still returns the structured 501. Any internal script that calls it directly continues to receive the same error code/message — no breakage for known internal callers.

4.4 Backward compatibility risk

  • Risk for /mcp internal: a Claude / Codex script that has cached a tools/list response containing move_document would attempt tools/call name=move_document and now receive JSON-RPC -32601 Tool not allowed instead of result.content[0].text = "...NOT_IMPLEMENTED...". Both are clearly errors; agents already handle "not allowed" semantics. Bumping connectorSchemaVersion and _mcp_schema_hash() (auto) signals the change.
  • Risk for /mcp-gpt: zero. Already excluded.
  • Risk for /mcp-gpt-full: zero public exposure today.
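Concretely, a client that cached the old tools/list sees the error move from the result payload to the protocol envelope. An illustrative sketch (id value made up; message text per V3 below):

```python
# Before D3: move_document returned a *successful* JSON-RPC response whose
# result carried isError=true and the NOT_IMPLEMENTED text.
# After D3: the tool is simply unknown at the protocol level.
after_d3 = {
    "jsonrpc": "2.0",
    "id": 7,  # echoes the request id
    "error": {"code": -32601, "message": "Unknown tool: move_document"},
}

assert after_d3["error"]["code"] == -32601
assert "result" not in after_d3  # error and result are mutually exclusive
```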

4.5 Tests for D3

| # | Test | Expectation |
|---|---|---|
| V1 | POST /mcp tools/list | response does NOT include move_document |
| V2 | POST /mcp-gpt-full tools/list | response does NOT include move_document |
| V3 | POST /mcp tools/call name=move_document | error.code: -32601, message contains Unknown tool: move_document |
| V4 | POST /mcp-gpt-full tools/call name=move_document | error.code: -32601, message contains not allowed |
| V5 | POST /documents/x/move directly | HTTP 501 NOT_IMPLEMENTED (unchanged) |
| V6 | _mcp_schema_hash() | value changes; CONNECTOR_SCHEMA_VERSION bumped |

4.6 Implementation surface

| Layer | Files | Change |
|---|---|---|
| Python | agent_data/server.py: delete the move_document block from MCP_TOOLS (lines around 2722 + the JSON object); remove "move_document" from the MCP_GPT_FULL_TOOLS frozenset; delete the move_document branch in _dispatch_mcp_tool; bump CONNECTOR_SCHEMA_VERSION to e.g. gpt-agent-data-2026-05-14.1 | edits in 4 places |
| Docs | none | |
| Tests | V1..V6 | new |

Rollback: revert PR. Schema returns to current state. Dead code below the 501-raise is unchanged.

4.7 Pass criteria for D3

  • All V1..V6 pass.
  • One PR, one revert path, no DB impact, no nginx impact.

Part 5 — D4: write double-work — canary gate only

5.1 What is NOT shipped in this revision

D4 (removing inline _sync_vector_entry / _delete_vector_entry from create/update/patch/delete handlers) is not designed for implementation in Rev2. Per GPT/Codex feedback, prior to designing the removal we must prove the listener behaviour and avoid creating a duplicated indexing-loss / loop hazard.

5.2 Required evidence (canary)

Before the D4 implementation design lands, capture the following from a controlled experiment in a low-traffic window. The canary uses the knowledge/test/mcp-stress/ namespace, never production data.

| Probe | How | Pass criterion |
|---|---|---|
| C1. OpenAI embed calls per upload_document | log vector_store.embed_calls counter delta before/after a single upload (small doc, 1 chunk) | exactly 2 today (1 inline, 1 listener) -> must be 1 after D4 |
| C2. Qdrant upserts per upload_document | inspect Qdrant collection counts before/after; assert one chunk:0 point id only | exactly 1 unique point_id (deterministic via uuid5) — current behaviour already idempotent because of uuid5, but it costs 2 writes |
| C3. PG-NOTIFY listener is alive and consuming | tail container log: pg-vector-sync lines appear within 2 s of write | every test write has a matching listener log line |
| C4. vector_status loop check | run 10 uploads, sample data->>'vector_status' | no pending -> ready -> pending flapping; eventual ready for all |
| C5. Listener resilience | restart pg_vector_listener thread (or simulate disconnect via DB-side pg_terminate_backend); verify reconnect + catchup | listener reconnects within 5 s; /kb/reindex-missing recovers any gap |
| C6. Lag observation | record MAX(updated_at) for vector_status='pending' at 30 s intervals during a 5-min stress test (10 writes/s) | p95 lag < 5 s; p99 < 15 s |
| C7. Skip-rule alignment | upload to operations/tasks/comments/... and registries/... paths | trigger skips (per fn_kb_notify_vector_sync rules); no embed, no Qdrant write — must remain true |

5.3 Risks that the canary must eliminate

  • K1. Listener silently drops notifications under PG restart -> vectors fall behind, search misses new docs.
  • K2. vector_status: "pending" written by _sync_vector_entry (inline) might race with the listener writing "ready" -> potential UPDATE loop firing NOTIFY again. Confirm the trigger filter: the trigger fires only when NEW.data IS DISTINCT FROM OLD.data, and a vector_status change qualifies, so the loop must be ruled out explicitly.
  • K3. Two consecutive _sync_vector_entry calls (one inline, one listener) produce the same uuid5 point ids; Qdrant upsert is idempotent — confirm Qdrant client behaves identically with same payload (it does); but listener carries authoritative vector_status write — confirm last-writer-wins behaviour.
  • K4. Soft-delete race: app writes deleted_at; listener might receive the prior INSERT notify (queued) and re-embed the about-to-be-deleted doc. Trigger has a guard (v_new_deleted IS NOT NULL -> emit DELETE op), but verify under concurrency.
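K3 rests on uuid5 determinism; a quick sketch (the namespace and key format here are illustrative, not the production scheme):

```python
import uuid

# uuid5 is a pure function of (namespace, name): same inputs always yield
# the same UUID, so the inline write and the listener write upsert the SAME
# Qdrant point id — the double write is wasteful but not corrupting.
ns = uuid.NAMESPACE_URL  # illustrative; production derives its own namespace
p1 = uuid.uuid5(ns, "knowledge/test/mcp-stress/doc.md:chunk:0")
p2 = uuid.uuid5(ns, "knowledge/test/mcp-stress/doc.md:chunk:0")
assert p1 == p2
```

What determinism does NOT cover is the vector_status payload riding along with each upsert, which is why K3 still asks for last-writer-wins confirmation.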

5.4 Consistency contract (proposed for after canary)

If canary passes:

  • get_document after write: strongly consistent (PG is SSOT, returns latest immediately).
  • search_knowledge after write: eventual consistency, target <=2 s p95 lag, 15 s p99 lag.
  • list_documents: strongly consistent (driven by PG).
  • vector_status state machine:
    • pending set by write handler (already in code for create; will be added to update/patch in D4)
    • ready set by listener after Qdrant upsert
    • error set by listener if embed/upsert fails (already in code)
    • skipped set if content empty (already in code)
  • Health metric vector_sync_lag_seconds = NOW() - MAX(updated_at) FILTER (WHERE vector_status='pending'). Alert at > 30 s.
  • Recovery: cron-driven /kb/reindex-missing every 15 min; on-demand admin trigger.

5.5 D4 implementation gate

D4 design + PR may proceed when:

  1. Canary C1..C7 are recorded in a follow-up KB report.
  2. K1..K4 risks are explicitly accepted by GPT/Opus/Codex.
  3. Lag metric + alert are wired in (or D4 PR includes them).

Until then, D4 is parked.


Part 6 — Revised implementation sequence

Rev2 approved
  ---> 2F-D1 (list_documents SQL + index)     ---> smoke/load test          ---> approve
  ---> 2F-D2a (transport JSON/SSE)            ---> smoke + ChatGPT retry    ---> diagnose if hang persists
  ---> 2F-D3 (move_document schema cleanup)   ---> schema-diff test         ---> done
  ---> 2F-D4 canary                           ---> if green, design D4 implementation in Rev3

(Steps run strictly in the order listed, top to bottom.)

Each step:

2F-D1 — list_documents SQL pushdown + partial pattern index

  • Files: agent_data/pg_store.py (new list_kb_docs), agent_data/server.py (rewrite _list_kb_documents_page).
  • Migration: CREATE INDEX CONCURRENTLY idx_kb_documents_doc_id_c_pattern ... (one SQL statement, applied out-of-band by SRE; non-transactional).
  • Tests: T1..T15 per S2.10.
  • Rollback: revert PR + DROP INDEX CONCURRENTLY ....
  • Pass criteria: S2.11.
  • Expected impact: list_documents p95 4.47 s -> <250 ms at concurrency 5; <400 ms at concurrency 10.

2F-D2a — transport content negotiation

  • Files: agent_data/server.py (_parse_accept, _mcp_respond, 8 callsite replacements in mcp_jsonrpc + _mcp_filtered_handler).
  • Migration: none.
  • Tests: U1..U14 per S3.8.
  • Rollback: revert PR.
  • Pass criteria: S3.9.
  • Expected impact: spec-compliant transport. May or may not unblock ChatGPT MCP App; retest in fresh conversation.

2F-D3 — move_document schema cleanup

  • Files: agent_data/server.py (4 small edits — schema list, allowlist, dispatcher branch, version bump).
  • Migration: none.
  • Tests: V1..V6 per S4.5.
  • Rollback: revert PR.
  • Pass criteria: S4.7.
  • Expected impact: correctness only.

2F-D4 canary (NOT implementation)

  • Files: none modified (canary uses existing endpoints).
  • Migration: none.
  • Tests: C1..C7 captured in a follow-up report.
  • Rollback: n/a (read-only).
  • Pass criteria: S5.3 risks K1..K4 demonstrably bounded; lag metric proposal accepted.
  • Expected outcome: Rev3 designs the actual async write change OR documents why staying sync is correct.

Part 7 — Production gate (Rev2)

7.1 Functional (after D1+D2a+D3 — D4 not yet)

| Tool | Pass |
|---|---|
| search_knowledge | >=1 hit on 'operating rules SSOT', p95 < 2 s |
| list_documents (prefix='', knowledge/, knowledge/test/, escaped) | byte-correct results; pagination; no junk row |
| get_document | found = full payload; missing = {error} |
| get_document_for_rewrite | full body returned |
| batch_read (3 paths) | 3 items, truncated by default |
| upload_document | sync writes; appears in get_document immediately; in search eventually |
| update_document | revision+1; readback immediate |
| patch_document | one-replace succeeds; ambiguous returns 409 |
| delete_document | soft-delete; excluded from list; vector removed eventually |
| move_document | NOT present in tools/list on any profile (D3) |
| ingest_document | unchanged; only via internal /mcp-gpt-full, not nginx-exposed |

7.2 Latency targets (p95, after D1+D2a+D3 with index)

| Path | Target |
|---|---|
| initialize | < 50 ms |
| tools/list | < 50 ms |
| get_document | < 200 ms |
| batch_read (3 paths) | < 300 ms |
| list_documents concurrency 5 | < 250 ms |
| list_documents concurrency 10 | < 400 ms |
| upload_document | < 1 200 ms (unchanged until D4) |
| update_document | < 1 200 ms (unchanged until D4) |
| patch_document | < 1 200 ms (unchanged until D4) |
| search_knowledge | < 2 000 ms (OpenAI bound, unchanged) |

7.3 Transport pass

  • Accept: application/json -> JSON, byte-identical to today.
  • Accept: application/json, text/event-stream -> JSON (tie-break, today's clients unchanged).
  • Accept: application/json;q=0.5, text/event-stream -> SSE.
  • Accept: text/event-stream -> SSE with one event: message\ndata: <json>\n\n frame.
  • Accept: application/xml -> 406 JSON-RPC error envelope.
  • notifications/* -> 204 No Content regardless of Accept.
  • GET /mcp with SSE Accept -> 405.
  • FastMCP client full handshake green against /mcp, /mcp-gpt, /mcp-readonly.
  • ChatGPT MCP App fresh conversation: observed; not pass-gating. If broken, file a follow-up.

7.4 Stress

  • Mixed-tool concurrency 5 and 10 for 60 s.
  • Zero 5xx; zero truncated frames; zero Qdrant wait=True timeouts on any read path.
  • Container CPU < 50 % avg; RSS stable; no thread leak.

7.5 Safety

  • No secret in logs (verify after each PR — grep for env values in last 10 min of container log).
  • /mcp-gpt-full not nginx-exposed.
  • delete_document audited at nginx layer (re-enable access_log on the /gpt-mcp/... location; recommend low-cardinality format).
  • Rotate the /gpt-mcp/{token} URL after D1+D2a+D3 ship (separate, tracked task; not in this design).
  • All four phases have one-revert rollback documented.

Appendix A — Preflight facts (read-only, 2026-05-14)

  • kb_documents: 5 040 total, 2 907 alive, 27 MB.
  • 1 anomalous row: empty key, empty document_id, 4.4 KB body -> filtered in D1.
  • 1 184 alive docs contain _ in their document_id -> LIKE escaping mandatory.
  • 0 alive docs contain % or \ in document_id -> escapes are defensive but cheap.
  • 0 alive docs have non-numeric revision JSON -> cast safe.
  • 0 alive docs have replace(key,'__','/') != document_id -> _fs_key mapping consistent.
  • PG-NOTIFY listener pg_vector_listener is already running in production (PG->Qdrant vector sync listener started in container logs).
  • nginx /gpt-mcp/{token}/mcp already has proxy_buffering off, chunked_transfer_encoding on, proxy_cache off — SSE-ready.
  • 6 PG triggers on kb_documents; trg_kb_vector_sync fires pg_notify('kb_vector_sync', json) on INSERT/UPDATE/DELETE.
  • move_document raises 501 unconditionally at function entry (server.py:1624-1636); dead code below.

Appendix B — Open follow-ups (post-Rev2)

  1. Decide if count semantics in list_documents response need explicit alias (returned_count already present alongside).
  2. Bump proxy_read_timeout on /gpt-mcp/{token} to 120 s when D2a deploys (optional).
  3. Add structured access log on /gpt-mcp/{token} for audit (recommend yes).
  4. Rotate /gpt-mcp/{token} URL after D1+D2a+D3 ship (separate ticket).
  5. Run D4 canary C1..C7 and publish results in a new KB report; gates D4 design.
  6. Switch non-list_documents callers of stream_docs (reindex_kb_documents, cleanup_orphan_vectors, _compute_data_integrity, audit_sync) to a server-side cursor when row count >> 10k. Not blocking today.
  7. If ChatGPT MCP App still hangs after D1+D2a, open a separate diagnosis ticket (don't blame D2a in the report).

Summary

| Phase | Goal | Status |
|---|---|---|
| 2F-D1 | list_documents SQL + index | designed (Rev2); awaits approval |
| 2F-D2a | transport JSON+SSE | designed (Rev2); awaits approval; NOT proven to fix ChatGPT |
| 2F-D3 | move_document schema cleanup | designed (Rev2); awaits approval |
| 2F-D4 canary | prove listener safety | required before any D4 implementation design |
| 2F-D4 implementation | async vector via listener | parked until canary green |

Implementation order: D1 -> D2a -> D3 -> D4 canary -> (later) D4 design.

End of Rev2. No code modified. No commit. No restart. No nginx reload. No ChatGPT reconnection.