KB-76E1
s178-cutllm-synthesize-deploy-20260417.md
Revision 1
S178 Fix 9 — Cut LLM Synthesize — Deploy Report
Date: 2026-04-17
Session: S178
Build: `agent-data-local:s178-cutllm-20260417-041528`
Known-good rollback: `agent-data-local:pre-s178-cutllm` (ID bfe092449032)
1. Scope
- Cut the GPT-4o LLM synthesize step out of `query_knowledge()` (`search_knowledge` + `/chat`)
- Change the `get_document` MCP dispatch: `search=True, top_k=3` → `search=False, top_k=0`
- Env flag `SEARCH_SYNTHESIZE_MODE` (default=`raw`; set to `summarized` for an env-only rollback)
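A minimal sketch of the flag-gated raw path, assuming the shape described above. `query_knowledge`, `_build_raw_reply`, and `SEARCH_SYNTHESIZE_MODE` are names from this report, but the function bodies are illustrative assumptions, not the actual server.py code:

```python
import os

def _build_raw_reply(snippets: list[str], max_len: int = 500) -> str:
    # Assumption: join the top 3 snippets and cap at 500 chars,
    # consistent with the observed response len = 500.
    return " ".join(snippets[:3])[:max_len]

def query_knowledge(query: str, snippets: list[str]) -> dict:
    # SEARCH_SYNTHESIZE_MODE=raw (the default) skips the GPT-4o call;
    # "summarized" would fall through to the old LLM branch.
    if os.getenv("SEARCH_SYNTHESIZE_MODE", "raw") == "raw":
        return {"response": _build_raw_reply(snippets), "synthesized": False}
    return {"response": "<LLM summary placeholder>", "synthesized": True}
```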
2. BEFORE vs AFTER (same query: "hiến pháp nguyên tắc NT13")
| Path | Before (B1 baseline) | After (B2.2b, median of 3) | Improvement |
|---|---|---|---|
| /chat | 3.28s (server 5016ms) | 1.31s (server 1008ms) | ~2.5× faster |
| /chat attempt 3 (warm) | n/a | 0.21s (server 141ms) | ~15× vs baseline |
| /mcp search_knowledge | 0.45s (server 335ms) | 0.27s (server 224ms) | ~1.7× faster |
| /mcp get_document | not measured | 0.08s (related=[]) | — |
Note: /chat attempt 1 (cold) = 1.71s (server latency_ms=1608), due to a cold qdrant/embedding call in `_retrieve_query_context`. Attempt 3 steady-state = 141ms confirms the LLM block has been removed entirely; the remaining time is retrieval plus the MCP envelope.
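The "median of 3" methodology in the table can be sketched as a small helper; this is an assumed reconstruction of the measurement approach, not a script used in the deploy:

```python
import statistics
import time
from typing import Callable

def median_of_3_latency(call: Callable[[], object]) -> float:
    # Run the same request three times and take the median wall-clock
    # time, so a single cold outlier (like attempt 1 above) does not
    # skew the reported number.
    samples = []
    for _ in range(3):
        t0 = time.perf_counter()
        call()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)
```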
3. Response structure kept (backward compat)
- `/chat` keys: `[response, content, session_id, context, usage]` (unchanged)
- `response` len = 500 (the max from `_build_raw_reply`, which concatenates 3 snippets)
- `context` = 5 raw `QueryContextEntry` items (unchanged)
- `usage.latency_ms` populated, `usage.qdrant_hits=5`
- `/mcp search_knowledge` inner payload (wrapped in `result.content[0].text`): same 5 keys
- `/mcp get_document`: `related=[]` (empty list) confirms `search=False` took effect
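The compat invariants above can be expressed as a small check; the key set and limits come from this report, while the helper itself is a hypothetical sketch, not part of the deployed code:

```python
EXPECTED_CHAT_KEYS = {"response", "content", "session_id", "context", "usage"}

def chat_compat_problems(payload: dict) -> list[str]:
    # Verify the /chat invariants listed above: same 5 top-level keys,
    # response capped at 500 chars, context holding 5 raw entries.
    problems = []
    if set(payload) != EXPECTED_CHAT_KEYS:
        problems.append(f"unexpected key set: {sorted(payload)}")
    if len(payload.get("response", "")) > 500:
        problems.append("response exceeds the 500-char cap")
    if len(payload.get("context", [])) != 5:
        problems.append("context no longer holds 5 entries")
    return problems
```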
4. Deploy timeline (2026-04-17)
- Source edit (B2.2a): 04:07 CEST — server.py 95767→97952 bytes (+2199B)
- Build image: 04:15:28 — 4s (mostly CACHED), size 1.5GB
- Tag switch `latest` → new image (d53a11072c00): 04:15:xx
- `docker compose up -d agent-data`: 04:15:56
- Healthcheck HEALTHY at attempt 7/18 (~70s after restart)
- Smoke test: 7/7 PASS (3 /chat + 3 /mcp search + 1 /mcp get_document)
- RestartCount = 0, no crash/Traceback
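The healthcheck step (HEALTHY at attempt 7 of 18) amounts to a polling loop like the sketch below; the 18-attempt budget is from this report, while the ~10s interval is an assumption inferred from the ~70s timing, not the actual compose healthcheck config:

```python
import time
from typing import Callable

def wait_healthy(is_healthy: Callable[[], bool],
                 attempts: int = 18, interval_s: float = 10.0) -> int:
    # Poll until the container reports healthy; returns the 1-based
    # attempt that succeeded, or raises once the budget is exhausted.
    for attempt in range(1, attempts + 1):
        if is_healthy():
            return attempt
        time.sleep(interval_s)
    raise TimeoutError("agent-data did not become healthy")
```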
5. Rollback readiness
- Image rollback (one command, <30s):

```shell
docker tag agent-data-local:pre-s178-cutllm agent-data-local:latest
cd /opt/incomex/docker && docker compose up -d agent-data
```

- Source rollback (one command):

```shell
cp /opt/incomex/docker/agent-data-repo/agent_data/server.py.bak-s178-20260417-040440 \
   /opt/incomex/docker/agent-data-repo/agent_data/server.py
```

- Env toggle (no rebuild, <10s):

```shell
echo "SEARCH_SYNTHESIZE_MODE=summarized" >> /opt/incomex/docker/.env
docker compose up -d agent-data
```

→ reverts to the old LLM branch (the `else:` code path in `query_knowledge` is left intact)
6. Files changed (VPS host, SSOT)
- `/opt/incomex/docker/agent-data-repo/agent_data/server.py` (+2199 bytes)
  - Change A: helper `_build_raw_reply` (L794-822)
  - Change B: RAW early-return block in `query_knowledge` (L897-928)
  - Change C: MCP dispatch for `get_document`: search=True→False, top_k=3→0 (L2664)
- Backup: `server.py.bak-s178-20260417-040440` (kept for rollback)
7. Repo note
- `/opt/incomex/docker/agent-data-repo` has a `.git/` and has diverged from origin (5 ahead, 112 behind).
- Did NOT run `git pull/checkout/reset` at any point during the deploy (NT1 SSOT VPS).
- Manual GitHub backup (if needed) is the Desktop/user's job, not part of the deploy pipeline.
8. Next
- Stage E: create a DOT pair to monitor latency p95 (NT12 DOT pair)
- Amend the Đ35/Đ41 VPS-as-SSOT pattern for agent-data-repo if needed
- Handle the agent-data-repo git divergence manually (outside this fix)
9. Risks / post-deploy concerns
- /chat cold attempt 1 = 1.71s: higher than the expected ≤1s but still under the 5s threshold. Caused by qdrant/embedding cold start. The warm call at 0.21s confirms the LLM was cut correctly.
- `response` is always 500 chars (3 snippets concatenated). If Claude Desktop users expect a coherent Vietnamese summary sentence, the UX may feel "raw-ish". The format can be adjusted in a later step.
- The `pre-s178-cutllm` image is kept; do not run `docker system prune`.