KB-76E1

s178-cutllm-synthesize-deploy-20260417.md

Revision 1

S178 Fix 9 - Cut LLM Synthesize - Deploy Report

Date: 2026-04-17
Session: S178 Fix 9
Build: agent-data-local:s178-cutllm-20260417-041528
Known-good rollback: agent-data-local:pre-s178-cutllm (ID bfe092449032)

1. Scope

  • Cut the GPT-4o LLM synthesize step out of query_knowledge() (affects search_knowledge and /chat)
  • Change the get_document MCP dispatch: search=True, top_k=3 → search=False, top_k=0
  • Env flag SEARCH_SYNTHESIZE_MODE (default=raw; set it to summarized for an env-only rollback)
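
The flag-gated behaviour described above can be pictured with a minimal sketch. It is not the code in server.py: `synthesize_mode`, `query_knowledge_sketch`, and the `raw_reply` / `llm_summarize` callables are hypothetical names, and the fallback to raw for an unrecognized value is an assumption.

```python
import os

VALID_MODES = ("raw", "summarized")

def synthesize_mode(env=None):
    """Read SEARCH_SYNTHESIZE_MODE; the default is 'raw', as stated in the report."""
    env = os.environ if env is None else env
    mode = env.get("SEARCH_SYNTHESIZE_MODE", "raw").strip().lower()
    # Assumption: an unrecognized value falls back to 'raw'.
    return mode if mode in VALID_MODES else "raw"

def query_knowledge_sketch(contexts, raw_reply, llm_summarize, env=None):
    """Hypothetical stand-in for query_knowledge(): raw early return, LLM in the else branch."""
    if synthesize_mode(env) == "raw":
        # RAW path: no LLM call, reply is built directly from retrieved snippets.
        return {"response": raw_reply(contexts), "llm_called": False}
    else:
        # Old path, kept so that an env-only rollback remains possible.
        return {"response": llm_summarize(contexts), "llm_called": True}
```

Passing `env` explicitly keeps the sketch testable without touching the process environment.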

2. BEFORE vs AFTER (same query: "hiến pháp nguyên tắc NT13")

| Path | Before (B1 baseline) | After (B2.2b, median of 3) | Improvement |
|---|---|---|---|
| /chat | 3.28s (server 5016ms) | 1.31s (server 1008ms) | ~2.5× faster |
| /chat attempt 3 (warm) | n/a | 0.21s (server 141ms) | ~15× vs baseline |
| /mcp search_knowledge | 0.45s (server 335ms) | 0.27s (server 224ms) | ~1.7× faster |
| /mcp get_document | not measured | 0.08s (related=[]) | |

Note: /chat attempt 1 (cold) took 1.71s (server latency_ms=1608) because of the cold qdrant/embedding call in _retrieve_query_context. Attempt 3 at steady state took 141ms server-side, which confirms the LLM block has been cut completely; the remaining time is retrieval plus the MCP envelope.
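
As a quick arithmetic check, the improvement figures can be recomputed from the reported client-side timings. The attempt 2 value of 1.31s is not stated directly; it is inferred here from the reported median of 3 together with attempts 1 and 3.

```python
from statistics import median

# /chat attempts 1..3 in seconds; attempt 2 is inferred from the reported median.
chat_after_s = [1.71, 1.31, 0.21]
baseline = {"/chat": 3.28, "/mcp search_knowledge": 0.45}
after = {"/chat": median(chat_after_s), "/mcp search_knowledge": 0.27}

for path, before_s in baseline.items():
    print(f"{path}: {before_s / after[path]:.1f}x faster")

# Warm (attempt 3) /chat call compared with the B1 baseline.
print(f"/chat warm vs baseline: {baseline['/chat'] / chat_after_s[2]:.1f}x")
```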

3. Response structure kept (backward compat)

  • /chat keys: [response, content, session_id, context, usage], unchanged
  • response length = 500 (the cap applied by _build_raw_reply, which joins 3 snippets)
  • context = 5 items raw QueryContextEntry (unchanged)
  • usage.latency_ms populated, usage.qdrant_hits=5
  • /mcp search_knowledge inner payload (wrapped in result.content[0].text): same 5 keys
  • /mcp get_document: related=[] (empty list), confirming that search=False has taken effect
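
A backward-compatibility check along these lines could be scripted as below. This is a sketch only: `check_chat_payload` is a hypothetical helper, and it covers just the properties listed above (the five keys, the 500-character cap, and a populated usage.latency_ms).

```python
EXPECTED_CHAT_KEYS = {"response", "content", "session_id", "context", "usage"}

def check_chat_payload(payload, max_len=500):
    """Return a list of problems; an empty list means the shape matches the report."""
    problems = []
    missing = EXPECTED_CHAT_KEYS - payload.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if len(payload.get("response", "")) > max_len:
        problems.append("response longer than max_len")
    if "latency_ms" not in payload.get("usage", {}):
        problems.append("usage.latency_ms not populated")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log every mismatch from a single smoke-test call.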

4. Deploy timeline (2026-04-17)

  • Source edit (B2.2a): 04:07 CEST — server.py 95767→97952 bytes (+2199B)
  • Build image: 04:15:28 — 4s (mostly CACHED), size 1.5GB
  • Tag switch latest → new (d53a11072c00): 04:15:xx
  • docker compose up -d agent-data: 04:15:56
  • Healthcheck HEALTHY at attempt 7/18 (~70s after restart)
  • Smoke test: 7/7 PASS (3 /chat + 3 /mcp search + 1 /mcp get_document)
  • RestartCount = 0, no crash or Traceback
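
The healthcheck wait in the timeline can be pictured as a bounded polling loop. The sketch below is an assumption about its shape, not the actual deploy script: `wait_healthy` and `probe` are hypothetical names, and the 10s interval is only inferred from "attempt 7 at ~70s".

```python
import time

def wait_healthy(probe, max_attempts=18, interval_s=10.0, sleep=time.sleep):
    """Call probe() until it returns True; return the 1-based attempt number, or None."""
    for attempt in range(1, max_attempts + 1):
        if probe():
            return attempt
        if attempt < max_attempts:
            # Assumed interval; the report only implies roughly 10s per attempt.
            sleep(interval_s)
    return None
```

In practice `probe` would wrap whatever the healthcheck already uses (for example an HTTP call or a container status query); injecting `sleep` lets the loop be exercised without real waiting.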

5. Rollback readiness

  • Image rollback (1 command, <30s):
    docker tag agent-data-local:pre-s178-cutllm agent-data-local:latest
    cd /opt/incomex/docker && docker compose up -d agent-data
    
  • Source rollback (1 command):
    cp /opt/incomex/docker/agent-data-repo/agent_data/server.py.bak-s178-20260417-040440 \
       /opt/incomex/docker/agent-data-repo/agent_data/server.py
    
  • Env toggle (no rebuild, <10s):
    echo "SEARCH_SYNTHESIZE_MODE=summarized" >> /opt/incomex/docker/.env
    docker compose up -d agent-data
    
    → returns to the old LLM branch (the else: code path in query_knowledge is kept intact)
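
Because the env toggle appends with >>, repeated toggles would accumulate several SEARCH_SYNTHESIZE_MODE lines in .env, which makes the file harder to audit. A minimal sketch of an in-place alternative follows; `set_env_var` is a hypothetical helper, not part of the deploy tooling, and it assumes simple KEY=value lines.

```python
def set_env_var(env_text, key, value):
    """Return env_text with key set to value, replacing an existing line if present."""
    lines = env_text.splitlines()
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(f"{key}="):
            lines[i] = new_line  # overwrite instead of appending a duplicate
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

The function works on text rather than on the file itself, so the caller decides how to read, back up, and write /opt/incomex/docker/.env.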

6. Files changed (VPS host, SSOT)

  • /opt/incomex/docker/agent-data-repo/agent_data/server.py (+2199 bytes)
    • Change A: Helper _build_raw_reply (L794-822)
    • Change B: RAW early-return block in query_knowledge (L897-928)
    • Change C: MCP dispatch get_document, search=True→False, top_k=3→0 (L2664)
  • Backup: server.py.bak-s178-20260417-040440 (kept for rollback)
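
To illustrate what a helper like _build_raw_reply does according to this report (join 3 snippets, cap at 500 characters), here is a minimal sketch. The real implementation at L794-822 is not shown in this document, so the separator, the whitespace handling, and the name `build_raw_reply_sketch` are all assumptions.

```python
def build_raw_reply_sketch(snippets, max_snippets=3, max_chars=500, sep="\n\n"):
    """Join the first non-empty snippets and cap the result at max_chars."""
    # Assumption: blank snippets are skipped and surrounding whitespace is trimmed.
    chosen = [s.strip() for s in snippets if s and s.strip()][:max_snippets]
    # A hard character cap explains why long results always come out at exactly 500.
    return sep.join(chosen)[:max_chars]
```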

7. Repo note

  • /opt/incomex/docker/agent-data-repo.git/ has diverged from origin (5 ahead, 112 behind).
  • git pull/checkout/reset was NOT run at any point during the deploy (NT1 SSOT VPS).
  • A manual GitHub backup (if needed) is the responsibility of Desktop/user and is not part of the deploy pipeline.
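
The ahead/behind counts can be read without modifying the working tree. The sketch below uses `git rev-list --left-right --count`, which prints two tab-separated numbers; it assumes the current branch has an upstream configured, and the counts only reflect the last fetch. `ahead_behind` and `parse_ahead_behind` are hypothetical helper names.

```python
import subprocess

def parse_ahead_behind(output):
    """Parse 'AHEAD<TAB>BEHIND' as printed by git rev-list --left-right --count."""
    ahead, behind = (int(x) for x in output.split())
    return ahead, behind

def ahead_behind(repo_dir):
    """Read-only query: commits on HEAD only (ahead) vs. on the upstream only (behind)."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--left-right", "--count",
         "HEAD...@{upstream}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ahead_behind(out)
```

Since this never runs pull, checkout, or reset, it stays within the constraint stated above.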

8. Next

  • Phase E: create a DOT pair to monitor p95 latency (NT12 DOT pair)
  • Amend Đ35/Đ41 with the VPS-as-SSOT pattern for agent-data-repo if needed
  • Handle the agent-data-repo git divergence manually (not part of this fix)
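
For the planned latency monitor, a p95 value could be computed with the nearest-rank method, sketched below. The report does not say which percentile definition the monitor will use, so this is one reasonable choice under that assumption, and `p95_nearest_rank` is a hypothetical name.

```python
def p95_nearest_rank(latencies_ms):
    """Nearest-rank 95th percentile: the smallest value with at least 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With only a handful of samples the p95 collapses to the maximum, so a cold call such as the 1608ms one would dominate until enough warm samples are collected.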

9. Risks / open concerns after deploy

  • /chat cold attempt 1 = 1.71s: higher than the expected ≤1s but still under the 5s threshold. The cause is the cold qdrant/embedding call. The warm call at 0.21s confirms the LLM was cut correctly.
  • The response field is always 500 chars (3 snippets joined). If Claude Desktop users expect a coherent Vietnamese summary, the UX may feel "raw-ish". The format can be adjusted in a later step.
  • The pre-s178-cutllm image is kept; do not run docker system prune.