KB-66FE

Qdrant Investigation Report 2026-03-31

Revision 1

Mission: S135H - Investigate Qdrant degraded + Runtime action path
Status: INVESTIGATION COMPLETE, NO FIXES APPLIED

1. QDRANT STATUS: HEALTHY

  • Container: Up 13 days, CPU 0.27%, RAM 103.7MiB/1GiB (10%)
  • Collection: production_documents, 1172 points
  • Local latency: 0.37ms. No errors in logs. Disk 52%.

2. AGENT DATA HEALTH: DEGRADED (STALE CACHE)

  • Health endpoint reports degraded: Qdrant connection error, latency 753.4ms
  • This value is CACHED from the startup probe on 2026-03-29. It is NOT a realtime reading.
  • Direct Python test from inside the container: OK, 473.8ms
  • Docker network test: OK, 2.85ms
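
The gap between the cached value and the live tests could be closed by a probe that repeats in the background instead of running only at startup. The following is a minimal sketch, assuming an async service. `HealthMonitor`, `HealthSnapshot`, and the `probe` callable are hypothetical names for illustration, not taken from the codebase.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class HealthSnapshot:
    status: str        # "healthy" or "degraded"
    latency_ms: float  # duration of the probe call
    checked_at: float  # time.monotonic() value when the probe finished


class HealthMonitor:
    """Keeps a health snapshot fresh by re-probing on a fixed interval."""

    def __init__(self, probe: Callable[[], Awaitable[None]], interval_s: float = 30.0):
        self._probe = probe            # hypothetical: any coroutine that raises on failure
        self._interval_s = interval_s
        self.snapshot: Optional[HealthSnapshot] = None

    async def probe_once(self) -> HealthSnapshot:
        start = time.monotonic()
        try:
            await self._probe()
            status = "healthy"
        except Exception:
            status = "degraded"
        end = time.monotonic()
        # Overwrite the previous snapshot so a startup failure does not stick.
        self.snapshot = HealthSnapshot(status, (end - start) * 1000, end)
        return self.snapshot

    async def run_forever(self) -> None:
        while True:
            await self.probe_once()
            await asyncio.sleep(self._interval_s)
```

At application startup the loop would be scheduled once, for example with `asyncio.create_task(monitor.run_forever())`, and /health would return `monitor.snapshot` rather than a value computed at boot.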

3. SEARCH TESTS: WORKING BUT SLOW

  • /chat via HTTPS: 5/5 requests OK, qdrant_hits=5, but latency 8-27s
  • Latency components: embedding (OpenAI) + vector search + TLS overhead
  • GPT Actions timeout is ~30s; the slowest /chat responses (~20-27s) are near the timeout
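
The report lists the latency components but not their individual shares. One way to measure them is a per-stage timer wrapped around each step of the /chat handler. This is a sketch only: `StageTimer` and the stage names are hypothetical, and the 30s budget is the approximate GPT Actions timeout quoted above.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named stage of a request."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        self.stages_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            elapsed_ms = (self._clock() - start) * 1000
            self.stages_ms[name] = self.stages_ms.get(name, 0.0) + elapsed_ms

    def total_ms(self) -> float:
        return sum(self.stages_ms.values())

    def headroom_ms(self, budget_ms: float) -> float:
        """Time left before the caller's timeout; negative means over budget."""
        return budget_ms - self.total_ms()
```

Inside the handler, the embedding call would run under `with timer.stage("embedding"):` and the Qdrant query under `with timer.stage("vector_search"):`; logging `timer.stages_ms` per request would show which component dominates the 8-27s range.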

4. ROOT CAUSE

  1. Health shows degraded because the value is a stale cache from startup, not the current state
  2. GPT Actions fail intermittently because /chat latency (8-27s) is near the ~30s GPT timeout
  3. MCP is stable because it has local-to-cloud failover and no hard timeout
  4. No circuit breaker: when Qdrant is slow, retries of 3x15s = 45s worst case, well past the ~30s GPT timeout
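
The worst-case figure in item 4 can be reproduced with a few lines of arithmetic. The sketch below also shows the effect of backoff pauses and of a shorter timeout. The pause values (1s, then 2s) are an assumption about how the 1-4s exponential backoff is scheduled; the report does not state them.

```python
from typing import Sequence


def worst_case_seconds(attempts: int, timeout_s: float, waits_s: Sequence[float] = ()) -> float:
    """Upper bound on time spent if every attempt runs into its timeout.

    waits_s holds the backoff pauses between attempts. Only the first
    attempts - 1 entries are used, since no pause follows the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return attempts * timeout_s + sum(waits_s[: attempts - 1])


ASSUMED_WAITS_S = (1.0, 2.0)  # assumption: exponential pauses within the 1-4s range

current = worst_case_seconds(3, 15.0)                        # 45.0, the figure in the report
with_backoff = worst_case_seconds(3, 15.0, ASSUMED_WAITS_S)  # 48.0 under the assumed pauses
reduced = worst_case_seconds(3, 5.0, ASSUMED_WAITS_S)        # 18.0 with a 5s timeout

print(current, with_backoff, reduced)
```

Under these assumptions, only the reduced timeout keeps a fully failing retry sequence below the ~30s GPT Actions limit.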

5. ENDPOINT TABLE

  • /chat: uses Qdrant (vector search) with PG fallback; used by GPT
  • /documents GET: Qdrant optional (related docs), PG required; used by GPT
  • /kb/list, /kb/get: PG only; used by GPT
  • /health: probe only; not used by GPT

6. CODE CONFIG

  • Qdrant timeout: 15s hardcoded (vector_store.py:137)
  • Retry: 3x with exponential backoff, 1-4s (resilient_client.py:36-38)
  • Health cache TTL: 30s, no periodic re-probe
  • Circuit breaker: NONE

7. RECOMMENDATIONS (NOT APPLIED)

  1. Add a periodic health re-probe (background task, not just at startup)
  2. Reduce the Qdrant timeout from 15s to 3-5s for local Docker
  3. Add a circuit breaker: skip Qdrant when it is DOWN and use the PG fallback immediately
  4. Cache embeddings to reduce total /chat latency
  5. Automatically enable noop_qdrant when health is degraded, for faster PG-only responses
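
Recommendation 3 could take roughly the following shape. This is a minimal sketch, not the service's code: `CircuitBreaker`, `search_with_fallback`, the failure threshold, and the reset window are all hypothetical, and `primary` / `fallback` stand in for the Qdrant search and the PG query.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after consecutive failures and allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # Half-open: once the cool-down has elapsed, let a trial call through.
        return self._clock() - self._opened_at >= self._reset_after_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            # Open the circuit, or re-open it after a failed trial call.
            self._opened_at = self._clock()


def search_with_fallback(breaker: CircuitBreaker,
                         primary: Callable[[], T],
                         fallback: Callable[[], T]) -> T:
    """Try the primary backend unless the breaker is open; otherwise use the fallback."""
    if not breaker.allow():
        return fallback()  # skip the slow backend entirely
    try:
        result = primary()
    except Exception:
        breaker.record_failure()
        return fallback()
    breaker.record_success()
    return result
```

While the breaker is open, a request pays no Qdrant timeout at all and goes straight to the fallback, which is the behavior recommendation 3 describes. The same breaker state could also drive the automatic PG-only mode in recommendation 5.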