KB-12BF

IU Core 3000x — 03 Qdrant retrieval smoke + per-IU boundary re-audit

Revision 1
Tags: iu-core, 3000x, dieu44, qdrant, iu-core-iu-chunks, vector-boundary, retrieval-smoke, payload-audit

1. Why

2400x indexed 60/60 enacted IUs into iu_core_iu_chunks (61 points because KT-B splits into 2 chunks within its own boundary). 3000x must prove the collection is retrievable, the per-IU vector boundary still holds, and production_documents remains untouched.

All probes here are read-only HTTP requests against incomex-qdrant, issued from a throw-away curlimages/curl container on docker_incomex. The Qdrant API key was read from the incomex-agent-data environment (QDRANT_API_KEY=vps-qdrant-…0662d3be58) and was never logged to a file.

2. Collection metadata

{
  "status": "green",
  "optimizer_status": "ok",
  "indexed_vectors_count": 0,
  "points_count": 61,
  "segments_count": 3,
  "config": {
    "params": { "vectors": { "size": 1536, "distance": "Cosine" } }
  }
}

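indexed_vectors_count = 0 is expected here: Qdrant builds the HNSW index for a segment only once that segment exceeds indexing_threshold (10 000 in this setup; Qdrant documents the threshold in kilobytes of vector data rather than in points). The 61 points are fully retrievable via brute-force (exact) search, which is what the smoke uses. Once a segment crosses the threshold, HNSW is built automatically, so no IU Core change is needed.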
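The reading above can be turned into a small read-only check. This is a minimal sketch, not the tooling used in 3000x: the function names, the `brute-force`/`hnsw` labels, and any base URL such as `http://incomex-qdrant:6333` (Qdrant's default REST port) are assumptions for illustration.

```python
import json
import urllib.request


def fetch_collection_info(base_url: str, api_key: str, name: str) -> dict:
    """GET /collections/<name> and return the 'result' object (read-only)."""
    req = urllib.request.Request(
        f"{base_url}/collections/{name}",
        headers={"api-key": api_key},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]


def index_state(info: dict) -> str:
    """Classify a collection from the metadata fields shown above."""
    if info["status"] != "green":
        return "unhealthy"
    if info["points_count"] == 0:
        return "empty"
    if info["indexed_vectors_count"] == 0:
        # Points exist but no HNSW index yet: searches run as exact scans.
        return "brute-force"
    return "hnsw"


if __name__ == "__main__":
    sample = {"status": "green", "indexed_vectors_count": 0, "points_count": 61}
    print(index_state(sample))
```

With the metadata from this section, `index_state` reports `brute-force`, which matches the expectation that a 61-point collection is served by exact search.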

3. Full payload audit (61 points)

| Check | Value |
| --- | --- |
| total_points | 61 |
| unique_unit_ids | 60 |
| unit_ids_with_multiple_chunks | ['d3ad5874-9e32-4179-b6f6-586722288278'] (KT-B) |
| multi_chunk payloads (sample) | (KT-B, 1, 2), (KT-B, 0, 2) |
| payloads_without_unit_id | 0 |

Every payload carries:

  • unit_id (always present);
  • chunk_index, chunk_count (0/1 for atomic IUs, 0/2 and 1/2 for KT-B);
  • source_kind = 'iu';
  • source_path = 'information_unit/<unit_id>';
  • content_digest;
  • axis_refs.source_axis_ref, axis_refs.semantic_axis_ref, axis_refs.hierarchy_axis_ref.

This is the 3-layer boundary contract (app-level assert_boundary + DB CHECK via fn_iu_vector_sync_record_v2 + Qdrant payload) re-audited post-2400x. Zero cross-IU vectors found.
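The Qdrant-payload layer of this audit can be sketched as a pure function over the payloads returned by a scroll of the collection. This is an illustrative sketch only: `audit_payloads` and the `violations` field are hypothetical names, and it assumes chunk_index is zero-based, as the chunk=0/1 hits in the retrieval smoke suggest.

```python
from collections import defaultdict

REQUIRED_KEYS = ("unit_id", "chunk_index", "chunk_count", "source_kind",
                 "source_path", "content_digest", "axis_refs")
REQUIRED_AXES = ("source_axis_ref", "semantic_axis_ref", "hierarchy_axis_ref")


def audit_payloads(payloads: list[dict]) -> dict:
    """Summarize per-IU boundary facts for a list of point payloads."""
    chunks_by_unit = defaultdict(list)  # unit_id -> [(chunk_index, chunk_count)]
    without_unit_id = 0
    violations = set()

    for p in payloads:
        unit_id = p.get("unit_id")
        if not unit_id:
            without_unit_id += 1
            continue
        chunks_by_unit[unit_id].append((p.get("chunk_index"), p.get("chunk_count")))
        axis_refs = p.get("axis_refs") or {}
        absent = [k for k in REQUIRED_KEYS if k not in p]
        absent += [a for a in REQUIRED_AXES if a not in axis_refs]
        if (absent or p["source_kind"] != "iu"
                or p["source_path"] != f"information_unit/{unit_id}"):
            violations.add(unit_id)

    for unit_id, chunks in chunks_by_unit.items():
        if unit_id in violations:
            continue
        indexes = {index for index, _ in chunks}
        counts = {count for _, count in chunks}
        # Every chunk must agree on chunk_count, and indexes must be 0..n-1.
        if counts != {len(chunks)} or indexes != set(range(len(chunks))):
            violations.add(unit_id)

    return {
        "total_points": len(payloads),
        "unique_unit_ids": len(chunks_by_unit),
        "unit_ids_with_multiple_chunks": sorted(
            u for u, c in chunks_by_unit.items() if len(c) > 1),
        "payloads_without_unit_id": without_unit_id,
        "violations": sorted(violations),
    }
```

On the live collection, a clean result would show 61 total points, 60 unique unit ids, one multi-chunk unit id, and an empty `violations` list.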

4. Retrieval smoke

POST /collections/iu_core_iu_chunks/points/search
{ "vector": [0.05]*1536, "limit": 5, "with_payload": true }

Returns 5 hits, each carrying valid unit_id + chunk + axis refs:

score=0.0495 unit_id=845c04d6… chunk=0/1 sem=axis_b:ICX-CONST/DIEU-0-S-M-L
score=0.0404 unit_id=6da58e8e… chunk=0/1 sem=axis_b:ICX-CONST/DIEU-0-G
score=0.0306 unit_id=fcd1b351… chunk=0/1 sem=axis_b:ICX-CONST/DIEU-24
score=0.0293 unit_id=224dbb9f… chunk=0/1 sem=axis_b:ICX-CONST/DIEU-26
score=0.0253 unit_id=fb6f2197… chunk=0/1 sem=axis_b:ICX-CONST/DIEU-43

The vector was a synthetic constant, so the scores are uniformly low (no semantic match), but the search path executes and returns the expected shape. Each of the top-k hits traces back to exactly one source IU.
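The same probe can be expressed in Python, which also makes the `[0.05]*1536` shorthand in the request body explicit. This is a hedged sketch rather than the script used for the smoke (which ran through curl): the helper names and the shape checks in `check_hits` are assumptions.

```python
import json
import urllib.request

DIM = 1536  # vector size from the collection config


def build_probe(dim: int = DIM, value: float = 0.05, limit: int = 5) -> dict:
    """Build the synthetic constant-vector search body used by the smoke."""
    return {"vector": [value] * dim, "limit": limit, "with_payload": True}


def search(base_url: str, api_key: str, collection: str, body: dict) -> list[dict]:
    """POST /collections/<collection>/points/search and return the hits."""
    req = urllib.request.Request(
        f"{base_url}/collections/{collection}/points/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]


def check_hits(hits: list[dict], limit: int) -> list[str]:
    """Return a list of shape problems; an empty list means the smoke passed."""
    problems = []
    if len(hits) != limit:
        problems.append(f"expected {limit} hits, got {len(hits)}")
    for hit in hits:
        payload = hit.get("payload") or {}
        if "score" not in hit:
            problems.append(f"hit {hit.get('id')} has no score")
        if not payload.get("unit_id"):
            problems.append(f"hit {hit.get('id')} has no unit_id")
    return problems
```

A passing smoke corresponds to `check_hits(search(...), 5)` returning an empty list; the score values themselves are not asserted, since a constant vector carries no semantic signal.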

5. Drift vs PG sync table

| Source | Count | Notes |
| --- | --- | --- |
| Qdrant iu_core_iu_chunks points_count | 61 | live HTTP |
| PG iu_vector_sync_point (sync_status='indexed') | 61 | SELECT count(*) ... WHERE sync_status='indexed' |
| PG rows tagged last_actor='iu_core_2400x_full_reindex' | 61 | matches |
| Drift (PG indexed − Qdrant points) | 0 | aligned |
| Missing IU ids | 0 | 60 unique IUs, all enacted |
| Duplicate chunks | 0 | KT-B is the only multi-chunk IU; chunk_index distinct |
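Equal counts on both sides could in principle hide offsetting mismatches, so a stricter variant of the drift check compares point id sets as well. The sketch below assumes both id lists have already been fetched (from PG and from a Qdrant scroll); `drift_report` and its field names are hypothetical.

```python
def drift_report(pg_point_ids: list[str], qdrant_point_ids: list[str]) -> dict:
    """Compare PG-indexed point ids with the ids actually present in Qdrant."""
    pg, qd = set(pg_point_ids), set(qdrant_point_ids)
    return {
        "pg_indexed": len(pg_point_ids),
        "qdrant_points": len(qdrant_point_ids),
        # Same definition as the table: PG indexed minus Qdrant points.
        "drift": len(pg_point_ids) - len(qdrant_point_ids),
        "only_in_pg": sorted(pg - qd),
        "only_in_qdrant": sorted(qd - pg),
        # A repeated id on the PG side would indicate a duplicate sync row.
        "duplicate_pg_ids": len(pg_point_ids) - len(pg),
    }
```

An aligned state is one where `drift` is 0 and both `only_in_pg` and `only_in_qdrant` are empty.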

6. production_documents non-interference

GET /collections/production_documents -> status=green, points_count=9226

(2400x recorded 9 213; the +13 is organic growth from external agent-data ingestion, unrelated to IU Core. IU Core writes nothing to this collection: iu_vector_sync_point contains 0 rows with collection_name = 'production_documents'.)

7. Rollback / bounded delete

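  • Per-actor undo: DELETE FROM iu_vector_sync_point WHERE last_actor = 'iu_core_2400x_full_reindex', then remove the same UUIDv5 ids from Qdrant with POST /collections/iu_core_iu_chunks/points/delete and body { "points": [<uuid>, ...] }. Qdrant exposes point deletion as this batch route rather than as a per-point DELETE.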
  • Atomic undo of the entire collection: DELETE /collections/iu_core_iu_chunks then re-run runtime/310 to re-register from v_iu_qdrant_collection_active.
  • Neither was needed in 3000x — retrieval verification was the goal.
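For the per-actor undo, the Qdrant side can be sketched as a batched delete that defaults to a dry run. This is an assumption-laden illustration, not a procedure that was executed in 3000x: it assumes Qdrant's batch route POST /collections/<name>/points/delete, assumes the UUIDv5 point ids were collected before the PG rows were removed, and uses hypothetical helper names and batch size.

```python
import json
import urllib.request


def batched(ids: list[str], size: int) -> list[list[str]]:
    """Split ids into consecutive batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_delete_body(ids: list[str]) -> dict:
    """Request body for a delete-by-id call."""
    return {"points": list(ids)}


def delete_points(base_url: str, api_key: str, collection: str,
                  ids: list[str], batch_size: int = 100,
                  dry_run: bool = True) -> int:
    """Delete the given point ids in batches; returns how many ids were covered.

    With dry_run=True (the default) nothing is sent over the network.
    """
    covered = 0
    for batch in batched(ids, batch_size):
        if not dry_run:
            req = urllib.request.Request(
                f"{base_url}/collections/{collection}/points/delete?wait=true",
                data=json.dumps(build_delete_body(batch)).encode("utf-8"),
                headers={"api-key": api_key, "Content-Type": "application/json"},
                method="POST",
            )
            with urllib.request.urlopen(req, timeout=30) as resp:
                json.load(resp)  # raises if the response is not valid JSON
        covered += len(batch)
    return covered
```

Keeping the delete bounded to an explicit id list, rather than a filter, means the call can never touch points outside the set recorded for that actor.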

8. Operator one-command surface

dot_iu_qdrant_collection_status  -- registry row + per-sync-status counts
dot_iu_external_healthcheck      -- aggregate over Directus + Qdrant + cache

Both are read-only and listed under category external in cutter_agent/iu_core/dot_commands.py.
