KB-3987

IU Core 2400x — Qdrant full corpus reindex (Slice B)

Revision 1
Tags: iu-core, 2400x, dieu44, qdrant, reindex, per-iu-boundary, vector-sync

03 — Qdrant full corpus reindex (Slice B)

1. Embeddable population (discovery-first)

The discovery query joins v_ui_iu_three_axis_envelope (163 active IUs) with unit_version filtered on lifecycle_status='enacted' (the only lifecycle that carries an authoritative body). Result on live directus:

enacted_iu_count = 60
min_body_len     = 7          max_body_len = 1995
sum_body_chars   = 9 519
need_chunking >1800 chars = 1     (KT-B at 1995 chars)
already_indexed_distinct_unit_ids = 5    (from 2000x)
already_indexed_points            = 6

So the full embeddable corpus is 60 IUs / 61 points (one IU split under the per-IU boundary). The remaining 103 envelope rows are either draft (91) or deprecated (12) — they have no enacted body to embed. This is the canonical max under the per-IU vector boundary rule, not an artificial cap.
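
The 60 IUs / 61 points figure can be re-derived from the numbers above. The sketch below assumes plain fixed-width chunking at DEFAULT_IU_CHUNK_CHARS=1800 (the value quoted in section 4); `chunk_count` is a hypothetical helper, not the driver's actual chunker.

```python
import math

DEFAULT_IU_CHUNK_CHARS = 1800  # value quoted in section 4 of this document


def chunk_count(body_len: int, chunk_chars: int = DEFAULT_IU_CHUNK_CHARS) -> int:
    """Points needed for one IU body, assuming plain fixed-width chunking."""
    return max(1, math.ceil(body_len / chunk_chars))


# Envelope accounting, using the figures reported above.
envelope_rows = 163
enacted, draft, deprecated = 60, 91, 12
assert enacted + draft + deprecated == envelope_rows

# Only one enacted body exceeds 1800 chars (KT-B at 1995). The other 59 bodies
# fit in a single chunk each, so their exact lengths do not matter here.
points = 59 * chunk_count(1800) + chunk_count(1995)
print(points)  # 61
```

Under that assumption, a body only needs a second point once it passes 1800 characters, which is why a single 1995-char body accounts for the one extra point.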

2. Driver — tools/2400x-full-reindex.py (not checked into prod)

Lives at /tmp/2400x-full-reindex.py inside incomex-agent-data (the container that owns OPENAI_API_KEY and QDRANT_API_KEY — keys never crossed to the MacBook host). The driver:

  1. Discovers the collection plan from v_iu_qdrant_collection_active (DB SSOT) -> iu_core_iu_chunks / dim 1536 / Cosine / openai:text-embedding-3-small.
  2. ensure_collection — already exists, no create.
  3. Fetches every enacted IU via the discovery JOIN above.
  4. For each batch (size 30, configurable via BATCH_SIZE):
    • Builds points via build_iu_point_set(unit_id, canonical_address, body, axis_refs). axis_refs carries source_axis_ref/semantic_axis_ref/hierarchy_axis_ref derived from the envelope.
    • assert_boundary(points) — app-layer guard.
    • Opens the gate (UPDATE dot_config SET value='true' WHERE key='iu_core.vector_sync_enabled'), runs apply_iu_set(..., record_status='indexed'), always closes the gate in finally.
  5. After all batches, queries Qdrant /collections/iu_core_iu_chunks + iu_vector_sync_point for verification.
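
The open-gate / apply / close-in-finally pattern in step 4 can be sketched as a context manager. This is a minimal illustration, not the driver itself: it uses an in-memory sqlite3 table as a stand-in for the real dot_config, and `vector_sync_gate`, `gate_value` and `batches` are hypothetical names. The point where the real driver would call apply_iu_set is marked by a comment.

```python
import sqlite3
from contextlib import contextmanager

GATE_KEY = "iu_core.vector_sync_enabled"


@contextmanager
def vector_sync_gate(conn):
    """Open the gate for the duration of the block and always close it again."""
    conn.execute("UPDATE dot_config SET value='true' WHERE key=?", (GATE_KEY,))
    conn.commit()
    try:
        yield
    finally:
        # Runs on success and on exception alike, so the gate never stays open.
        conn.execute("UPDATE dot_config SET value='false' WHERE key=?", (GATE_KEY,))
        conn.commit()


def gate_value(conn):
    row = conn.execute("SELECT value FROM dot_config WHERE key=?", (GATE_KEY,)).fetchone()
    return row[0]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in database (assumption: sqlite3 here instead of the live PG instance).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dot_config (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO dot_config VALUES (?, 'false')", (GATE_KEY,))

candidates = [f"iu-{i}" for i in range(60)]  # placeholder ids, not real unit_ids
seen_open = []
for batch in batches(candidates, 30):
    with vector_sync_gate(conn):
        seen_open.append(gate_value(conn))  # the real driver would upsert here

# A failure inside the block still leaves the gate closed.
try:
    with vector_sync_gate(conn):
        raise RuntimeError("simulated mid-batch failure")
except RuntimeError:
    pass
print(gate_value(conn))  # false
```

Scoping the gate to one batch keeps the window in which 'indexed' records are accepted as short as the batch itself.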

3. Live evidence

plan: iu_core_iu_chunks dim=1536 Cosine embedder=openai:text-embedding-3-small status=active
ensure_collection: created=False
candidates: 60
gate_opened: iu_core.vector_sync_enabled=true
batch 1: planned=30 upserted=30 recorded=30 boundary_ok=True in 1.07s
batch 2: planned=31 upserted=31 recorded=31 boundary_ok=True in 0.78s
TOTAL  planned=61 upserted=61 recorded=61
gate_closed: iu_core.vector_sync_enabled=false
qdrant_info: status=200 points_count=61
  pg_registry dryrun: 3      (grandfather)
  pg_registry indexed: 61
  pg_registry unique_unit_ids_this_actor=60
  gate_final: false
DONE

Total wall time across both batches: 1.85 s. The two-batch shape is defensive (gate closes between batches via the try/finally even if a batch were to throw mid-flight); for a 9.5 KB corpus, the cost is negligible.

4. Drift / boundary verification (post-apply)

Driver /tmp/2400x-drift.py scrolls the entire Qdrant collection, audits the payload boundary fields on every point, and compares the result to the PG registry:

{
  "pg_indexed":              61,
  "qdrant_points":           61,
  "pg_unique_ius":           60,
  "qdrant_unique_unit_ids":  60,
  "embeddable_population":   60,
  "axis_refs_coverage":      "61/61",
  "missing_required_fields": [],
  "forbidden_payloads":      [],
  "multi_chunk_ius":         ["d3ad5874-9e32-4179-b6f6-586722288278"],
  "drift_detected":          false
}
gate_final = false
production_documents_status = green points_count = 9 213    (untouched)

Every assertion holds:

  • pg_indexed == qdrant_points (61).
  • pg_unique_ius == qdrant_unique_unit_ids == embeddable_population (60).
  • 61 / 61 points carry source_axis_ref + semantic_axis_ref + hierarchy_axis_ref.
  • Zero missing required boundary fields (unit_id, chunk_index, chunk_count, source_kind, point_key, content_digest).
  • Zero forbidden payloads (no summary_marker=true on per-IU points; no source_kind != 'iu'; no missing unit_id).
  • One multi-chunk IU (KT-B, chunks [0, 1]) — matches the chunker's per-IU boundary projection of a 1995-char body at DEFAULT_IU_CHUNK_CHARS=1800.
  • production_documents collection still 9 213 points / green — proves the IU Core write was strictly isolated to iu_core_iu_chunks.
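
The assertions above amount to a set comparison between the PG registry and the scrolled Qdrant payloads, plus a per-payload field audit. The sketch below shows one way such a check could be written; `drift_report` and `payload` are hypothetical names, the sample data is invented for illustration, and the `unit_id#chunk_index` point_key format is an assumption rather than the format the driver uses.

```python
REQUIRED_FIELDS = ("unit_id", "chunk_index", "chunk_count",
                   "source_kind", "point_key", "content_digest")


def drift_report(pg_rows, qdrant_payloads, embeddable_unit_ids):
    """Compare registry rows with Qdrant payloads (both lists of dicts)."""
    missing = [p.get("point_key") for p in qdrant_payloads
               if any(f not in p for f in REQUIRED_FIELDS)]
    forbidden = [p.get("point_key") for p in qdrant_payloads
                 if p.get("summary_marker") is True
                 or p.get("source_kind") != "iu"
                 or not p.get("unit_id")]
    pg_units = {r["unit_id"] for r in pg_rows}
    qd_units = {p["unit_id"] for p in qdrant_payloads if p.get("unit_id")}
    pg_keys = {r["point_key"] for r in pg_rows}
    qd_keys = {p["point_key"] for p in qdrant_payloads if "point_key" in p}
    drift = bool(
        pg_keys != qd_keys
        or not (pg_units == qd_units == set(embeddable_unit_ids))
        or missing
        or forbidden
    )
    return {
        "pg_indexed": len(pg_rows),
        "qdrant_points": len(qdrant_payloads),
        "pg_unique_ius": len(pg_units),
        "qdrant_unique_unit_ids": len(qd_units),
        "missing_required_fields": missing,
        "forbidden_payloads": forbidden,
        "drift_detected": drift,
    }


def payload(unit_id, index, count):
    # Invented sample payload; the point_key format is an assumption.
    return {"unit_id": unit_id, "chunk_index": index, "chunk_count": count,
            "source_kind": "iu", "point_key": f"{unit_id}#{index}",
            "content_digest": "placeholder"}


qdrant = [payload("iu-a", 0, 1), payload("iu-b", 0, 2), payload("iu-b", 1, 2)]
registry = [{"unit_id": p["unit_id"], "point_key": p["point_key"]} for p in qdrant]

clean = drift_report(registry, qdrant, {"iu-a", "iu-b"})
drifted = drift_report(registry, qdrant[:-1], {"iu-a", "iu-b"})  # one point lost
print(clean["drift_detected"], drifted["drift_detected"])  # False True
```

Comparing point keys as sets, rather than only counts, catches the case where both sides hold 61 entries but not the same 61.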

5. Per-IU vector-boundary rule — 3-layer enforcement intact

| Layer | Enforcement | Proof |
|---|---|---|
| App | assert_boundary(points) in vector_sync.py before any upsert | every batch logs boundary_ok=True |
| DB function | fn_iu_vector_sync_record_v2 rejects per-IU points missing unit_id/chunk_index/chunk_count and refuses 'indexed' when gate closed | 61 records inserted with sync_status='indexed', last_actor='iu_core_2400x_full_reindex' |
| DB CHECK | iu_vector_sync_point_boundary_chk enforces the rule at row level | no constraint violations on any of the 61 rows |

No vector spans two IUs. No collection/corpus vector pollutes the per-IU collection (the registry's purpose='iu_core_per_iu_chunks' makes the rule discoverable, and the driver never builds a summary vector here).
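
To make the app-layer rule concrete, here is a rough sketch of what a boundary guard of this kind might check. It is not the code of assert_boundary in vector_sync.py, which is not shown in this document; `assert_boundary_sketch` is a hypothetical name, and the requirement that each IU's chunk indices form a complete 0..chunk_count-1 set within one call is an assumption.

```python
from collections import defaultdict


def assert_boundary_sketch(points):
    """Raise ValueError unless every point belongs to exactly one IU."""
    chunks_by_unit = defaultdict(dict)
    for p in points:
        for field in ("unit_id", "chunk_index", "chunk_count"):
            if p.get(field) is None:
                raise ValueError(f"point missing {field}: {p}")
        if p.get("source_kind") != "iu" or p.get("summary_marker") is True:
            raise ValueError(f"not a per-IU payload: {p}")
        if not 0 <= p["chunk_index"] < p["chunk_count"]:
            raise ValueError(f"chunk_index out of range: {p}")
        seen = chunks_by_unit[p["unit_id"]]
        if p["chunk_index"] in seen:
            raise ValueError(f"duplicate chunk for {p['unit_id']}")
        seen[p["chunk_index"]] = p["chunk_count"]
    # Assumption: each IU must arrive with its complete set of chunks.
    for unit_id, seen in chunks_by_unit.items():
        counts = set(seen.values())
        if len(counts) != 1 or set(seen) != set(range(counts.pop())):
            raise ValueError(f"incomplete chunk set for {unit_id}")
    return True


# Invented sample: one two-chunk IU, shaped like the KT-B case.
points = [
    {"unit_id": "iu-b", "chunk_index": 0, "chunk_count": 2, "source_kind": "iu"},
    {"unit_id": "iu-b", "chunk_index": 1, "chunk_count": 2, "source_kind": "iu"},
]
boundary_ok = assert_boundary_sketch(points)
print(boundary_ok)  # True
```

Because each point names exactly one unit_id and a summary vector would carry summary_marker=true, both failure modes in the paragraph above are rejected before any upsert is attempted.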

6. Reversibility / disable

  • Per-actor rollback: DELETE FROM iu_vector_sync_point WHERE sync_status='indexed' AND last_actor='iu_core_2400x_full_reindex' — removes the 61 PG registry rows.
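  • Per-point Qdrant delete: ids are UUIDv5 over point_key in namespace iu-core.qdrant.point-id.v1. The driver can recompute the ids and remove them with POST /collections/iu_core_iu_chunks/points/delete, passing the ids in the "points" array of the request body (Qdrant's REST API deletes points through this endpoint rather than a per-id DELETE route).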
  • Whole-collection rollback: DELETE /collections/iu_core_iu_chunks (atomic; reversible by runtime/310 to re-register, then re-apply).
  • Gate disable (already in this state): UPDATE dot_config SET value='false' WHERE key='iu_core.vector_sync_enabled'.
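
Because the point ids are deterministic, a rollback script can recompute them from point_key alone. The sketch below shows the idea with Python's uuid module. Two things in it are assumptions and would need to match the real driver to produce the same ids: how the namespace label iu-core.qdrant.point-id.v1 is turned into a namespace UUID (here via uuid.NAMESPACE_DNS), and the `unit_id#chunk_index` shape of the example point keys. `point_id` is a hypothetical helper.

```python
import uuid

NAMESPACE_LABEL = "iu-core.qdrant.point-id.v1"

# Assumption: the namespace UUID is derived from the label under NAMESPACE_DNS.
# If the driver derives it differently, the resulting ids will differ.
POINT_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, NAMESPACE_LABEL)


def point_id(point_key: str) -> str:
    """Deterministic UUIDv5 for a point_key; same key always gives the same id."""
    return str(uuid.uuid5(POINT_ID_NAMESPACE, point_key))


# Hypothetical point keys for a two-chunk IU.
keys = ["iu-b#0", "iu-b#1"]
ids_to_delete = [point_id(k) for k in keys]

# Recomputing yields the same ids, which is what makes re-apply idempotent.
print(ids_to_delete == [point_id(k) for k in keys])  # True
```

The same property explains why re-running the reindex overwrites existing points instead of duplicating them.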

7. Five-layer impact

| Layer | Impact |
|---|---|
| PG | iu_vector_sync_point +55 indexed rows (6 -> 61; the 6 existing rows were UPSERTed to the 2400x actor, as the deterministic UUIDv5 ids make the upsert idempotent); gate toggled then closed |
| Directus | none in this slice (Slice A, doc 02) |
| Nuxt | none |
| AgentData | none in this slice (Qdrant client uses agent-data env, but no AgentData write) |
| Qdrant | iu_core_iu_chunks 6 -> 61 points (60 unique IUs, KT-B split); production_documents untouched |