IU Core 2400x — Qdrant full corpus reindex (Slice B)
03 — Qdrant full corpus reindex (Slice B)
1. Embeddable population (discovery-first)
The discovery query joins `v_ui_iu_three_axis_envelope` (163 active IUs)
with `unit_version` filtered on `lifecycle_status='enacted'` (the only
lifecycle that carries an authoritative body). Result on live Directus:
enacted_iu_count = 60
min_body_len = 7 max_body_len = 1995
sum_body_chars = 9 519
need_chunking >1800 chars = 1 (KT-B at 1995 chars)
already_indexed_distinct_unit_ids = 5 (from 2000x)
already_indexed_points = 6
So the full embeddable corpus is 60 IUs / 61 points: 59 single-chunk IUs
plus KT-B, whose 1995-char body splits into two chunks under the per-IU
boundary. The remaining 103 envelope rows are either draft (91) or
deprecated (12) and have no enacted body to embed. This is the canonical
maximum under the per-IU vector boundary rule, not an artificial cap.
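The 60 IUs / 61 points arithmetic can be reproduced with a minimal sketch, assuming the chunker does plain fixed-size splitting at 1800 characters with no overlap (the real chunker may differ). The function names are hypothetical, and only the 7 and 1995 body lengths come from the discovery output; the other lengths are placeholders.

```python
import math

DEFAULT_IU_CHUNK_CHARS = 1800  # chunk size quoted in this document


def projected_chunks(body_len: int, chunk_chars: int = DEFAULT_IU_CHUNK_CHARS) -> int:
    """Chunks needed for one IU body, assuming plain fixed-size splitting."""
    return max(1, math.ceil(body_len / chunk_chars))


def projected_points(body_lengths: list[int]) -> int:
    """One Qdrant point per chunk, summed over all enacted IUs."""
    return sum(projected_chunks(n) for n in body_lengths)


# 59 bodies that fit in one chunk plus the 1995-char KT-B body.
# The 100-char entries are placeholders, not measured values.
lengths = [7] + [100] * 58 + [1995]
print(projected_points(lengths))  # 61
```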
2. Driver — tools/2400x-full-reindex.py (not checked into prod)
Lives at /tmp/2400x-full-reindex.py inside incomex-agent-data (the
container that owns OPENAI_API_KEY and QDRANT_API_KEY — keys never
crossed to the MacBook host). The driver:
- Discovers the collection plan from `v_iu_qdrant_collection_active` (DB SSOT) -> `iu_core_iu_chunks` / dim 1536 / Cosine / `openai:text-embedding-3-small`.
- `ensure_collection`: already exists, no create.
- Fetches every enacted IU via the discovery JOIN above.
- For each batch (size 30, configurable via `BATCH_SIZE`):
  - Builds points via `build_iu_point_set(unit_id, canonical_address, body, axis_refs)`. `axis_refs` carries `source_axis_ref` / `semantic_axis_ref` / `hierarchy_axis_ref` derived from the envelope.
  - `assert_boundary(points)`: app-layer guard.
  - Opens the gate (`UPDATE dot_config SET value='true' WHERE key='iu_core.vector_sync_enabled'`), runs `apply_iu_set(..., record_status='indexed')`, always closes the gate in `finally`.
- After all batches, queries Qdrant `/collections/iu_core_iu_chunks` + `iu_vector_sync_point` for verification.
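The batch loop and the open/close gate discipline described above can be sketched as follows. This is an illustration, not the driver itself: `InMemoryGate`, `gate_open`, `batched`, and `run_reindex` are hypothetical names, the gate is an in-memory stand-in for the `dot_config` row, and `apply_batch` stands in for the build / assert / apply steps.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 30  # batch size quoted above


class InMemoryGate:
    """Hypothetical stand-in for the iu_core.vector_sync_enabled row in dot_config."""

    def __init__(self) -> None:
        self.enabled = False

    def set(self, value: bool) -> None:
        # A real driver would run the UPDATE dot_config statement here.
        self.enabled = value


@contextmanager
def gate_open(gate: InMemoryGate) -> Iterator[None]:
    """Open the gate for the duration of the block and always close it."""
    gate.set(True)
    try:
        yield
    finally:
        gate.set(False)


def batched(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_reindex(
    unit_ids: Sequence[str],
    gate: InMemoryGate,
    apply_batch: Callable[[Sequence[str]], int],
) -> int:
    """Apply every batch under an open gate and return the total upserted count."""
    upserted = 0
    for batch in batched(unit_ids):
        with gate_open(gate):
            upserted += apply_batch(batch)
    return upserted
```

Wrapping each batch in its own context manager means an exception in one batch still leaves the gate closed, at the cost of one extra toggle per batch.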
3. Live evidence
plan: iu_core_iu_chunks dim=1536 Cosine embedder=openai:text-embedding-3-small status=active
ensure_collection: created=False
candidates: 60
gate_opened: iu_core.vector_sync_enabled=true
batch 1: planned=30 upserted=30 recorded=30 boundary_ok=True in 1.07s
batch 2: planned=31 upserted=31 recorded=31 boundary_ok=True in 0.78s
TOTAL planned=61 upserted=61 recorded=61
gate_closed: iu_core.vector_sync_enabled=false
qdrant_info: status=200 points_count=61
pg_registry dryrun: 3 (grandfather)
pg_registry indexed: 61
pg_registry unique_unit_ids_this_actor=60
gate_final: false
DONE
Total wall time across both batches: 1.85 s (1.07 s + 0.78 s). The two-batch shape is defensive: the try/finally closes the gate after each batch, even if a batch throws mid-flight. For a 9.5 KB corpus, the cost is negligible.
4. Drift / boundary verification (post-apply)
Driver /tmp/2400x-drift.py scrolls the entire Qdrant collection, audits
the payload boundary fields, and compares the result to the PG registry:
{
"pg_indexed": 61,
"qdrant_points": 61,
"pg_unique_ius": 60,
"qdrant_unique_unit_ids": 60,
"embeddable_population": 60,
"axis_refs_coverage": "61/61",
"missing_required_fields": [],
"forbidden_payloads": [],
"multi_chunk_ius": ["d3ad5874-9e32-4179-b6f6-586722288278"],
"drift_detected": false
}
gate_final = false
production_documents_status = green points_count = 9 213 (untouched)
Every assertion holds:
- `pg_indexed == qdrant_points` (61).
- `pg_unique_ius == qdrant_unique_unit_ids == embeddable_population` (60).
- 61 / 61 points carry `source_axis_ref` + `semantic_axis_ref` + `hierarchy_axis_ref`.
- Zero missing required boundary fields (`unit_id`, `chunk_index`, `chunk_count`, `source_kind`, `point_key`, `content_digest`).
- Zero forbidden payloads (no `summary_marker=true` on per-IU points; no `source_kind != 'iu'`; no missing `unit_id`).
- One multi-chunk IU (KT-B, chunks `[0, 1]`), matching the chunker's per-IU boundary projection of a 1995-char body at `DEFAULT_IU_CHUNK_CHARS=1800`.
- `production_documents` collection still 9 213 points / green, which shows the IU Core write was strictly isolated to `iu_core_iu_chunks`.
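The checks listed above can be expressed as a small audit function. The sketch below assumes the Qdrant payloads and PG registry rows have already been fetched into plain dictionaries; `audit_drift` and its input shapes are hypothetical, and only the field names and report keys are taken from this document.

```python
from collections import Counter

REQUIRED_FIELDS = (
    "unit_id", "chunk_index", "chunk_count",
    "source_kind", "point_key", "content_digest",
)
AXIS_FIELDS = ("source_axis_ref", "semantic_axis_ref", "hierarchy_axis_ref")


def audit_drift(payloads: list[dict], pg_rows: list[dict], embeddable_unit_ids: set[str]) -> dict:
    """Compare Qdrant payloads with PG registry rows and the embeddable population."""
    # chunk_index may legitimately be 0, so test for None rather than falsiness.
    missing = [p.get("point_key") for p in payloads
               if any(p.get(f) is None for f in REQUIRED_FIELDS)]
    forbidden = [p.get("point_key") for p in payloads
                 if p.get("summary_marker") is True
                 or p.get("source_kind") != "iu"
                 or not p.get("unit_id")]
    per_unit = Counter(p["unit_id"] for p in payloads if p.get("unit_id"))
    qdrant_units = set(per_unit)
    pg_units = {r["unit_id"] for r in pg_rows}
    axis_ok = sum(1 for p in payloads if all(p.get(f) for f in AXIS_FIELDS))
    drift = bool(
        len(pg_rows) != len(payloads)
        or pg_units != qdrant_units
        or qdrant_units != set(embeddable_unit_ids)
        or missing
        or forbidden
    )
    return {
        "pg_indexed": len(pg_rows),
        "qdrant_points": len(payloads),
        "pg_unique_ius": len(pg_units),
        "qdrant_unique_unit_ids": len(qdrant_units),
        "embeddable_population": len(set(embeddable_unit_ids)),
        "axis_refs_coverage": f"{axis_ok}/{len(payloads)}",
        "missing_required_fields": missing,
        "forbidden_payloads": forbidden,
        "multi_chunk_ius": sorted(u for u, n in per_unit.items() if n > 1),
        "drift_detected": drift,
    }
```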
5. Per-IU vector-boundary rule — 3-layer enforcement intact
| layer | enforcement | proof |
|---|---|---|
| App | `assert_boundary(points)` in `vector_sync.py` before any upsert | every batch logs `boundary_ok=True` |
| DB function | `fn_iu_vector_sync_record_v2` rejects per-IU points missing `unit_id`/`chunk_index`/`chunk_count` and refuses `'indexed'` when gate closed | 61 records inserted with `sync_status='indexed'`, `last_actor='iu_core_2400x_full_reindex'` |
| DB CHECK | `iu_vector_sync_point_boundary_chk` enforces the rule at row level | no constraint violations on any of the 61 rows |
No vector spans two IUs. No collection/corpus vector pollutes the
per-IU collection (the registry's purpose='iu_core_per_iu_chunks'
makes the rule discoverable, and the driver never builds a summary
vector here).
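For illustration, an app-layer guard in the spirit of `assert_boundary` might look like the sketch below. It is not the code in `vector_sync.py`: the function name, the exception type, and the assumption that a point set always contains all chunks of each IU are hypothetical.

```python
from collections import defaultdict


class BoundaryError(ValueError):
    """Raised when a point set violates the per-IU boundary rule."""


def assert_boundary_sketch(points: list[dict]) -> bool:
    """Check that every point belongs to exactly one IU with consistent chunking."""
    by_unit: dict[str, list[dict]] = defaultdict(list)
    for p in points:
        if not p.get("unit_id"):
            raise BoundaryError("point has no unit_id")
        if p.get("source_kind") != "iu":
            raise BoundaryError(f"unexpected source_kind: {p.get('source_kind')!r}")
        if p.get("summary_marker") is True:
            raise BoundaryError("summary vector in per-IU collection")
        if not isinstance(p.get("chunk_index"), int) or not isinstance(p.get("chunk_count"), int):
            raise BoundaryError("chunk_index/chunk_count missing or not an integer")
        by_unit[p["unit_id"]].append(p)

    for unit_id, chunks in by_unit.items():
        # Assumption: all chunks of an IU travel in the same point set,
        # so the indexes must be exactly 0..n-1 and chunk_count must equal n.
        indexes = sorted(c["chunk_index"] for c in chunks)
        if indexes != list(range(len(chunks))):
            raise BoundaryError(f"non-contiguous chunk indexes for {unit_id}: {indexes}")
        if any(c["chunk_count"] != len(chunks) for c in chunks):
            raise BoundaryError(f"chunk_count mismatch for {unit_id}")
    return True
```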
6. Reversibility / disable
- Per-actor rollback: `DELETE FROM iu_vector_sync_point WHERE sync_status='indexed' AND last_actor='iu_core_2400x_full_reindex'` removes the 61 PG registry rows.
- Per-point Qdrant delete: ids are UUIDv5 over `point_key` in namespace `iu-core.qdrant.point-id.v1`. The driver can recompute the ids and `DELETE /collections/iu_core_iu_chunks/points/<uuid>` per point.
- Whole-collection rollback: `DELETE /collections/iu_core_iu_chunks` (atomic; reversible by `runtime/310` to re-register, then re-apply).
- Gate disable (already in this state): `UPDATE dot_config SET value='false' WHERE key='iu_core.vector_sync_enabled'`.
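Recomputing point ids for a per-point rollback relies on UUIDv5 being deterministic. The sketch below shows the idea using Python's standard `uuid` module. Two things are assumptions, not facts from this document: how the namespace label is turned into a namespace UUID (here via `uuid.NAMESPACE_DNS`) and the `point_key` format (`<unit_id>:<chunk_index>`). Ids produced this way will only match the live collection if the driver uses the same derivation.

```python
import uuid

NAMESPACE_LABEL = "iu-core.qdrant.point-id.v1"  # label quoted in this document

# Assumption: the namespace UUID is derived from the label under the DNS
# namespace. The real driver may derive it differently.
POINT_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, NAMESPACE_LABEL)


def point_id(point_key: str) -> str:
    """Deterministic UUIDv5 for one point_key."""
    return str(uuid.uuid5(POINT_ID_NAMESPACE, point_key))


def point_ids_for_unit(unit_id: str, chunk_count: int) -> list[str]:
    """Ids for every chunk of one IU, using a hypothetical point_key format."""
    return [point_id(f"{unit_id}:{index}") for index in range(chunk_count)]


# KT-B is the only multi-chunk IU, so it yields two ids.
kt_b_ids = point_ids_for_unit("d3ad5874-9e32-4179-b6f6-586722288278", 2)
```

Because the same `point_key` always maps to the same id, re-running the reindex overwrites existing points instead of duplicating them, and a rollback can target points without first scrolling the collection.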
7. Five-layer impact
| layer | impact |
|---|---|
| PG | iu_vector_sync_point +55 indexed rows (6 -> 61; the 6 existing rows were UPSERTed to the 2400x actor — UUIDv5 wins idempotently); gate toggled then closed |
| Directus | none in this slice (Slice A, doc 02) |
| Nuxt | none |
| AgentData | none in this slice (Qdrant client uses agent-data env, but no AgentData write) |
| Qdrant | iu_core_iu_chunks 6 -> 61 points (60 unique IUs, KT-B split); production_documents untouched |