KB-674C

IU Core 1500x — Qdrant collection registry + embedder seam + live-safe sync

Revision 1
Tags: iu-core, 1500x, qdrant, embedder, migration-021, vector-boundary

02 — Qdrant collection registry + embedder seam + live-safe sync

1. The gap closed

After 1k+, the per-IU vector-boundary rule was enforced at three layers (application, function, and DB CHECK), but the actual Qdrant collection name, dimension, distance metric, and embedder model were still implicit literals scattered through docs. A naive caller could call upsert_points against agent-data's shared production_documents collection and contaminate retrieval, because nothing in the database said which collection IU Core writes to. Migration 021 closes that gap: the IU Core Qdrant collection plan is now registry-backed.

2. Migration 021 — registry substrate (durable, applied)

Additive DDL, applied to production directus:

  • iu_qdrant_collection_registry(collection_name UNIQUE, vector_dim, distance_metric, embedder_model_ref, purpose, status, notes, created_at, updated_at, retired_at). The unique key is collection_name; identity (dim + distance) is immutable on re-register. Status: planned -> active -> retired (reversible).
  • fn_iu_qdrant_collection_register — governed upsert. Idempotent on collection_name; refuses re-dimension or distance change; never logs or stores a secret (the model NAME openai:text-embedding-3-small is recorded, the API key is not).
  • fn_iu_qdrant_collection_retire — reversible retire (status=retired + retired_at=now()); a subsequent register call resurrects the row.
  • v_iu_qdrant_collection_active — read view (planned/active only).
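The register/retire semantics listed above can be modelled in a few lines. This is only an in-memory sketch for illustration, not the SQL of migration 021: the class names RegistryRow and InMemoryRegistry are hypothetical, and it assumes that a re-register of a retired row takes whatever status the caller passes.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class RegistryRow:
    collection_name: str
    vector_dim: int
    distance_metric: str
    embedder_model_ref: str  # model NAME only; no secret is ever stored
    purpose: str
    status: str = "planned"  # planned -> active -> retired


class InMemoryRegistry:
    """Toy stand-in for iu_qdrant_collection_registry (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, RegistryRow] = {}

    def register(self, row: RegistryRow) -> RegistryRow:
        existing = self._rows.get(row.collection_name)
        if existing is not None:
            # Identity (dim + distance) is immutable on re-register.
            same_identity = (existing.vector_dim, existing.distance_metric) == (
                row.vector_dim,
                row.distance_metric,
            )
            if not same_identity:
                raise ValueError("refusing re-dimension or distance change")
        # Upsert keyed on collection_name; this also resurrects a retired row.
        self._rows[row.collection_name] = row
        return row

    def retire(self, collection_name: str) -> RegistryRow:
        # Reversible: the row is kept and only its status changes.
        retired = replace(self._rows[collection_name], status="retired")
        self._rows[collection_name] = retired
        return retired

    def active(self) -> List[RegistryRow]:
        # Mirrors the read view: planned/active rows only.
        return [r for r in self._rows.values() if r.status in ("planned", "active")]
```

Keeping the identity check inside the register call, rather than in each caller, is what makes repeated registration safe to run from seeds and CLI verbs alike.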

runtime/310 seeds the default IU Core plan:

| field | value |
| --- | --- |
| collection_name | iu_core_iu_chunks |
| vector_dim | 1536 |
| distance_metric | Cosine |
| embedder_model_ref | openai:text-embedding-3-small |
| purpose | iu_core_per_iu_chunks |
| status (post-live-create) | active |

iu_core_iu_chunks is separate from agent-data's shared production_documents — the boundary rule requires per-IU and shared vectors to live in different collections so retrieval cannot mix them.

3. vector_sync.py — embedder seam + bounded apply

New, additive on top of the 1k+ connector:

  • QdrantCollectionPlan (frozen dataclass) — read from v_iu_qdrant_collection_active via discover_collection_plans / discover_default_plan(purpose='iu_core_per_iu_chunks'). The collection name + dim + distance + embedder ref live in ONE place: the database.
  • Embedder protocol: embed(texts) -> list[list[float]]. Two concrete drivers:
    • NoopEmbedder — deterministic, content-addressed, network-free; used by tests + dryrun.
    • OpenAIEmbedder — reads OPENAI_API_KEY lazily AT CALL TIME (never at import); HTTP poster is INJECTED for unit testing; the key is never logged on error; enforces the plan's vector_dim via the dimensions parameter.
  • ensure_collection(connector, plan) — idempotent: returns False if the collection already exists, else creates it (PUT /collections/<name> with vectors.size = plan.vector_dim, vectors.distance = plan.distance_metric).
  • apply_iu_set(plan, points, bodies, embedder, connector, executor, actor, record_status='dryrun') — bounded live apply. Enforces the per-IU boundary at three layers, embeds + upserts + records in the governed registry. record_status='indexed' is refused unless the iu_core.vector_sync_enabled gate is open.
  • New CLI verbs: dot_iu_qdrant_collection_list /_register /_retire.
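To make the embedder seam concrete, here is a minimal sketch of a frozen plan, the Embedder protocol, and a deterministic network-free driver. The field list and the SHA-256 expansion scheme are assumptions made for illustration; the real vector_sync.py may derive its vectors differently.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class QdrantCollectionPlan:
    # Field names follow the registry columns; the exact set is assumed.
    collection_name: str
    vector_dim: int
    distance_metric: str
    embedder_model_ref: str


class Embedder(Protocol):
    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class NoopEmbedder:
    """Deterministic, content-addressed, network-free (hash scheme assumed)."""

    def __init__(self, plan: QdrantCollectionPlan) -> None:
        self._dim = plan.vector_dim

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        return [self._vector(text) for text in texts]

    def _vector(self, text: str) -> List[float]:
        values: List[float] = []
        counter = 0
        # Chain counter-salted digests until enough bytes exist for the dimension.
        while len(values) < self._dim:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            values.extend(byte / 255.0 for byte in digest)
            counter += 1
        return values[: self._dim]
```

Because the same text always maps to the same vector, tests and dry runs can assert on stored points without any network access or API key.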

4. Bounded live Qdrant proof (reversible)

A bounded, reversible live Qdrant collection create ran inside the incomex-agent-data container (where the keys live; no key crossed into this session):

NOT_EXISTS: 404
CREATED:    200 {"result":true,"status":"ok","time":0.136409452}
LIST:       {"collections":[{"name":"production_documents"}, {"name":"iu_core_iu_chunks"}]}
CONFIG:     status=green, indexed_vectors_count=0, points_count=0, vectors.size=1536, vectors.distance=Cosine, segments=3

The registry row was then flipped planned -> active via fn_iu_qdrant_collection_register (idempotent; identity unchanged).
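The probe-then-create sequence shown in the log (404, then 200) can be sketched as below. The function name ensure_collection_sketch, its signature, and the injected request callable are hypothetical and differ from the real ensure_collection(connector, plan); returning True after a create is also an assumption, since the source only states that False means the collection already existed. FakeQdrant exists only to exercise the sketch without touching a live server.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# (method, path, json_body) -> (http_status, json_payload); hypothetical transport.
Request = Callable[[str, str, Optional[Dict[str, Any]]], Tuple[int, Dict[str, Any]]]


def ensure_collection_sketch(request: Request, name: str, dim: int, distance: str) -> bool:
    """Return False if the collection already exists, True if it was created."""
    status, _ = request("GET", f"/collections/{name}", None)
    if status == 200:
        return False  # already present: idempotent no-op
    if status != 404:
        raise RuntimeError(f"unexpected status {status} while probing {name}")
    body = {"vectors": {"size": dim, "distance": distance}}
    status, payload = request("PUT", f"/collections/{name}", body)
    if status != 200 or not payload.get("result"):
        raise RuntimeError(f"create failed for {name}: HTTP {status}")
    return True


class FakeQdrant:
    """In-memory stand-in for the HTTP transport, used only for testing."""

    def __init__(self) -> None:
        self.collections: Dict[str, Dict[str, Any]] = {}

    def __call__(
        self, method: str, path: str, body: Optional[Dict[str, Any]]
    ) -> Tuple[int, Dict[str, Any]]:
        name = path.rsplit("/", 1)[-1]
        if method == "GET":
            if name in self.collections:
                return 200, {"result": {"status": "green"}}
            return 404, {}
        if method == "PUT" and body is not None:
            self.collections[name] = body["vectors"]
            return 200, {"result": True, "status": "ok"}
        return 405, {}
```

Calling the function twice against the same fake creates the collection once and then reports that it already exists, which is the property that makes the live step safe to re-run.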

What is NOT done in this macro: no embedder call, no point upserted, no flip of iu_core.vector_sync_enabled, no destructive change to production_documents.
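Since iu_core.vector_sync_enabled was not flipped, a request to record status 'indexed' would still be refused. A minimal sketch of such a gate check follows; the helper name resolve_record_status, the injected gate_is_open callable, and the assumption that 'dryrun' and 'indexed' are the only statuses are all illustrative rather than taken from vector_sync.py.

```python
from typing import Callable

# Assumed status set: only the two values named in this document.
ALLOWED_STATUSES = ("dryrun", "indexed")


def resolve_record_status(requested: str, gate_is_open: Callable[[], bool]) -> str:
    """Return the status to record, refusing 'indexed' while the gate is closed."""
    if requested not in ALLOWED_STATUSES:
        raise ValueError(f"unknown record_status: {requested!r}")
    if requested == "indexed" and not gate_is_open():
        raise PermissionError(
            "iu_core.vector_sync_enabled is closed; record_status='indexed' refused"
        )
    return requested
```

Passing the gate as a callable means the check reads the current value at apply time, so flipping the setting does not require a restart or re-import.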

The bounded-apply path is exercised by unit tests with an injected poster and executor; the next slice runs it end-to-end against the live collection.
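The injected-poster pattern can be sketched as follows. LazyKeyEmbedder and the poster signature are hypothetical, not the real OpenAIEmbedder; the request shape (model, input, dimensions) and the data[].embedding response shape follow the public OpenAI embeddings endpoint, and the model argument is assumed to be the registry ref without its "openai:" prefix.

```python
import os
from typing import Any, Callable, Dict, List, Sequence

# (url, headers, json_payload) -> parsed JSON response; injected for unit tests.
Poster = Callable[[str, Dict[str, str], Dict[str, Any]], Dict[str, Any]]


class LazyKeyEmbedder:
    """Sketch of an embedder that reads its key only when embed() is called."""

    URL = "https://api.openai.com/v1/embeddings"

    def __init__(self, model: str, dim: int, poster: Poster) -> None:
        # No environment access here, so import and construction never need a key.
        self._model = model
        self._dim = dim
        self._poster = poster

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        key = os.environ.get("OPENAI_API_KEY")  # read lazily, at call time
        if not key:
            raise RuntimeError("OPENAI_API_KEY is not set")
        headers = {"Authorization": f"Bearer {key}"}
        payload = {"model": self._model, "input": list(texts), "dimensions": self._dim}
        try:
            response = self._poster(self.URL, headers, payload)
        except Exception as exc:
            # Surface only the exception type so the key cannot leak via a message.
            raise RuntimeError(f"embedding request failed: {type(exc).__name__}") from None
        ordered = sorted(response["data"], key=lambda item: item["index"])
        vectors = [item["embedding"] for item in ordered]
        if any(len(vector) != self._dim for vector in vectors):
            raise ValueError("embedder returned a vector of the wrong dimension")
        return vectors
```

In a unit test the poster is a plain function that records its arguments and returns canned vectors, so the request shape and the dimension check can be asserted without any network call.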

5. Reversibility

  • Qdrant: DELETE /collections/iu_core_iu_chunks removes the live collection (atomic; no IU Core durable PG row references it).
  • Registry: fn_iu_qdrant_collection_retire(...) flips status to retired (reversible).
  • Code: additive imports only.
  • DDL: rollback/021_* drops the table + 2 views + 2 fns.

6. Five-layer impact

| layer | impact |
| --- | --- |
| PG | additive durable: 1 table + 2 views + 2 functions + 1 registry row |
| Directus | none (the registration package is built but not applied; deploy-gated) |
| Nuxt | none |
| AgentData | 7 reports uploaded + verified |
| Qdrant | new EMPTY collection iu_core_iu_chunks (size=1536, Cosine, status=green); production_documents untouched |