KB-5444

IU Core 1k — 02 Vector / semantic sync foundation

Revision 1


1. The gap closed

The 960x macro specified the IU-Core → Qdrant reindex contract (doc 05) but left the connector and the gate unbuilt. This macro builds the durable substrate and the connector module, and proves a bounded dry-run.

2. Migration 019 — vector-sync substrate (durable, applied)

Additive DDL, applied to production directus:

  • iu_core.vector_sync_enabled: the config gate, default false. It is the single off-switch for any push to the external vector store.
  • fn_iu_vector_sync_enabled(): the gate function.
  • iu_vector_sync_point: the point registry, one row per vector-index unit (an iu-tree/**/*.md manifest). content_digest is the live digest; indexed_digest is the digest the external store last acknowledged. A mismatch between the two is drift.
  • fn_iu_vector_sync_record(...): the governed upsert and the ONLY writer. It is fail-closed: an indexed status (which asserts that a real external write occurred) is REFUSED while the gate is shut. The planned, dryrun, drift, and disabled statuses are pure internal bookkeeping and need no gate.
  • v_iu_vector_sync_status: the drift / coverage surface (drifted / in_sync booleans).
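
The fail-closed rule of the governed upsert can be illustrated with a small in-memory model. This is only a sketch of the behaviour described above, not the SQL in migration 019: the class names (Registry, PointRow, GateClosedError) and the method signature are hypothetical, and the empty-digest refusal is taken from the sandbox probe described later in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Statuses named in the text; only "indexed" asserts an external write.
INTERNAL_STATUSES = {"planned", "dryrun", "drift", "disabled"}


class GateClosedError(RuntimeError):
    """Raised when 'indexed' is recorded while the gate is shut (hypothetical name)."""


@dataclass
class PointRow:
    point_key: str
    content_digest: str
    sync_status: str
    indexed_digest: Optional[str] = None


@dataclass
class Registry:
    """In-memory stand-in for iu_vector_sync_point plus the config gate."""
    gate_enabled: bool = False  # mirrors the default-false gate
    rows: Dict[str, PointRow] = field(default_factory=dict)

    def record(self, point_key: str, content_digest: str, status: str) -> PointRow:
        if not content_digest:
            raise ValueError("content_digest must not be empty")
        if status == "indexed":
            # Fail-closed: claiming an external write needs the gate open.
            if not self.gate_enabled:
                raise GateClosedError("gate is shut; 'indexed' refused")
        elif status not in INTERNAL_STATUSES:
            raise ValueError(f"unknown status: {status!r}")

        # Upsert keyed by point_key: re-recording never adds a second row.
        row = self.rows.get(point_key)
        if row is None:
            row = PointRow(point_key, content_digest, status)
            self.rows[point_key] = row
        else:
            row.content_digest = content_digest
            row.sync_status = status
        if status == "indexed":
            row.indexed_digest = content_digest
        return row
```

In this model, bookkeeping statuses are accepted with the gate shut, while an indexed record raises until gate_enabled is set to True.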

3. cutter_agent/iu_core/vector_sync.py — the connector

The module is framework-free and performs no IO at import. The SQL registry is written through an injected SqlExecutor and the external Qdrant store through an injected HttpPoster, so the connector is fully unit-testable and opens no socket until a verb runs.
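
The injection pattern might look roughly like the sketch below. SqlExecutor and HttpPoster are named in this document, but their signatures are not given, so the call shapes here are assumptions, as are the Connector class and the idea that fn_iu_vector_sync_enabled() returns a single boolean.

```python
from typing import Any, Mapping, Protocol, Sequence


class SqlExecutor(Protocol):
    # Assumed shape: run one statement, return the result rows.
    def __call__(self, sql: str, params: Sequence[Any]) -> Sequence[tuple]: ...


class HttpPoster(Protocol):
    # Assumed shape: send one JSON body to a URL, return the decoded reply.
    def __call__(self, url: str, body: Mapping[str, Any]) -> Mapping[str, Any]: ...


class Connector:
    """Hypothetical wrapper: holds the two injected callables and nothing else."""

    def __init__(self, sql: SqlExecutor, post: HttpPoster) -> None:
        # Storing callables performs no IO; nothing connects here.
        self._sql = sql
        self._post = post

    def gate_enabled(self) -> bool:
        # Assumes the gate function returns one boolean column.
        rows = self._sql("SELECT fn_iu_vector_sync_enabled()", ())
        return bool(rows and rows[0][0])
```

A unit test can then pass plain functions as fakes and assert that constructing the connector calls neither of them.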

  • build_sync_plan(root): discovers every iu-tree/**/*.md manifest and projects each to a VectorPoint (stable point_key, sha256 content_digest, and a source_kind of iu, collection, or corpus). The plan is deterministic.
  • record_plan(...): records points through the governed fn_iu_vector_sync_record; it refuses an indexed status.
  • drift_report(...): compares every registered point against the live plan; a digest mismatch or a missing manifest is drift.
  • QdrantConnector: an idempotent upsert_points keyed by the stable point id (never a destructive reindex), plus a read-only health probe.
  • CLI / one-commands: dot_iu_vector_sync_{plan,dryrun,apply,verify,drift,disable}. apply (the external write) is deliberately NOT a CLI verb: it needs the gate open AND a wired embedder and poster.
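
As an illustration of the plan and drift steps listed above, here is a minimal sketch. It is not the module's code: the point_key derivation (the manifest path relative to iu-tree/, without the .md suffix) is an assumption, source_kind is left out because the classification rule is not stated here, and the drift_report signature and its reason strings are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass(frozen=True)
class VectorPoint:
    point_key: str       # assumed: path under iu-tree/, minus ".md"
    content_digest: str  # sha256 hex digest of the manifest bytes


def build_sync_plan(root: Path) -> List[VectorPoint]:
    """Discover iu-tree/**/*.md under root and project each file to a point."""
    tree = root / "iu-tree"
    points = []
    for path in tree.rglob("*.md"):
        key = path.relative_to(tree).with_suffix("").as_posix()
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        points.append(VectorPoint(key, digest))
    # Sort by key so the same tree always yields the same plan.
    return sorted(points, key=lambda p: p.point_key)


def drift_report(registered: Dict[str, str], plan: List[VectorPoint]) -> Dict[str, str]:
    """Compare registered {point_key: digest} against the live plan."""
    live = {p.point_key: p.content_digest for p in plan}
    report = {}
    for key, digest in registered.items():
        if key not in live:
            report[key] = "missing_manifest"
        elif live[key] != digest:
            report[key] = "digest_mismatch"
    return report
```

Hashing the raw bytes keeps the digest independent of locale and newline translation, which is one way to make the plan reproducible across hosts.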

4. Bounded dry-run — proven durably

vector_sync dryrun discovered the live iu-tree/ export and recorded 3 dryrun points in iu_vector_sync_point, with no external write: the two collection manifests (iu_core.composer.pilot-doc-001, iu_core.autocut.file-001) and the _corpus/corpus-manifest. verify confirms that the gate is false and that there are 3 dryrun rows. The rows are additive and truncatable.

5. sandbox/170 — vector-sync probe, BEGIN ... ROLLBACK — 7/7

The probe runs seven checks inside a transaction that is rolled back:

  • T1: gate CLOSED -> 'indexed' REFUSED
  • T2: 'dryrun' records with the gate shut
  • T3: re-record the same key -> idempotent upsert, 1 row
  • T4: gate OPEN -> 'indexed' sets indexed_digest
  • T5: matching index -> in_sync
  • T6: new digest -> drifted=true
  • T7: empty content_digest REFUSED

T6 FAILED on the first run and exposed a real defect: the drift view computed drifted only WHERE sync_status='indexed'. The fix treats drift as a fact about DIGESTS: drifted = indexed_digest IS NOT NULL AND indexed_digest IS DISTINCT FROM content_digest. The fix was re-applied and the probe re-run, passing 7/7.
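
The difference between the defective and the corrected drift rule can be shown with a small Python model in which None stands in for SQL NULL. The function names are hypothetical, and drifted_old is only one reading of the defect (a status filter that hides drift once a row's status is no longer 'indexed'); the view's actual SQL is not reproduced here.

```python
from typing import Optional


def is_distinct_from(a: Optional[str], b: Optional[str]) -> bool:
    """Null-safe inequality, modelling SQL's IS DISTINCT FROM with None as NULL."""
    return a != b  # None != "x" is True; None != None is False


def drifted_fixed(indexed_digest: Optional[str], content_digest: Optional[str]) -> bool:
    # Drift is a fact about digests: something was indexed, and it no longer matches.
    return indexed_digest is not None and is_distinct_from(indexed_digest, content_digest)


def drifted_old(sync_status: str, indexed_digest: Optional[str],
                content_digest: Optional[str]) -> bool:
    # Assumed shape of the defect: the status filter masks drift on any row
    # whose status has moved away from 'indexed'.
    return sync_status == "indexed" and drifted_fixed(indexed_digest, content_digest)


# A row that was indexed at digest "a", then re-recorded as a dryrun at "b":
row = {"sync_status": "dryrun", "indexed_digest": "a", "content_digest": "b"}
old = drifted_old(row["sync_status"], row["indexed_digest"], row["content_digest"])
new = drifted_fixed(row["indexed_digest"], row["content_digest"])
```

Under these assumptions, old is False and new is True for that row, which matches the T6 failure and its fix; a row that was never indexed (indexed_digest is None) is not drifted under either rule.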

6. Qdrant — connectivity proven, exact blocker recorded

The incomex-qdrant container is reachable on the docker network at incomex-qdrant:6333 (proven via a read-only fetch from incomex-directus). The exact blocker for an external apply is authentication: Qdrant requires an api-key request header, and the connector's injected HttpPoster must be wired with that key from the VPS deployment secret, which is NOT in this repo. The secret was never read or logged. An external apply also needs an embedder. Per the FORBIDDEN list (no destructive vector reindex), no external write was performed; the bounded dry-run is the correct proof boundary.
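
A poster wired for Qdrant would need to attach the api-key header and address each point by a stable id. The sketch below only builds the request description and sends nothing. The environment variable name QDRANT_API_KEY, the http scheme, the helper names, and the use of a UUIDv5 derived from point_key are all assumptions; the PUT /collections/{name}/points route and the api-key header follow Qdrant's REST API.

```python
import os
import uuid
from typing import Any, Dict, List, Mapping, Optional

QDRANT_URL = "http://incomex-qdrant:6333"  # host:port from the text; scheme assumed
API_KEY_ENV = "QDRANT_API_KEY"             # hypothetical variable name


def auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers; the key comes from the environment, never the repo."""
    source = os.environ if env is None else env
    key = source.get(API_KEY_ENV)
    if not key:
        # The message names the variable only; the secret itself is never echoed.
        raise RuntimeError(f"{API_KEY_ENV} is not set; refusing to build a request")
    return {"api-key": key, "Content-Type": "application/json"}


def stable_point_id(point_key: str) -> str:
    # Qdrant point ids are unsigned integers or UUIDs; a UUIDv5 of the key
    # gives the same id on every run, so an upsert overwrites in place.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, point_key))


def upsert_request(collection: str, point_key: str, vector: List[float],
                   content_digest: str) -> Dict[str, Any]:
    """Describe one upsert call without performing it."""
    return {
        "method": "PUT",
        "url": f"{QDRANT_URL}/collections/{collection}/points",
        "json": {
            "points": [{
                "id": stable_point_id(point_key),
                "vector": vector,
                "payload": {"point_key": point_key, "content_digest": content_digest},
            }]
        },
    }
```

Because the id is a pure function of point_key, building the request twice for the same point yields identical bodies, which is the property an idempotent upsert relies on.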

knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-core-1k-vector-operator-ui-delivery-acceptance-open-goal/02-vector-sync-foundation.md