KB-18A0
IU vector-sync boundary rule — one vector never spans two IUs
Revision 1
iu-core, vector-sync, qdrant, boundary-rule, embedding, binding-rule, dot-iu-cutter
02 — IU vector-sync boundary rule (per-IU embedding integrity)
- Macro: AGENTDATA_MCP_CONNECTOR_REPAIR_AND_1K_REPORT_VERIFICATION
- Date: 2026-05-23
- Status: BINDING RULE for all IU-Core → Qdrant / vector-store sync work (the 1200x IU_CORE_1200X_QDRANT_LIVE_SYNC_AND_OPERATOR_UI macro and beyond).
Rule
When embedding Information Units (IUs) into a vector store:
- One vector per chunk; one chunk = content from exactly one IU. A vector / chunk must never contain content drawn from more than one IU.
- Over-long IU → chunk only inside that IU's boundary. If a single IU is too long for one embedding unit, split it into multiple chunks — but every chunk stays strictly within that one IU. A chunk must never straddle an IU boundary.
- Every chunk carries identity. Each chunk's payload / metadata must carry unit_id (the IU id) and parent_piece_id. A chunk with no IU identity is invalid.
- Never concatenate IU A + IU B into one embedding unit. No merging of distinct IUs into a shared vector for any reason: not padding, not batching, not "context-window efficiency".
- Collection / document-level vectors are metadata only. A collection- or document-level summary vector is permitted ONLY as an explicitly-marked metadata / summary object. It must never replace or substitute for the per-IU vectors.
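The chunking rules above can be sketched as follows. This is a minimal illustration, not the cutter's actual code: the IU and Chunk dataclasses, chunk_iu, chunk_all, and the character-based max_chars limit are hypothetical names and assumptions. Only unit_id and parent_piece_id come from the rule itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IU:
    """Hypothetical in-memory shape of an Information Unit."""
    unit_id: str
    parent_piece_id: str
    text: str


@dataclass(frozen=True)
class Chunk:
    """One embedding unit; its content comes from exactly one IU."""
    unit_id: str
    parent_piece_id: str
    chunk_index: int
    text: str


def chunk_iu(iu: IU, max_chars: int) -> list[Chunk]:
    """Split one IU into chunks of at most max_chars characters.

    Only iu.text is sliced, so no chunk can straddle an IU boundary.
    The character limit is an assumption; a real limit would likely be
    measured in tokens.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not iu.unit_id or not iu.parent_piece_id:
        raise ValueError("a chunk with no IU identity is invalid")
    pieces = [iu.text[i:i + max_chars] for i in range(0, len(iu.text), max_chars)]
    return [
        Chunk(iu.unit_id, iu.parent_piece_id, index, piece)
        for index, piece in enumerate(pieces)
    ]


def chunk_all(ius: list[IU], max_chars: int) -> list[Chunk]:
    """Chunk each IU independently; IU texts are never concatenated."""
    chunks: list[Chunk] = []
    for iu in ius:
        chunks.extend(chunk_iu(iu, max_chars))
    return chunks
```

With this shape, a short tail chunk at the end of one IU stays its own chunk; it is never topped up with the start of the next IU.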
Why
The IU is the atomic unit of governed meaning. A vector that mixes two IUs
makes retrieval, drift detection (iu_vector_sync_point.content_digest vs
indexed_digest) and per-IU provenance unsound — a search hit could no
longer be traced to a single governed unit. Per-IU boundary integrity keeps
the vector layer consistent with the five-layer model and the unit_id /
parent_piece_id lineage already enforced in Postgres.
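As a rough sketch of the drift-detection argument: when every point belongs to one IU, comparing content_digest against indexed_digest gives an unambiguous list of units to re-embed. The SyncPoint dataclass, the digest helper, and the choice of SHA-256 are assumptions for illustration; only the two digest field names and unit_id come from this note.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncPoint:
    """Hypothetical row shape modelled on iu_vector_sync_point."""
    unit_id: str
    content_digest: str            # digest of the IU's current content
    indexed_digest: Optional[str]  # digest at last indexing; None if never indexed


def digest(text: str) -> str:
    """Assumed digest function: SHA-256 over the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def drifted_units(points: list[SyncPoint]) -> list[str]:
    """Return the sorted unit_ids whose indexed digest differs from the content digest."""
    return sorted({p.unit_id for p in points if p.content_digest != p.indexed_digest})
```

If a single vector mixed IU A and IU B, a digest mismatch on that vector could not say which unit had changed.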
How to apply
- cutter_agent/iu_core/vector_sync.py build_sync_plan / VectorPoint: when an IU exceeds the embedding size limit, emit multiple VectorPoints with a stable per-chunk point_key derived from unit_id + chunk index, never one point spanning two IUs.
- Carry unit_id and parent_piece_id in every Qdrant point payload.
- A source_kind=collection or source_kind=corpus point is allowed only as a marked summary object: additive to, never instead of, the source_kind=iu points.
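The points above might look like the sketch below. The real VectorPoint and build_sync_plan in cutter_agent/iu_core/vector_sync.py are not shown in this note, so the field layout, the UUIDv5 key scheme, the namespace string, the build_iu_points and validate_point helpers, and the is_summary flag are all assumptions. UUIDv5 is used only because it is deterministic for a given input and Qdrant accepts UUIDs as point ids. How a summary point should carry identity is not specified here, so the sketch checks unit_id and parent_piece_id only on source_kind=iu points.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical namespace; any fixed UUID works as long as it never changes.
POINT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "iu-core/vector-sync")

ALLOWED_SOURCE_KINDS = {"iu", "collection", "corpus"}


@dataclass
class VectorPoint:
    """Illustrative stand-in for the real VectorPoint."""
    point_key: str
    source_kind: str
    text: str
    payload: dict = field(default_factory=dict)


def point_key(unit_id: str, chunk_index: int) -> str:
    """Stable per-chunk key derived from unit_id + chunk index."""
    return str(uuid.uuid5(POINT_NAMESPACE, f"{unit_id}:{chunk_index}"))


def build_iu_points(unit_id: str, parent_piece_id: str, chunks: list[str]) -> list[VectorPoint]:
    """Emit one point per chunk of a single IU, each carrying IU identity."""
    return [
        VectorPoint(
            point_key=point_key(unit_id, i),
            source_kind="iu",
            text=chunk,
            payload={
                "unit_id": unit_id,
                "parent_piece_id": parent_piece_id,
                "chunk_index": i,
            },
        )
        for i, chunk in enumerate(chunks)
    ]


def validate_point(point: VectorPoint) -> None:
    """Raise ValueError for points that break the boundary rule."""
    if point.source_kind not in ALLOWED_SOURCE_KINDS:
        raise ValueError(f"unknown source_kind: {point.source_kind}")
    if point.source_kind == "iu":
        if not point.payload.get("unit_id") or not point.payload.get("parent_piece_id"):
            raise ValueError("iu point is missing unit_id or parent_piece_id")
    elif not point.payload.get("is_summary"):
        # Collection/corpus points are allowed only as marked summary objects.
        raise ValueError("collection/corpus point must be marked as a summary")
```

Because the key depends only on unit_id and the chunk index, re-running the sync for an unchanged IU produces the same keys, so existing points are overwritten rather than duplicated.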
Relates to: the 1k macro doc 02 (vector-sync foundation) and the 1200x
IU_CORE_1200X_QDRANT_LIVE_SYNC_AND_OPERATOR_UI next-macro package.