9000x-onboarding · 03 — Dry-run results + content policy (5 empty-body skip)
9000x — Dry-run + content policy
Live dry-run (Mac via psql-over-ssh)
The same JSON output was also reproduced in-container via
in_container_run.py (psycopg2 path):
{
  "mode": "dry_run",
  "plan_summary": {
    "candidate_count": 86,
    "empty_body_policy": "skip",
    "empty_body_skipped_count": 5,
    "per_doc_candidates": {"DIEU-28": 27, "DIEU-32": 23, "DIEU-35": 36},
    "per_doc_chunks": {"DIEU-28": 22, "DIEU-32": 23, "DIEU-35": 43},
    "per_doc_empty": {"DIEU-28": 5},
    "point_count": 88,
    "preflight_ok": true,
    "to_index_count": 81
  },
  "gate_opened": false,
  "gate_closed": false,
  "points_upserted": 0
}
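The counts in plan_summary are tied together by simple arithmetic, so they can be cross-checked mechanically. The sketch below is illustrative only: check_plan_summary is a hypothetical helper, not part of the onboarding package, and it assumes the summary carries exactly the keys shown above and that the skip policy is in effect (so to_index_count = candidate_count - empty_body_skipped_count).

```python
def check_plan_summary(summary: dict) -> list:
    """Return a list of inconsistencies found in a dry-run plan_summary.

    Hypothetical helper; key names are copied from the JSON output above.
    """
    problems = []
    candidates = summary["per_doc_candidates"]
    chunks = summary["per_doc_chunks"]
    empty = summary.get("per_doc_empty", {})

    if sum(candidates.values()) != summary["candidate_count"]:
        problems.append("per_doc_candidates does not sum to candidate_count")
    if sum(empty.values()) != summary["empty_body_skipped_count"]:
        problems.append("per_doc_empty does not sum to empty_body_skipped_count")
    # Assumes the skip policy: every skipped IU is removed from the index set.
    expected = summary["candidate_count"] - summary["empty_body_skipped_count"]
    if summary["to_index_count"] != expected:
        problems.append("to_index_count != candidate_count - skipped")
    if sum(chunks.values()) != summary["point_count"]:
        problems.append("per_doc_chunks does not sum to point_count")
    return problems


plan_summary = {
    "candidate_count": 86,
    "empty_body_policy": "skip",
    "empty_body_skipped_count": 5,
    "per_doc_candidates": {"DIEU-28": 27, "DIEU-32": 23, "DIEU-35": 36},
    "per_doc_chunks": {"DIEU-28": 22, "DIEU-32": 23, "DIEU-35": 43},
    "per_doc_empty": {"DIEU-28": 5},
    "point_count": 88,
    "preflight_ok": True,
    "to_index_count": 81,
}

print(check_plan_summary(plan_summary))  # prints []
```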
Predicted point distribution
- DIEU-28: 27 enacted IUs → 5 skipped (empty body) → 22 to index → 22 chunks (all single-chunk; max body 1672 < 1800 char ceiling).
- DIEU-32: 23 IUs → 23 to index → 23 chunks (all single-chunk).
- DIEU-35: 36 IUs → 36 to index → 43 chunks (7 IUs with body > 1800 chars split into 2 chunks each; max single body 3464 chars).
- Total: 88 chunks across 81 indexable IUs.
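The chunk counts above follow from the 1800-character ceiling if each body is cut into ceiling-sized pieces. A minimal sketch of that arithmetic, assuming plain fixed-size character splitting (the package's actual chunker may cut on different boundaries); chunk_count is a hypothetical name:

```python
import math

CHUNK_CEILING = 1800  # character ceiling quoted in the distribution above


def chunk_count(body_length: int, ceiling: int = CHUNK_CEILING) -> int:
    """Number of chunks for a body of the given length (assumed fixed-size split)."""
    if body_length <= 0:
        return 0  # an empty body yields no chunk
    return math.ceil(body_length / ceiling)


print(chunk_count(1672))  # 1  (largest DIEU-28 body, single chunk)
print(chunk_count(3464))  # 2  (largest DIEU-35 body, two chunks)

# DIEU-35: 36 IUs, of which 7 exceed the ceiling and split into 2 chunks each.
dieu_35_chunks = (36 - 7) * 1 + 7 * 2
print(dieu_35_chunks)            # 43
print(22 + 23 + dieu_35_chunks)  # 88
```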
Empty-body policy
Default policy --empty-body-policy skip:
- 5 DIEU-28 IUs with unit_version.body = '' (length 0) are NOT embedded;
- the dry-run summary records them under empty_body_skipped_count;
- they remain lifecycle_status='enacted' and reachable to all other IU Core surfaces; only the Qdrant vector is absent.
Rationale: embedding an empty string with OpenAI yields a generic centroid
vector that would draw irrelevant retrieval results; the vector would not
represent any actual content. The alternative policy
--empty-body-policy index is available for explicit operator override,
but it is NOT used by default and was NOT exercised in this run.
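The effect of the two policies can be pictured as a partition of the candidate set. This is a sketch under stated assumptions, not the package's code: Unit, its fields, and partition_by_policy are hypothetical names, and "empty" is taken to mean a body of length 0, as in the dry-run above.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    iu_id: str  # hypothetical identifier field
    body: str   # stands in for unit_version.body


def partition_by_policy(units, policy="skip"):
    """Split units into (to_index, skipped) under the given empty-body policy."""
    if policy not in ("skip", "index"):
        raise ValueError(f"unknown empty-body policy: {policy!r}")
    to_index, skipped = [], []
    for unit in units:
        # Only the skip policy removes zero-length bodies from the index set.
        if policy == "skip" and len(unit.body) == 0:
            skipped.append(unit)
        else:
            to_index.append(unit)
    return to_index, skipped


units = [Unit("a", "some text"), Unit("b", ""), Unit("c", "more text")]

to_index, skipped = partition_by_policy(units, "skip")
print([u.iu_id for u in to_index], [u.iu_id for u in skipped])  # ['a', 'c'] ['b']

to_index_all, skipped_none = partition_by_policy(units, "index")
print(len(to_index_all), len(skipped_none))  # 3 0
```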
To revisit the 5 empty-body IUs (e.g. after a content backfill), re-run:
python3 ops/qdrant-onboarding-package-8000x/run_onboarding.py \
--apply --docs DIEU-28 --actor 'iu-core-XXXX/empty-body-backfill' \
--collection iu_core_iu_chunks --empty-body-policy index
Preflight
qdrant_onboarding.preflight_iu_ids ran against the 86-IU set and
returned is_ok=True (all enacted, none not_found). The non-enacted
refusal path is locked in by the tests test_dry_run_refuses_draft_iu and
test_assert_enacted_only_raises_on_draft (6000x).
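For readers unfamiliar with the preflight step, the following sketch shows the kind of check described: every requested IU must exist and be enacted. It does not reproduce the real qdrant_onboarding.preflight_iu_ids, whose signature is not shown here; PreflightResult, sketch_preflight, and the status_by_id mapping are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PreflightResult:
    non_enacted: list = field(default_factory=list)
    not_found: list = field(default_factory=list)

    @property
    def is_ok(self) -> bool:
        # OK only when every requested IU exists and is enacted.
        return not self.non_enacted and not self.not_found


def sketch_preflight(iu_ids, status_by_id):
    """Classify each requested IU id by lifecycle status (hypothetical helper)."""
    result = PreflightResult()
    for iu_id in iu_ids:
        status = status_by_id.get(iu_id)
        if status is None:
            result.not_found.append(iu_id)
        elif status != "enacted":
            result.non_enacted.append(iu_id)
    return result


statuses = {"iu-1": "enacted", "iu-2": "draft"}

ok = sketch_preflight(["iu-1"], statuses)
print(ok.is_ok)  # True

bad = sketch_preflight(["iu-1", "iu-2", "iu-3"], statuses)
print(bad.is_ok, bad.non_enacted, bad.not_found)  # False ['iu-2'] ['iu-3']
```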
Forbidden actions avoided
- No HTTP call to Qdrant or OpenAI.
- No vector_sync gate open.
- No write to iu_vector_sync_point.
- No touch of production_documents.
- No cross-IU vector / shared chunk.
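Some of these guarantees are visible in the dry-run JSON itself and can be asserted by a caller. The sketch below checks only the fields that appear in that output; it cannot prove the absence of HTTP calls or table writes, and dry_run_violations is a hypothetical helper rather than part of the package.

```python
def dry_run_violations(result: dict) -> list:
    """List the side-effect fields of a dry-run result that look wrong."""
    violations = []
    if result.get("mode") != "dry_run":
        violations.append("mode is not dry_run")
    if result.get("gate_opened") is not False:
        violations.append("gate was opened")
    if result.get("points_upserted") != 0:
        violations.append("points were upserted")
    return violations


dry_run_result = {
    "mode": "dry_run",
    "gate_opened": False,
    "gate_closed": False,
    "points_upserted": 0,
}

print(dry_run_violations(dry_run_result))  # prints []
```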