KB-1879

9000x-onboarding · 03 — Dry-run results + content policy (5 empty-body skip)

Revision 1
Tags: iu-core, v0.6, 9000x, dry-run, content-policy, empty-body

9000x — Dry-run + content policy

Live dry-run (Mac via psql-over-ssh)

The dry-run was executed live from a Mac via psql-over-ssh. The same JSON output was also reproduced in-container via in_container_run.py (psycopg2 path):

```json
{
  "mode": "dry_run",
  "plan_summary": {
    "candidate_count": 86,
    "empty_body_policy": "skip",
    "empty_body_skipped_count": 5,
    "per_doc_candidates": {"DIEU-28": 27, "DIEU-32": 23, "DIEU-35": 36},
    "per_doc_chunks":     {"DIEU-28": 22, "DIEU-32": 23, "DIEU-35": 43},
    "per_doc_empty":      {"DIEU-28": 5},
    "point_count": 88,
    "preflight_ok": true,
    "to_index_count": 81
  },
  "gate_opened": false,
  "gate_closed": false,
  "points_upserted": 0
}
```
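One way to sanity-check a dry-run report like the one above is to recompute its totals from the per-document maps. The sketch below is a hypothetical helper (check_dry_run is not part of qdrant_onboarding); its invariants are inferred from the field names and the numbers shown here, and the real tool may enforce different ones.

```python
import json

# The dry-run output shown above, pasted verbatim.
DRY_RUN_OUTPUT = """
{
  "mode": "dry_run",
  "plan_summary": {
    "candidate_count": 86,
    "empty_body_policy": "skip",
    "empty_body_skipped_count": 5,
    "per_doc_candidates": {"DIEU-28": 27, "DIEU-32": 23, "DIEU-35": 36},
    "per_doc_chunks":     {"DIEU-28": 22, "DIEU-32": 23, "DIEU-35": 43},
    "per_doc_empty":      {"DIEU-28": 5},
    "point_count": 88,
    "preflight_ok": true,
    "to_index_count": 81
  },
  "gate_opened": false,
  "gate_closed": false,
  "points_upserted": 0
}
"""


def check_dry_run(report: dict) -> list[str]:
    """Return the violated invariants; an empty list means the report is self-consistent."""
    s = report["plan_summary"]
    problems = []
    if sum(s["per_doc_candidates"].values()) != s["candidate_count"]:
        problems.append("candidate_count != sum(per_doc_candidates)")
    if sum(s["per_doc_empty"].values()) != s["empty_body_skipped_count"]:
        problems.append("empty_body_skipped_count != sum(per_doc_empty)")
    if s["candidate_count"] - s["empty_body_skipped_count"] != s["to_index_count"]:
        problems.append("to_index_count != candidate_count - empty_body_skipped_count")
    if sum(s["per_doc_chunks"].values()) != s["point_count"]:
        problems.append("point_count != sum(per_doc_chunks)")
    for doc, n in s["per_doc_candidates"].items():
        # Every indexable IU should yield at least one chunk.
        if s["per_doc_chunks"].get(doc, 0) < n - s["per_doc_empty"].get(doc, 0):
            problems.append(f"{doc}: fewer chunks than indexable IUs")
    if not s["preflight_ok"]:
        problems.append("preflight failed")
    # A dry run must not report any side effects.
    if report["mode"] == "dry_run" and (report["gate_opened"] or report["points_upserted"]):
        problems.append("dry run reported side effects")
    return problems


print(check_dry_run(json.loads(DRY_RUN_OUTPUT)))  # → []
```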

Predicted point distribution

  • DIEU-28: 27 enacted IUs → 5 skipped (empty body) → 22 to index → 22 chunks (all single-chunk; the largest body is 1672 chars, under the 1800-char ceiling).
  • DIEU-32: 23 enacted IUs → 0 skipped → 23 to index → 23 chunks (all single-chunk).
  • DIEU-35: 36 enacted IUs → 0 skipped → 36 to index → 43 chunks (29 single-chunk IUs plus 7 IUs with bodies over 1800 chars, split into 2 chunks each: 29 + 7 × 2 = 43; the largest body is 3464 chars).
  • Total: 88 chunks across 81 indexable IUs.
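The per-document arithmetic above can be reproduced with a small sketch. It assumes the chunker emits ceil(len / 1800) chunks per non-empty body with no overlap. That rule matches the reported numbers but may not be how the real splitter works. The function names are hypothetical, and the body lengths in the example are illustrative values shaped to match the reported counts, not the real DIEU bodies.

```python
import math

CHUNK_CEILING = 1800  # characters; the ceiling quoted above


def chunks_for_body(length: int, ceiling: int = CHUNK_CEILING) -> int:
    """Assumed rule: one chunk per started block of `ceiling` characters, no overlap."""
    if length <= 0:
        raise ValueError("empty bodies are filtered out by the skip policy before chunking")
    return math.ceil(length / ceiling)


def predicted_points(body_lengths: list[int]) -> tuple[int, int, int]:
    """Return (skipped_empty, to_index, chunks) for one document's body lengths."""
    non_empty = [n for n in body_lengths if n > 0]
    skipped = len(body_lengths) - len(non_empty)
    return skipped, len(non_empty), sum(chunks_for_body(n) for n in non_empty)


# Illustrative lengths only (not the real bodies), shaped to match the counts above.
dieu_28 = [0] * 5 + [1672] + [900] * 21        # 27 IUs, 5 empty, max 1672
dieu_35 = [1200] * 29 + [3464] + [2500] * 6    # 36 IUs, 7 over 1800, max 3464
print(predicted_points(dieu_28))  # → (5, 22, 22)
print(predicted_points(dieu_35))  # → (0, 36, 43)
```

Under this rule a 3464-char body needs 2 chunks (3464 / 1800 ≈ 1.92), so a third chunk would only appear for bodies over 3600 chars.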

Empty-body policy

Default policy --empty-body-policy skip:

  • the 5 DIEU-28 IUs with unit_version.body = '' (length 0) are NOT embedded;
  • the dry-run summary counts them in empty_body_skipped_count (and per document in per_doc_empty);
  • they remain lifecycle_status='enacted' and reachable from all other IU Core surfaces; only their Qdrant vector is absent.

Rationale: embedding an empty string with OpenAI yields a generic, centroid-like vector that carries no content and would pull irrelevant results into retrieval; in other words, the vector would be fake. The alternative policy --empty-body-policy index remains available as an explicit operator override, but it is NOT the default and was NOT exercised in this run.
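The policy split can be sketched as a simple partition. This is a minimal illustration, not the run_onboarding.py implementation: the IU dataclass and apply_empty_body_policy are hypothetical names. The sketch treats only length-0 bodies as empty, matching the body = '' case above, so a whitespace-only body would still be indexed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IU:
    iu_id: str
    doc: str
    body: str


def apply_empty_body_policy(ius: list[IU], policy: str = "skip") -> tuple[list[IU], list[IU]]:
    """Split IUs into (to_index, skipped) according to the empty-body policy.

    'skip'  : length-0 bodies are not embedded and are reported as skipped (default).
    'index' : every IU is passed through (explicit operator override).
    """
    if policy not in ("skip", "index"):
        raise ValueError(f"unknown empty-body policy: {policy!r}")
    if policy == "index":
        return list(ius), []
    to_index = [iu for iu in ius if len(iu.body) > 0]
    skipped = [iu for iu in ius if len(iu.body) == 0]
    return to_index, skipped


ius = [IU("a", "DIEU-28", ""), IU("b", "DIEU-28", "text"), IU("c", "DIEU-32", "more")]
to_index, skipped = apply_empty_body_policy(ius)
# The skipped list feeds empty_body_skipped_count and per_doc_empty.
print(len(to_index), len(skipped), dict(Counter(iu.doc for iu in skipped)))  # → 2 1 {'DIEU-28': 1}
```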

To revisit the 5 empty-body IUs (e.g. after a content backfill), re-run:

```bash
python3 ops/qdrant-onboarding-package-8000x/run_onboarding.py \
    --apply --docs DIEU-28 --actor 'iu-core-XXXX/empty-body-backfill' \
    --collection iu_core_iu_chunks --empty-body-policy index
```

Preflight

qdrant_onboarding.preflight_iu_ids ran against the 86-IU set and returned is_ok=True (all enacted, none not_found). The refusal path for non-enacted IUs is pinned by the 6000x tests test_dry_run_refuses_draft_iu and test_assert_enacted_only_raises_on_draft.
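The preflight contract described above (every candidate must exist and be enacted, otherwise refuse) can be sketched as follows. PreflightResult, preflight and require_all_enacted are hypothetical stand-ins, not the qdrant_onboarding API; status_by_id replaces the real lifecycle_status lookup against the database.

```python
from dataclasses import dataclass, field


@dataclass
class PreflightResult:
    not_found: list[str] = field(default_factory=list)
    not_enacted: dict[str, str] = field(default_factory=dict)  # iu_id -> lifecycle_status

    @property
    def is_ok(self) -> bool:
        return not self.not_found and not self.not_enacted


class NotEnactedError(RuntimeError):
    """Raised when the candidate set contains missing or non-enacted IUs."""


def preflight(iu_ids: list[str], status_by_id: dict[str, str]) -> PreflightResult:
    """Classify every candidate IU by its lifecycle_status (hypothetical in-memory lookup)."""
    result = PreflightResult()
    for iu_id in iu_ids:
        status = status_by_id.get(iu_id)
        if status is None:
            result.not_found.append(iu_id)
        elif status != "enacted":
            result.not_enacted[iu_id] = status
    return result


def require_all_enacted(result: PreflightResult) -> None:
    """Refuse to continue unless every candidate was found and is enacted."""
    if not result.is_ok:
        raise NotEnactedError(f"not_found={result.not_found} not_enacted={result.not_enacted}")


status = {"IU-1": "enacted", "IU-2": "enacted", "IU-3": "draft"}
ok = preflight(["IU-1", "IU-2"], status)
print(ok.is_ok)  # → True
bad = preflight(["IU-1", "IU-3", "IU-9"], status)
print(bad.not_found, bad.not_enacted)  # → ['IU-9'] {'IU-3': 'draft'}
```

A draft IU in the candidate set makes require_all_enacted raise, which is the behaviour the two 6000x tests lock in for the real code.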

Forbidden actions avoided

  • No HTTP calls to Qdrant or OpenAI.
  • No vector_sync gate opened.
  • No writes to iu_vector_sync_point.
  • No interaction with production_documents.
  • No vector or chunk shared across IUs.
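One way to make these guarantees structural rather than a matter of discipline is to route every side-effecting call through a single choke point that refuses in dry-run mode. The sketch below is a hypothetical design (SideEffectGate and ForbiddenInDryRun are not names from the onboarding package). It is shown only to illustrate the idea; the real run may enforce the same guarantees differently.

```python
from typing import Any, Callable


class ForbiddenInDryRun(RuntimeError):
    """Raised when a side-effecting call is attempted while dry_run=True."""


class SideEffectGate:
    """Hypothetical choke point: every call that could reach Qdrant, OpenAI,
    the vector_sync gate or iu_vector_sync_point goes through perform()."""

    def __init__(self, dry_run: bool):
        self.dry_run = dry_run
        self.performed: list[str] = []

    def perform(self, name: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.dry_run:
            raise ForbiddenInDryRun(f"{name} refused: dry-run mode")
        self.performed.append(name)
        return fn(*args, **kwargs)


calls: list[str] = []
gate = SideEffectGate(dry_run=True)
try:
    gate.perform("qdrant.upsert", calls.append, "point-1")
except ForbiddenInDryRun as exc:
    print(exc)  # → qdrant.upsert refused: dry-run mode
print(calls, gate.performed)  # → [] []
```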
Source: knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-core-9000x-qdrant-onboarding-piece-platform-open-goal/03-dry-run-and-content-policy.md