KB-233A

01 — IU Limited-Production Pilot Operating Protocol (operator-ready) (2026-05-28)

Revision 1

Audience: the operator (human or supervised agent) running the IU Core in limited production. This document is self-contained — you do not need to read any prior report to operate the pilot.

Channel reminder: read via query_pg (role context_pack_readonly). Writes go over SSH to the contabo VPS, using docker exec -i postgres psql -U workflow_admin -d directus. Keep VPS commands top-level: do NOT nest ssh contabo "… ssh contabo …", because the inner ssh runs on the VPS itself and fails DNS.


1. What the pilot IS allowed to do

  1. Create IUs via the canonical function only: fn_iu_create(p_canonical_address, p_title, p_body, p_actor, p_unit_kind, p_section_type, p_owner_ref, p_publication_type, p_parent_ref). Direct INSERT into information_unit is blocked by fn_iu_gateway_write_guard (gateway mode=enforced, direct_insert_policy=block_after_guard).
  2. Edit drafts via fn_iu_create_edit_draft(p_address,p_body,p_actor,p_reason,p_title).
  3. Enact a draft to enacted via fn_iu_enact(...) — requires a real review_decision_id (because iu_enact.allow_no_review_decision=false, a NEVER-FLIP key).
  4. Structure operations (reparent, deprecate) via fn_iu_structure_op(...) only while the structure_ops_enabled gate is open under the bounded gate protocol (§5).
  5. Split / merge via fn_iu_piece_split / fn_iu_piece_merge — these create draft children/merged rows, leave the source row untouched, and require an FK-probed review_decision_id. They do NOT need a gate or the cut-state-machine.
  6. Retire / supersede via fn_iu_retire / fn_iu_supersede. Both require p_review_decision_id and both support p_dry_run. (These are now in the gateway allowlist; see the canonical writers in §3.)
  7. Read everything freely via the read-only role.

2. What remains FORBIDDEN in the pilot

  • Flipping either NEVER-FLIP key: iu_enact.allow_no_review_decision, iu_core.vector_sync_enabled. Never, under any circumstance.
  • Self-minting a production review_decision (Article 32). The agent may only mint a test-scoped decision via fn_iu_test_review_decision_create; production decisions need human/council/sovereign sign-off (doc 06).
  • Enabling the P-pub hard-block (block_all) before backfill is complete (doc 05).
  • Any Qdrant / vector write; any Directus collection mutation; any Nuxt/UI build.
  • Opening a gate without an approval_id and a bounded TTL (§5).
  • Hard-deleting IU rows as a cleanup shortcut — use fn_iu_retire (lawful lifecycle), not DELETE.

3. Who / what can operate it

  • Operator-agent: may create/edit/split/merge/retire/supersede using test-scoped review decisions and within open gates it opened under §5. Must log every DOT command.
  • Human/council/sovereign: the only authority that can mint a production review_decision, approve a gate-open approval_id, and authorize P-pub stage transitions.
  • Canonical writers (the only app.canonical_writer marker values the gateway accepts): fn_iu_create, fn_iu_apply_edit_draft, fn_iu_enact, fn_iu_structure_op, fn_iu_retire, fn_iu_supersede. Anything else writing to information_unit is blocked.

4. Required gates (default = all CLOSED)

The pilot runs with all 8 governable gates closed by default. Only structure_ops_enabled and piece_event_runtime.emit_enabled are opened transiently for specific operations, under §5. delivery_enabled, operator_runtime_enabled, composer_enabled, three_axis_auto_refresh_enabled, queue.job_substrate.enabled, queue.dlq.replay_enabled stay closed unless a specific, approved task needs them.

5. Bounded gate protocol (mandatory for any gated op)

  1. Verify start state: SELECT * FROM fn_iu_gate_verify_closed(); → require all_safe=true.
  2. Open exactly one gate with an approval_id and a TTL ≤ 3600s: fn_iu_gate_open(p_gate_key, p_approval_id, p_actor, p_reason, p_ttl_seconds).
  3. Do the single intended operation.
  4. Close immediately: fn_iu_gate_close(p_gate_key, p_actor, p_reason).
  5. Re-verify: SELECT * FROM fn_iu_gate_verify_closed(); → require all_safe=true.
  6. fn_iu_gate_watchdog(p_actor) force-closes any expired gate — run it at the start of every session.
  7. Gate functions refuse the 2 never-flip keys, non-governable keys, null approval_id, and ttl>3600 (fail-closed).
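
The refusal rules in step 7 can also be checked on the operator side before any SQL is sent. The following is a minimal Python sketch under stated assumptions, not the database implementation: the function name validate_gate_open, the constant names, and the rejection of non-positive TTLs are hypothetical. The gate keys, the never-flip keys, and the 3600 s ceiling are taken from §2, §4 and this section.

```python
from typing import Optional

# The eight governable gates listed in §4.
GOVERNABLE_GATES = frozenset({
    "structure_ops_enabled",
    "piece_event_runtime.emit_enabled",
    "delivery_enabled",
    "operator_runtime_enabled",
    "composer_enabled",
    "three_axis_auto_refresh_enabled",
    "queue.job_substrate.enabled",
    "queue.dlq.replay_enabled",
})

# The two NEVER-FLIP keys from §2.
NEVER_FLIP_KEYS = frozenset({
    "iu_enact.allow_no_review_decision",
    "iu_core.vector_sync_enabled",
})

MAX_TTL_SECONDS = 3600  # ceiling from step 2


def validate_gate_open(gate_key: str, approval_id: Optional[str], ttl_seconds: int) -> list[str]:
    """Return the reasons a gate-open request should be refused (empty list = acceptable)."""
    problems = []
    if gate_key in NEVER_FLIP_KEYS:
        problems.append("never-flip key")
    elif gate_key not in GOVERNABLE_GATES:
        problems.append("non-governable key")
    if not approval_id:
        problems.append("missing approval_id")
    # Assumption: a TTL of zero or less is treated as invalid; the source only states the upper bound.
    if ttl_seconds <= 0 or ttl_seconds > MAX_TTL_SECONDS:
        problems.append("ttl out of range")
    return problems
```

A request such as validate_gate_open("structure_ops_enabled", "APPR-1", 600) returns an empty list ("APPR-1" is a made-up approval id). A check like this only saves a round trip; the gate functions in the database remain the authority and fail closed on their own.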

Durable-write method (proven): dress-rehearse the whole flow in BEGIN..ROLLBACK; author a DRY-RUN emergency rollback; COMMIT a small change; prove durability by reading it back in a FRESH psql connection; clean up lawfully via fn_iu_retire (not DELETE).
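
The rehearse, commit, and read-back parts of this method can be illustrated with a small runnable sketch. It deliberately swaps in SQLite (Python's standard sqlite3 module) for the pilot's Postgres so that it runs anywhere; the table pilot_note and the function durable_write_demo are hypothetical, and the emergency-rollback script and the fn_iu_retire cleanup are not modelled.

```python
import os
import sqlite3
import tempfile


def durable_write_demo(db_path: str) -> dict:
    """Rehearse inside BEGIN..ROLLBACK, commit a small change, read it back on a fresh connection."""
    # isolation_level=None: the sqlite3 module opens no implicit transactions,
    # so every BEGIN / COMMIT / ROLLBACK below is explicit.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS pilot_note (id INTEGER PRIMARY KEY, body TEXT)")

    # 1. Dress rehearsal: run the write, observe it, then roll it back.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO pilot_note (body) VALUES ('rehearsal')")
    seen_in_rehearsal = conn.execute("SELECT count(*) FROM pilot_note").fetchone()[0]
    conn.execute("ROLLBACK")
    left_after_rollback = conn.execute("SELECT count(*) FROM pilot_note").fetchone()[0]

    # 2. Commit one small change for real, then drop the connection.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO pilot_note (body) VALUES ('committed')")
    conn.execute("COMMIT")
    conn.close()

    # 3. Prove durability from a FRESH connection, not the session that wrote the row.
    fresh = sqlite3.connect(db_path)
    read_back = fresh.execute(
        "SELECT count(*) FROM pilot_note WHERE body = 'committed'"
    ).fetchone()[0]
    fresh.close()

    return {
        "seen_in_rehearsal": seen_in_rehearsal,
        "left_after_rollback": left_after_rollback,
        "read_back_on_fresh_connection": read_back,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(durable_write_demo(os.path.join(tmp, "pilot_demo.db")))
```

The point of the fresh connection is that a read inside the writing session can see uncommitted state, whereas a new session can only see what was actually committed.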

6. Required approval / review

  • Enact, retire, supersede, split, merge → all consume a review_decision_id. In the pilot, the operator-agent uses a test-scoped decision (fn_iu_test_review_decision_create, tagged test_scope, builder=automated_agent, cross_signed=false). Test decisions never promote to production governance — see doc 06.
  • Gate-open → requires an approval_id issued by a human/council authority.
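
The review and gate requirements from §1, §5 and this section can be folded into one pre-flight lookup. This is a hedged Python sketch: the REQUIREMENTS table, the operation labels, and the preflight function are hypothetical names, and a value of False or None means only that this document does not state the requirement, not that the database will accept the call.

```python
from typing import Optional

# Per-operation requirements as stated in §1 and §6.
# "gate" names the gate that must be open under §5; None means no gate is stated.
REQUIREMENTS = {
    "create":       {"review_decision": False, "gate": None},
    "edit_draft":   {"review_decision": False, "gate": None},
    "enact":        {"review_decision": True,  "gate": None},
    "structure_op": {"review_decision": False, "gate": "structure_ops_enabled"},
    "split":        {"review_decision": True,  "gate": None},
    "merge":        {"review_decision": True,  "gate": None},
    "retire":       {"review_decision": True,  "gate": None},
    "supersede":    {"review_decision": True,  "gate": None},
}


def preflight(operation: str, review_decision_id: Optional[str], open_gates: set[str]) -> list[str]:
    """Return the prerequisites still missing for an operation (empty list = ready)."""
    req = REQUIREMENTS[operation]  # unknown operations raise KeyError on purpose
    missing = []
    if req["review_decision"] and review_decision_id is None:
        missing.append("review_decision_id")
    if req["gate"] is not None and req["gate"] not in open_gates:
        missing.append("gate:" + req["gate"])
    return missing
```

For example, preflight("structure_op", None, set()) reports that the structure_ops_enabled gate is not open, while preflight("split", "RD-TEST-1", set()) returns an empty list ("RD-TEST-1" is a made-up identifier).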

7. Allowed documents / IUs in the pilot

  • Pilot content should be non-load-bearing or clearly test-tagged IUs until the production review_decision path (doc 06) and P-pub backfill (doc 05) land.
  • Do not create IUs that assert publication authority they don't have: fn_iu_create cannot bind a publication_authority_ref, and P-pub1/P-pub2 currently only warn. Treat newly created IUs as authority-unverified until backfill.

8. Rollback / retire / supersede rules

  • Reversibility (Article 30): every mutating op must be reversible. Prefer fn_iu_retire / fn_iu_supersede (lawful lifecycle, logged to iu_lifecycle_log) over deletion.
  • Hard-delete (emergency only, human-authorized): requires SET CONSTRAINTS ALL DEFERRED (circular version_anchor FK); the gateway ignores DELETE; enacted_immut blocks DELETE of enacted rows. Always DRY-RUN first.
  • Keep retired trails (do not purge) unless the sovereign directs otherwise.

9. Audit / evidence requirements (Article 31)

  • Every DOT command appends to dot_iu_command_run (audit actor recorded). Every lifecycle change appends to iu_lifecycle_log.
  • Zero-delta proof for read-only sessions must include the audit table: a "read-only" DOT wrapper still writes a dot_iu_command_run row — a true no-mutation proof shows IU-data counts unchanged AND accounts for audit rows.
  • Session evidence to capture: gate verify before/after, counts before/after, the exact SQL, the review_decision_id used, and the watchdog result.
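
The zero-delta rule above amounts to a comparison of before and after count snapshots in which the audit table is handled separately. A minimal Python sketch follows; zero_delta_report and its field names are hypothetical, and it assumes exactly one dot_iu_command_run row per DOT command, which the source implies but does not state as a number.

```python
AUDIT_TABLE = "dot_iu_command_run"


def zero_delta_report(before: dict[str, int], after: dict[str, int], dot_commands_run: int) -> dict:
    """Compare two count snapshots, treating audit rows separately from IU data."""
    # Drift in any non-audit table is a real mutation.
    data_drift = {
        table: after[table] - before[table]
        for table in before
        if table != AUDIT_TABLE and after[table] != before[table]
    }
    # Audit rows are expected to grow; they must be accounted for, not ignored.
    audit_delta = after[AUDIT_TABLE] - before[AUDIT_TABLE]
    # Assumption: one audit row per DOT command issued in the session.
    audit_explained = audit_delta == dot_commands_run
    return {
        "data_drift": data_drift,
        "audit_delta": audit_delta,
        "audit_explained": audit_explained,
        "zero_delta": not data_drift and audit_explained,
    }
```

A session that ran three read-only DOT commands and shows dot_iu_command_run growing by three, with every other count unchanged, passes; the same session with information_unit up by one does not.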

10. Vector / Qdrant boundary

  • iu_core.vector_sync_enabled is NEVER-FLIP and stays false. No embedding writes, no Qdrant sync in the pilot. iu_vector_sync_point and iu_qdrant_collection_registry are read-only references only.

11. Event / queue boundary

  • piece_event_runtime.emit_enabled stays closed except during a §5-bounded emit test. delivery_enabled, queue.job_substrate.enabled, queue.dlq.replay_enabled stay closed.
  • 15 iu_outbound_route rows exist; all real routes are dry_run=true / allowlist-gated. Do not enable real delivery in the pilot.
  • 3 iu_sql_link rows, all enabled=false. enabled=false suppresses delivery/capture only — validate/resolve still read OK. Do not enable links in the pilot.

13. Incident / DLQ handling

  • Baseline: iu_route_dead_letter holds 0 rows (clean). If a DLQ row appears, triage it via fn_iu_route_dead_letter_replay(p_dead_letter_id). Replay is gated by the master routes gate, NOT by queue.dlq.replay_enabled. DRY-RUN first, then record the dead_letter_id and the outcome.
  • An incident ≠ a problem ≠ a change; ack ≠ resolution; never close without a verified fix.

14. Daily / weekly health checks

Every session (daily):

  1. SELECT current_database(), current_user, inet_server_addr(); → confirm directus / correct role / 172.19.0.3.
  2. SELECT * FROM fn_iu_gate_verify_closed(); → require all_safe=true, never_flip_intact=true.
  3. fn_iu_gate_watchdog('operator') → force-close stragglers.
  4. Snapshot counts: information_unit, iu_relation, iu_route_dead_letter, dot_iu_command_run, iu_gate_transition.
  5. Confirm that iu_gate_transition shows 0 open gates and that iu_route_dead_letter has 0 rows.

Weekly:

  • Reconcile counts vs last week; review iu_lifecycle_log for unexpected retire/supersede; review dot_iu_command_run for unknown actors; confirm both never-flip keys still false.

15. Stop conditions (halt the pilot immediately)

  • Either never-flip key reads true.
  • A gate is open with no matching approval_id, or open longer than its TTL with watchdog not clearing it.
  • A production review_decision was minted by an agent (not human/council/sovereign).
  • iu_route_dead_letter grows and replay does not resolve it.
  • Counts drift in ways not explained by logged DOT commands.
  • Any direct INSERT/DELETE on information_unit succeeded outside the canonical writers.
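
The stop conditions above can be evaluated mechanically once the session's observations are collected. The sketch below is illustrative only: the PilotSnapshot dataclass, its field names, and stop_reasons are hypothetical, and how each field is populated (which queries, which logs) is left to the operator.

```python
from dataclasses import dataclass


@dataclass
class PilotSnapshot:
    """Observations gathered during a session; all field names are hypothetical."""
    never_flip_key_true: bool
    gates_open_without_approval: int
    gates_past_ttl_after_watchdog: int
    agent_minted_production_decisions: int
    dead_letters_unresolved_after_replay: int
    unexplained_count_drift: bool
    writes_outside_canonical_writers: int


def stop_reasons(s: PilotSnapshot) -> list[str]:
    """Return every stop condition that is met; a non-empty list means halt the pilot."""
    reasons = []
    if s.never_flip_key_true:
        reasons.append("never-flip key reads true")
    if s.gates_open_without_approval > 0:
        reasons.append("gate open without approval_id")
    if s.gates_past_ttl_after_watchdog > 0:
        reasons.append("gate past TTL not cleared by watchdog")
    if s.agent_minted_production_decisions > 0:
        reasons.append("agent-minted production review_decision")
    if s.dead_letters_unresolved_after_replay > 0:
        reasons.append("DLQ growth not resolved by replay")
    if s.unexplained_count_drift:
        reasons.append("unexplained count drift")
    if s.writes_outside_canonical_writers > 0:
        reasons.append("write outside canonical writers")
    return reasons
```

Returning all matching reasons, instead of stopping at the first, keeps the evidence package complete when several conditions trip at once.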

16. Escalation path

Operator-agent → human operator → architecture council (GOV-COUNCIL) → sovereign. P-pub stage changes, production review_decision minting, gate approval issuance, and any never-flip discussion escalate to council/sovereign.

17. Pilot success criteria

  • ≥ N governed create/edit/enact/split/merge/retire cycles complete with test-scoped decisions, each fully audited and reversible.
  • Zero never-flip violations; zero unexplained count drift; DLQ stays 0 or every DLQ row is resolved.
  • All gated ops used the §5 protocol; watchdog never had to force-close an abandoned gate during an op.
  • Evidence package per session is complete (§9).
  • Exit readiness: the pilot demonstrates the system can safely run governed writes, justifying promotion to full production once P-pub backfill (doc 05), production review_decision (doc 06), and human-org-role law (doc 07) land.