KB-42CB

Automation Orchestrator Design · 02 End-to-End State Machine

Revision 1
tags: dot-iu-cutter · v0.5 · automation-orchestrator-design · end-to-end-state-machine · g2-pass · fail-closed · resume-safe · dieu44 · 2026-05-20

doc 2 of 7 · 2026-05-20 · design-only macro

phase                : G2 — state machine + persistence + resume
outcome              : G2 PASS — 14 states, 3 sovereign-gate transitions,
                       single-file JSON sidecar persistence
production_mutation  : NONE

1. State enumeration

states:
  - pending                        # created; nothing run
  - source_pinned                  # manifest digest + region sha captured
  - marked                         # MARK phase done; region rehash matched
  - cutplan_ok                     # cut-plan rebuild reproduces writer_digest
  - pre_write_backup_taken         # GPG backup persisted + sidecar sha pinned
  - grants_probed                  # GRANT/REVOKE state surveyed; no apply
  - awaiting_cut_authorization     # SOVEREIGN GATE 1 (paused)
  - cut_leg_a_committed            # IU + UV + anchor rows written
  - structural_verified            # 11-bool probe PASS, cardinality matches
  - leg_b_recorded                 # cutter_governance rows persisted
  - write_verified                 # verify_result + verifier signature persisted
  - awaiting_lifecycle_authorization  # SOVEREIGN GATE 2 (paused)
  - lifecycle_enacted              # fn_iu_enact called for every IU
  - closeout_reported              # final KB report uploaded

terminal_states:
  - failed_<gate>                  # any internal gate raised → terminal
  - stopped_<reason>               # sovereign STOP route → resumable
  - voided_<reason>                # sovereign-voided run (idempotency cleared)
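One way to encode §1 in code is sketched below, assuming a Python orchestrator (the document does not fix the language). `RunState`, `classify`, and `is_resumable_terminal` are hypothetical names. The parameterised terminal categories are matched by prefix rather than enumerated, because `<gate>` and `<reason>` are open-ended.

```python
from enum import Enum


class RunState(str, Enum):
    """The 14 forward states of §1, in execution order (hypothetical encoding)."""
    PENDING = "pending"
    SOURCE_PINNED = "source_pinned"
    MARKED = "marked"
    CUTPLAN_OK = "cutplan_ok"
    PRE_WRITE_BACKUP_TAKEN = "pre_write_backup_taken"
    GRANTS_PROBED = "grants_probed"
    AWAITING_CUT_AUTHORIZATION = "awaiting_cut_authorization"
    CUT_LEG_A_COMMITTED = "cut_leg_a_committed"
    STRUCTURAL_VERIFIED = "structural_verified"
    LEG_B_RECORDED = "leg_b_recorded"
    WRITE_VERIFIED = "write_verified"
    AWAITING_LIFECYCLE_AUTHORIZATION = "awaiting_lifecycle_authorization"
    LIFECYCLE_ENACTED = "lifecycle_enacted"
    CLOSEOUT_REPORTED = "closeout_reported"


# The two states in which the run pauses for a sovereign gate.
SOVEREIGN_PAUSES = frozenset({
    RunState.AWAITING_CUT_AUTHORIZATION,
    RunState.AWAITING_LIFECYCLE_AUTHORIZATION,
})

# Terminal categories carry a suffix (<gate> or <reason>), so match by prefix.
TERMINAL_PREFIXES = ("failed_", "stopped_", "voided_")


def classify(state: str) -> str:
    """Return 'forward', 'paused', 'success', or a terminal category name."""
    for prefix in TERMINAL_PREFIXES:
        if state.startswith(prefix):
            return prefix[:-1]  # 'failed', 'stopped' or 'voided'
    s = RunState(state)  # raises ValueError for an unknown state name
    if s is RunState.CLOSEOUT_REPORTED:
        return "success"
    if s in SOVEREIGN_PAUSES:
        return "paused"
    return "forward"


def is_resumable_terminal(state: str) -> bool:
    """Among terminal states, only stopped_<reason> may resume (after re-approval)."""
    return state.startswith("stopped_")
```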

2. Transition table

Each non-terminal state has exactly one forward transition. Naming convention: internal gates (IG_*) pass automatically once their invariants hold and leave a KB receipt; sovereign gates (SG_*) pause the run until an explicit approval arrives.

transitions:
  pending → source_pinned                          : IG_source_pin
  source_pinned → marked                            : IG_mark
  marked → cutplan_ok                               : IG_cutplan
  cutplan_ok → pre_write_backup_taken               : IG_backup
  pre_write_backup_taken → grants_probed            : IG_grant_probe
  grants_probed → awaiting_cut_authorization        : SG_cut_authz_request
  awaiting_cut_authorization → cut_leg_a_committed  : SG_cut_authz_received
                                                     + IG_cut_leg_a_execute
  cut_leg_a_committed → structural_verified         : IG_structural_verify
  structural_verified → leg_b_recorded              : IG_leg_b_record
  leg_b_recorded → write_verified                   : IG_write_verify
  write_verified → awaiting_lifecycle_authorization : SG_lifecycle_authz_request
  awaiting_lifecycle_authorization → lifecycle_enacted
                                                    : SG_lifecycle_authz_received
                                                     + IG_lifecycle_enact_execute
  lifecycle_enacted → closeout_reported             : IG_closeout
  closeout_reported → (terminal: success)           : —

failure_transitions:
  <any-state> + invariant_violation → failed_<gate>     : (terminal)
  <any-state> + sovereign_stop → stopped_<reason>       : (resumable iff sovereign re-approves)
  <any-state> + drift_detected → stopped_drift          : (resumable iff drift sovereign-acknowledged)
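The transition table reads naturally as a lookup. The sketch below assumes Python and plain state-name strings; `FORWARD`, `step`, and the event names (`"pass"`, `"invariant_violation"`, `"sovereign_stop"`, `"drift_detected"`) are illustrative, not part of the design. Using a dict keyed by state enforces "exactly one forward transition" by construction, since a key can map to only one target.

```python
# Forward transitions from §2: state -> (next_state, gates that must pass).
FORWARD = {
    "pending": ("source_pinned", ("IG_source_pin",)),
    "source_pinned": ("marked", ("IG_mark",)),
    "marked": ("cutplan_ok", ("IG_cutplan",)),
    "cutplan_ok": ("pre_write_backup_taken", ("IG_backup",)),
    "pre_write_backup_taken": ("grants_probed", ("IG_grant_probe",)),
    "grants_probed": ("awaiting_cut_authorization", ("SG_cut_authz_request",)),
    "awaiting_cut_authorization": (
        "cut_leg_a_committed", ("SG_cut_authz_received", "IG_cut_leg_a_execute")),
    "cut_leg_a_committed": ("structural_verified", ("IG_structural_verify",)),
    "structural_verified": ("leg_b_recorded", ("IG_leg_b_record",)),
    "leg_b_recorded": ("write_verified", ("IG_write_verify",)),
    "write_verified": (
        "awaiting_lifecycle_authorization", ("SG_lifecycle_authz_request",)),
    "awaiting_lifecycle_authorization": (
        "lifecycle_enacted",
        ("SG_lifecycle_authz_received", "IG_lifecycle_enact_execute")),
    "lifecycle_enacted": ("closeout_reported", ("IG_closeout",)),
}

TERMINAL_PREFIXES = ("failed_", "stopped_", "voided_")


def step(state, event, gate=None, reason=None):
    """Apply one event to a non-terminal state and return the next state name."""
    if state == "closeout_reported" or state.startswith(TERMINAL_PREFIXES):
        raise ValueError(f"{state} has no outgoing transition")
    if event == "pass":                  # every gate for this transition passed
        return FORWARD[state][0]
    if event == "invariant_violation":   # terminal, not resumable
        if gate is None:
            raise ValueError("invariant_violation needs the failing gate")
        return f"failed_{gate}"
    if event == "sovereign_stop":        # resumable iff sovereign re-approves
        if reason is None:
            raise ValueError("sovereign_stop needs a reason")
        return f"stopped_{reason}"
    if event == "drift_detected":        # resumable iff drift is acknowledged
        return "stopped_drift"
    raise ValueError(f"unknown event {event!r}")
```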

3. Sovereign gates — full enumeration

SG_1_cut_authz:
  state_in   : grants_probed
  state_out  : cut_leg_a_committed
  request    : orchestrator uploads "cut-authorization-request" KB doc
               with: doc_id, source_version_id, writer_digest, manifest_digest,
                     region_sha, candidate_count, backup_sha, grants_probe_snapshot,
                     run_id, sidecar_path
  response   : sovereign uploads a sibling KB doc with explicit allowance,
               returns its KB id; user resumes via
                 cutter orchestrate resume --run-id <id> --approval-kb-id <path>
  refusal    : orchestrator refuses to continue without a valid approval doc id
               (cryptographic signing is optional in v0.6 — see DQ_2)

SG_2_lifecycle_authz:
  state_in   : write_verified
  state_out  : lifecycle_enacted
  request    : orchestrator uploads "lifecycle-enact-authorization-request" doc
               + creates a *new* row in cutter_governance.review_decision
               (review_decision_id captured into run sidecar)
  response   : sovereign signs the request KB doc; orchestrator validates that
               review_decision_id is present, type='lifecycle_enactment', and
               the actor matches
  refusal    : refuses if review_decision_id is reused from a prior run

SG_3_failure_escalation:
  state_in   : any non-terminal
  state_out  : failed_<gate> | stopped_<reason>
  trigger    : invariant violation OR drift OR uncaught exception
  produces   : KB "stop-route-report" with full sidecar dump + restart command
  policy     : NEVER silent-retry. NEVER skip a gate. NEVER pull authority
               from a prior run's approval.
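A minimal sketch of the approval-refusal rules above, assuming the sovereign's approval doc has already been fetched from the KB and parsed into a record. `ApprovalDoc`, its fields (in particular that the approval echoes the request's `run_id`), and `validate_approval` are hypothetical stand-ins for whatever the KB doc format turns out to be. Checking `run_id` is one way to enforce "never pull authority from a prior run's approval".

```python
from dataclasses import dataclass
from typing import Optional


class SovereignGateRefused(Exception):
    """The approval cannot be accepted; the run stays paused at its gate."""


@dataclass(frozen=True)
class ApprovalDoc:
    # Hypothetical parsed form of the sovereign's KB approval doc.
    kb_id: str
    run_id: str                               # assumed to echo the request's run_id
    gate: str                                 # "SG_cut_authz" | "SG_lifecycle_authz"
    actor: str
    decision_type: Optional[str] = None       # SG_2 only
    review_decision_id: Optional[str] = None  # SG_2 only


def validate_approval(doc, *, run_id, gate, expected_actor, prior_review_decision_ids):
    """Return `doc` if it authorizes `gate` for this run, else raise SovereignGateRefused."""
    if doc is None or not doc.kb_id:
        raise SovereignGateRefused("no approval KB id supplied")
    if doc.run_id != run_id:
        raise SovereignGateRefused("approval belongs to a different run")
    if doc.gate != gate:
        raise SovereignGateRefused(f"approval is for {doc.gate}, not {gate}")
    if gate == "SG_lifecycle_authz":
        if not doc.review_decision_id:
            raise SovereignGateRefused("review_decision_id missing")
        if doc.decision_type != "lifecycle_enactment":
            raise SovereignGateRefused("type must be 'lifecycle_enactment'")
        if doc.actor != expected_actor:
            raise SovereignGateRefused("actor mismatch")
        if doc.review_decision_id in prior_review_decision_ids:
            raise SovereignGateRefused("review_decision_id reused from a prior run")
    return doc
```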

4. Internal gates — invariant checklist per gate

Each internal gate raises OrchestratorGateFail(<gate>, <reason>) if any of its listed invariants does not hold (refusal is fail-closed):

IG_source_pin:
  - source_document row exists
  - latest source_version row exists
  - manifest digest recomputed matches source_version.manifest_digest if pinned;
    if not pinned, records the freshly computed digest into RunContext

IG_mark:
  - region_sha rebuilt matches recorded region_sha
  - mark rowset size > 0 and ≤ MAX_CANDIDATES_PER_DOC (deployment limit, e.g. 1000)

IG_cutplan:
  - writer_digest stable across two independent rebuilds (replay determinism)
  - exactly-N rows produced where N = mark rowset size
  - publication_type / unit_kind / section_type vocab map exists for every row
  - idempotency_key_set is distinct (cardinality == N)

IG_backup:
  - backup target reachable
  - GPG public-key fingerprint matches deployment-pinned fpr
  - backup .gpg sha256 recorded
  - sidecar JSON written with backup_sha + size_bytes + timestamp

IG_grant_probe:
  - cutter_exec has EXECUTE on fn_iu_create + fn_iu_apply_edit_draft + fn_iu_enact
  - cutter_verify has SELECT + INSERT on cutter_governance.verify_result
  - directus has SELECT on cutter_governance.review_decision (for SECDEF probe)
  - NO accidental PUBLIC EXECUTE on any cutter_governance writer

IG_cut_leg_a_execute:
  - approval-kb-id resolves to a sovereign-signed doc
  - approval doc explicitly authorizes (doc_id, writer_digest, change_set_id-pending)
  - exactly-N successful fn_iu_create invocations
  - transaction COMMITTED (no silent autocommit reset)
  - 0 IUs with lifecycle_status != 'draft' after write (per v0.5 lesson)

IG_structural_verify:
  - 11-bool probe matches OD-W2..OD-W9 cardinality assertions
  - section_type_cardinality matches the cutplan output
  - dieu_44_intrusion (or per-doc forbidden-id intrusion) == 0

IG_leg_b_record:
  - change_set_id present and unique
  - manifest_envelope_id + executor_signature_id present
  - G-LEG-B-ONCE: count(rows for change_set_id) == expected, no duplicates
  - lane_overlap_invariants assertion PASS

IG_write_verify:
  - VerifyRecorder runs with cutter_verify principal
  - verify_result row inserted; verifier dot_pair_signature row inserted
  - G-VERIFY-ONCE: only one verify_result per change_set_id

IG_lifecycle_enact_execute:
  - review_decision_id matches the SG_2 approval doc
  - fn_iu_enact called exactly N times in a single txn (per Phase 7 doctrine)
  - all N return status='enacted'; 0 partial
  - iu_lifecycle_log row count == N
  - triggers trg_iu_enacted_immut and trg_uv_enacted_immut both enabled (pg_trigger.tgenabled = 'O')

IG_closeout:
  - all KB docs for this run uploaded
  - sidecar finalized (status: success)
  - run_id appended to a global runs index (one line per run, KB-tracked)
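The fail-closed rule can live in one helper that evaluates all of a gate's invariants and raises if any is false or cannot be evaluated. A minimal sketch follows, using IG_cutplan as the worked example. `OrchestratorGateFail` is named in the text, but its exact shape here is assumed; `run_gate`, `ig_cutplan`, the row field names (`idempotency_key`, etc.), and computing writer_digest as SHA-256 over canonical JSON are all assumptions for illustration.

```python
import hashlib
import json


class OrchestratorGateFail(Exception):
    def __init__(self, gate, reason):
        super().__init__(f"{gate}: {reason}")
        self.gate, self.reason = gate, reason


def run_gate(gate, invariants):
    """Evaluate every invariant; raise if any is False or errors (fail-closed).

    Returns {name: True, ...}, suitable for PhaseRecord.gate_invariants.
    """
    results = {}
    for name, check in invariants.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # an invariant that cannot be evaluated fails
    failed = sorted(name for name, ok in results.items() if not ok)
    if failed:
        raise OrchestratorGateFail(gate, "invariants failed: " + ", ".join(failed))
    return results


def writer_digest(rows):
    """SHA-256 over canonical JSON, so equal row lists always hash equally."""
    blob = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


VOCAB_FIELDS = ("publication_type", "unit_kind", "section_type")


def ig_cutplan(build_cutplan, marks, vocab):
    """IG_cutplan: two independent rebuilds must agree and cover every mark once."""
    first, second = build_cutplan(marks), build_cutplan(marks)
    n = len(marks)
    return run_gate("IG_cutplan", {
        "writer_digest_stable": lambda: writer_digest(first) == writer_digest(second),
        "exactly_n_rows": lambda: len(first) == n,
        "vocab_mapped": lambda: all(r[f] in vocab[f] for r in first for f in VOCAB_FIELDS),
        "idempotency_keys_distinct":
            lambda: len({r["idempotency_key"] for r in first}) == n,
    })
```

Collecting every result before raising means the stop-route report can list all failing invariants at once, not only the first.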

5. Persistence — JSON sidecar shape

Chosen per DQ_10: KB docs plus a filesystem JSON sidecar (no new DB table; revisit in v0.7 only if the backlog grows).

sidecar_root          : ${CUTTER_RUNS_DIR:-/var/lib/cutter/runs}
sidecar_path          : ${sidecar_root}/<run_id>/state.json
sidecar_lock          : ${sidecar_root}/<run_id>/state.lock      # POSIX fcntl
backup_blob_path      : ${sidecar_root}/<run_id>/pre_write.gpg   # encrypted
phase_report_paths    : ${sidecar_root}/<run_id>/phase-<NN>.md   # mirrored to KB

run_id_format         : "ictr-<YYYYMMDDTHHMMSSZ>-<uuid7>"        # human-sortable
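A sketch of `run_id_format`, assuming UUIDv7 as laid out in RFC 9562 (48-bit Unix-millisecond timestamp, 4 version bits, 12 random bits, 2 variant bits, 62 random bits). It is hand-rolled here rather than relying on a stdlib helper; `uuid7`, `new_run_id`, and `RUN_ID_RE` are hypothetical names. The `YYYYMMDDTHHMMSSZ` prefix is what makes run directories sort chronologically by name; the UUIDv7 suffix adds uniqueness.

```python
import os
import re
import time
import uuid
from datetime import datetime, timezone


def uuid7(unix_ms=None):
    """UUIDv7 per RFC 9562: ms timestamp | ver=7 | rand_a | variant=0b10 | rand_b."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0
    return uuid.UUID(int=value)


def new_run_id(now=None):
    """Build ictr-<YYYYMMDDTHHMMSSZ>-<uuid7> from an aware UTC datetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"ictr-{stamp}-{uuid7(int(now.timestamp() * 1000))}"


RUN_ID_RE = re.compile(
    r"^ictr-\d{8}T\d{6}Z-[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)
```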

Sidecar JSON schema (top-level):

run_id                : string  (== filename parent)
created_utc           : ISO8601
created_by            : string  (actor)
document_id           : string  (e.g. "ICX-LAW-2026-001")
source_document_id    : uuid
source_version_id     : uuid
mode                  : "dryrun" | "live"
state                 : <one of §1>
phases                : map<phase_name, PhaseRecord>
sovereign_approvals   : list<ApprovalRecord>
idempotency_keys      : map<phase_name, opaque_string>
context_pins          : map<key, value>   # manifest_digest, region_sha, writer_digest, etc.
last_error            : optional<ErrorRecord>
schema_version        : 1

PhaseRecord:

phase                 : enum
started_utc           : ISO8601
finished_utc          : optional<ISO8601>
result                : "passed" | "failed" | "running"
gate_invariants       : map<key, boolean>
kb_doc_id             : optional<string>   # KB path uploaded for this phase
artifacts             : list<{path, sha256}>

ApprovalRecord:

gate                  : "SG_cut_authz" | "SG_lifecycle_authz"
requested_utc         : ISO8601
approval_kb_id        : string             # the KB doc id sovereign uploaded
review_decision_id    : optional<uuid>     # for SG_2
validated_utc         : ISO8601
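Because the sidecar is the only durable run state, a torn write would be worse than a stale one. A minimal sketch of persisting `state.json`, assuming the common write-temp-then-rename pattern (`os.replace` is atomic on POSIX when source and target are on the same filesystem) and assuming callers already hold `state.lock`. `save_sidecar` and `load_sidecar` are hypothetical helpers; the checks mirror the `run_id` and `schema_version` fields above.

```python
import json
import os
import tempfile
from pathlib import Path

SCHEMA_VERSION = 1


def save_sidecar(path: Path, sidecar: dict) -> None:
    """Atomically replace state.json (caller is assumed to hold state.lock)."""
    if sidecar.get("run_id") != path.parent.name:
        raise ValueError("run_id must equal the sidecar's parent directory name")
    payload = {**sidecar, "schema_version": SCHEMA_VERSION}
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".state.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # data reaches disk before the rename
        os.replace(tmp, path)      # readers see the old or new file, never a mix
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    dir_fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)           # persist the directory entry change too
    finally:
        os.close(dir_fd)


def load_sidecar(path: Path) -> dict:
    data = json.loads(path.read_text(encoding="utf-8"))
    if data.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
    if data.get("run_id") != path.parent.name:
        raise ValueError("run_id does not match the run directory name")
    return data
```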

6. Phase soft-caps + run hard-cap

Per the v0.5 lesson "macro tasks default 45–60 minutes", the orchestrator enforces:

phase_soft_cap_minutes:
  source_pin            : 2
  mark                  : 5
  cutplan               : 10
  backup                : 5
  grant_probe           : 1
  cut_leg_a             : 10
  structural_verify     : 2
  leg_b_record          : 5
  write_verify          : 5
  lifecycle_enact       : 15
  closeout              : 5
run_hard_cap_minutes    : 60

over_cap_action         : STOP_AND_ESCALATE (sovereign sees the partial run sidecar)
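A sketch of the cap check, assuming (as the single `over_cap_action` line suggests) that exceeding either a phase soft cap or the run hard cap triggers STOP_AND_ESCALATE; `cap_check` and its minute-based inputs are hypothetical. Because the soft caps sum to 65 minutes, a run can breach the 60-minute hard cap even when every phase stays within its own cap, so both checks are needed.

```python
PHASE_SOFT_CAP_MINUTES = {
    "source_pin": 2, "mark": 5, "cutplan": 10, "backup": 5, "grant_probe": 1,
    "cut_leg_a": 10, "structural_verify": 2, "leg_b_record": 5,
    "write_verify": 5, "lifecycle_enact": 15, "closeout": 5,
}
RUN_HARD_CAP_MINUTES = 60


def cap_check(phase: str, phase_elapsed_min: float, run_elapsed_min: float):
    """Return None while within caps, else a STOP_AND_ESCALATE reason string."""
    if run_elapsed_min > RUN_HARD_CAP_MINUTES:  # hard cap is checked first
        return (f"STOP_AND_ESCALATE: run at {run_elapsed_min} min "
                f"exceeds hard cap {RUN_HARD_CAP_MINUTES}")
    cap = PHASE_SOFT_CAP_MINUTES[phase]
    if phase_elapsed_min > cap:
        return (f"STOP_AND_ESCALATE: {phase} at {phase_elapsed_min} min "
                f"exceeds soft cap {cap}")
    return None
```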

7. Resume — algorithm

1. Read sidecar JSON (or fail STOP_RUN_NOT_FOUND).
2. Acquire fcntl lock on state.lock (or fail STOP_RUN_BUSY).
3. Inspect `state`:
   - if terminal success → exit 0 (idempotent).
   - if terminal failure → exit non-zero (operator must void or amend).
   - if sovereign-pending and --approval-kb-id provided → validate, advance.
   - if mid-phase (the in-progress PhaseRecord has result == "running" on
     disk) → drift-revalidate the last completed gate's invariants against
     the live DB. If they still hold, re-enter the next phase. If not, STOP_DRIFT.
4. Continue forward execution from the next state.
5. Release lock on exit.

Resume is idempotent: re-running with the same --run-id and unchanged live state is a no-op.
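The resume steps can be sketched as follows, assuming Python's `fcntl.lockf` for the POSIX lock and returning an action label instead of actually running phases. `resume`, `StopRoute`, the action strings, and the `revalidate_last_gate` callback are hypothetical. The sketch reads the sidecar after taking the lock (a small reordering of steps 1 and 2) so the state cannot change between the read and the decision. What should happen when a sovereign-pending run is resumed without an approval id is not specified above; the sketch just reports that the run remains paused.

```python
import fcntl
import json
import os
from pathlib import Path

SOVEREIGN_PENDING = {"awaiting_cut_authorization", "awaiting_lifecycle_authorization"}


class StopRoute(Exception):
    """A STOP_* outcome; the exception message is the stop code."""


def resume(run_dir: Path, approval_kb_id=None, revalidate_last_gate=lambda sc: True):
    state_path = run_dir / "state.json"
    if not state_path.exists():
        raise StopRoute("STOP_RUN_NOT_FOUND")
    lock_fd = os.open(run_dir / "state.lock", os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.lockf(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            raise StopRoute("STOP_RUN_BUSY") from None
        sidecar = json.loads(state_path.read_text(encoding="utf-8"))
        state = sidecar["state"]
        if state == "closeout_reported":
            return "exit_0"  # already finished: idempotent no-op
        if state.startswith(("failed_", "stopped_", "voided_")):
            # operator must void or amend; stopped_* re-approval is omitted here
            return "exit_nonzero"
        if state in SOVEREIGN_PENDING:
            return "validate_and_advance" if approval_kb_id else "remain_paused"
        running = [name for name, rec in sidecar.get("phases", {}).items()
                   if rec.get("result") == "running"]
        if running:
            if not revalidate_last_gate(sidecar):
                raise StopRoute("STOP_DRIFT")
            return "reenter_next_phase"
        return "continue_forward"
    finally:
        os.close(lock_fd)  # closing the descriptor releases the fcntl lock
```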

8. Drift detection (cross-cutting)

Before every internal gate, the orchestrator re-surveys:

  • live row count for canonical_address LIKE <prefix>%
  • live md5(prosrc) of fn_iu_create / fn_iu_enact / fn_iu_apply_edit_draft
  • live cardinality of cutter_governance.review_decision rows for the run's review_decision_id

against the corresponding context_pins entry in the sidecar. Any mismatch → STOP_DRIFT_<dim> with a diff payload.

This is the same "drift policy" the lifecycle DDL fingerprints.yaml header declares:

"If live md5 differs from these pins, STOP and route to sovereign — do not silently patch the repo to match."
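A sketch of the comparison step, assuming the live survey has already been collected into a dict keyed by the same names as `context_pins` (the key names used here are hypothetical). For the function-body dimension, a query such as `SELECT md5(prosrc) FROM pg_proc WHERE proname = 'fn_iu_create'` is one way to obtain the live value. Naming the first mismatching dimension in the stop code while carrying every mismatch in the payload is a design choice of this sketch.

```python
class DriftStop(Exception):
    def __init__(self, dim: str, diff: dict):
        super().__init__(f"STOP_DRIFT_{dim}")
        self.code = f"STOP_DRIFT_{dim}"
        self.diff = diff  # {dim: {"pinned": ..., "live": ...}} for every mismatch


def check_drift(live: dict, pins: dict) -> None:
    """Raise DriftStop if any pinned dimension differs from, or is absent in, the survey."""
    diff = {}
    for dim in sorted(pins):  # deterministic order for the stop code
        observed = live.get(dim, "<missing>")
        if observed != pins[dim]:
            diff[dim] = {"pinned": pins[dim], "live": observed}
    if diff:
        raise DriftStop(next(iter(diff)), diff)
```

The check never patches anything: it only reports, which matches the "do not silently patch" rule quoted above.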

9. Verdict

g2_outcome                  : PASS
states_total                : 14 forward states (§1) + 3 terminal categories (failed_*, stopped_*, voided_*)
sovereign_gates             : 3 (SG_1, SG_2, SG_3)
internal_gates              : 11 (IG_*)
sidecar_persistence         : JSON + lock file (no new DB table for v0.6)
resume_safety               : invariant re-validation before every continuation
drift_policy                : STOP_DRIFT before any write
phase_soft_cap_total        : 65 min  → 60 min hard-cap enforced
knowledge/dev/laws/dieu44-trien-khai/v0.5-automation-orchestrator-design/02-end-to-end-state-machine-2026-05-20.md