IU Core 1k — 03 OperatorRuntime concurrency lease (F-960x-1 closed)
03 — OperatorRuntime concurrency lease (finding F-960x-1 closed)
1. The finding
960x recorded F-960x-1: the OperatorRuntime gate window — open the
dot_config gates -> apply -> close them in a finally — is NOT
concurrency-safe. Two proof-driver instances racing on the toggles caused
one instance's finally to shut the gates mid-flight in the other, leaving
a compensation refused. The 960x mitigation was "run a single instance" —
a rule that depends on operator memory, which the constitution forbids as a
safety mechanism.
2. Why a session advisory lock does not work
The psql_executor opens a new connection per statement (each _exec
call is one ssh ... psql process = one autocommit transaction). A
session-scoped pg_advisory_lock is released the instant that connection
closes — it cannot span the multi-statement open->apply->close window. A
transaction-scoped lock cannot either: the window is many transactions.
3. The fix — a durable single-run lease
Migration 019 adds dot_iu_runtime_lease + two governed functions:
fn_dot_iu_runtime_lease_acquire(lease_name, holder, ttl_seconds)— atomic acquire viaINSERT ... ON CONFLICT (lease_name) DO UPDATE ... WHERE expires_at < now() OR lease_holder = EXCLUDED.lease_holder. The conflicting row is row-locked for the update, so two concurrent acquirers serialise; the loser'sDO UPDATE WHEREis false, no row is returned, the function returnsNULL= "held — do not run". A TTL makes a crashed holder self-healing.fn_dot_iu_runtime_lease_release(lease_name, holder, token)— holder + token guardedDELETE; a foreign token cannot steal the lease.
operator_runtime.RuntimeLease wraps the pair as a context manager — the
gate window goes INSIDE it. A second instance's acquire raises
RuntimeLeaseError — it can never open the gates underneath the first.
4. sandbox/180 — concurrency-lease probe, BEGIN ... ROLLBACK — 7/7
T1 holder A acquires -> a non-null token; T2 holder B acquires the SAME lease -> NULL (A holds it); T3 holder A re-acquires re-entrantly -> a token; T4 release with the WRONG token -> false; T5 release with the correct holder+token -> true, row gone; T6 after release B can acquire -> a token; T7 an EXPIRED lease is reclaimable by a new holder. T2 is the F-960x-1 proof: two instances cannot both hold the lease. F-960x-1 closed.
5. OperatorRuntime command coverage
The 17 dot_iu_* commands remain _assert_governed-clean. RuntimeLease
is unit-proven (acquire/release cycle, second-holder refusal,
context-manager release, harmless double-release, holder-required
validation). The operator-runtime gate stays inert.
6. Rollback / disable
rollback/019 drops the lease table + both functions. The lease is
self-disabling: it carries a TTL, and dot_iu_runtime_lease is empty at
end-state (0 rows).