KB-97FF

IU Core 1k — 03 OperatorRuntime concurrency lease (F-960x-1 closed)

3 min read Revision 1
iu-core1koperator-runtimeconcurrencyleaseF-960x-1dot-iu-cutter

03 — OperatorRuntime concurrency lease (finding F-960x-1 closed)

1. The finding

960x recorded F-960x-1: the OperatorRuntime gate window — open the dot_config gates -> apply -> close them in a finally — is NOT concurrency-safe. Two proof-driver instances racing on the toggles caused one instance's finally to shut the gates mid-flight in the other, leaving a compensation refused. The 960x mitigation was "run a single instance" — a rule that depends on operator memory, which the constitution forbids as a safety mechanism.

2. Why a session advisory lock does not work

The psql_executor opens a new connection per statement (each _exec call is one ssh ... psql process = one autocommit transaction). A session-scoped pg_advisory_lock is released the instant that connection closes — it cannot span the multi-statement open->apply->close window. A transaction-scoped lock cannot either: the window is many transactions.

3. The fix — a durable single-run lease

Migration 019 adds dot_iu_runtime_lease + two governed functions:

  • fn_dot_iu_runtime_lease_acquire(lease_name, holder, ttl_seconds) — atomic acquire via INSERT ... ON CONFLICT (lease_name) DO UPDATE ... WHERE expires_at < now() OR lease_holder = EXCLUDED.lease_holder. The conflicting row is row-locked for the update, so two concurrent acquirers serialise; the loser's DO UPDATE WHERE is false, no row is returned, the function returns NULL = "held — do not run". A TTL makes a crashed holder self-healing.
  • fn_dot_iu_runtime_lease_release(lease_name, holder, token) — holder + token guarded DELETE; a foreign token cannot steal the lease.

operator_runtime.RuntimeLease wraps the pair as a context manager — the gate window goes INSIDE it. A second instance's acquire raises RuntimeLeaseError — it can never open the gates underneath the first.

4. sandbox/180 — concurrency-lease probe, BEGIN ... ROLLBACK — 7/7

T1 holder A acquires -> a non-null token; T2 holder B acquires the SAME lease -> NULL (A holds it); T3 holder A re-acquires re-entrantly -> a token; T4 release with the WRONG token -> false; T5 release with the correct holder+token -> true, row gone; T6 after release B can acquire -> a token; T7 an EXPIRED lease is reclaimable by a new holder. T2 is the F-960x-1 proof: two instances cannot both hold the lease. F-960x-1 closed.

5. OperatorRuntime command coverage

The 17 dot_iu_* commands remain _assert_governed-clean. RuntimeLease is unit-proven (acquire/release cycle, second-holder refusal, context-manager release, harmless double-release, holder-required validation). The operator-runtime gate stays inert.

6. Rollback / disable

rollback/019 drops the lease table + both functions. The lease is self-disabling: it carries a TTL, and dot_iu_runtime_lease is empty at end-state (0 rows).

Back to Knowledge Hub knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-core-1k-vector-operator-ui-delivery-acceptance-open-goal/03-operator-runtime-concurrency-lease.md