T1 FIX7 Design Verification - 10 TOCTOU Control Epoch (SUPERTRACK J)
10 — TOCTOU / control_epoch Review (SUPERTRACK J)
Design statement (02): "Writer and activation serialize on one hash-bound control_epoch: writer holds shared transaction lock; activation takes exclusive lock and increments epoch, closing TOCTOU."
| # | Question | Design answer | Verdict |
|---|---|---|---|
| J.1 | What increments control_epoch? | manifest activation (exclusive lock + increment) | INTENT-OK; is epoch incremented for every control mutation (owner/ACL/manifest) or only manifest activation? ambiguous |
| J.2 | What does the writer read at start? | holds a shared lock; reads the epoch (implied) | INTENT-OK; the read-and-pin step not concretely specified |
| J.3 | What does the writer check before commit? | epoch must be unchanged / hash-bound (implied by "hash-bound") | NEEDS_CLARIFICATION — the pre-commit predicate ("epoch read at start == epoch at commit, else abort") is not written |
| J.4 | What does activation read/write? | exclusive lock; binds old/new hashes; increments epoch; read-back (02) |
INTENT-OK; envelope schema not specified |
| J.5 | How does a stale epoch cause fail? | shared/exclusive lock ordering should force the writer to either complete before activation or see the new epoch | NEEDS_CLARIFICATION — depends on J.3 predicate, which is unspecified |
| J.6 | How can activation and writer not race? | shared (writer) vs exclusive (activation) lock on one epoch object | INTENT-OK if the lock is held across the writer's whole read→decide→commit; that scope is not stated |
| J.7 | How is post-activation fresh evidence required? | "makes readiness fail until fresh evidence binds the new set" (00); "fresh post-activation evidence" |
INTENT-OK; the freshness window (epoch-stamped? time-stamped?) not specified |
Analysis
The shared/exclusive lock on a single hash-bound epoch is the correct concurrency primitive and directly answers the writer/activation TOCTOU that motivated FIX7. The design's intent — a writer cannot commit a decision based on a manifest set that activation has since replaced, because activation's exclusive lock + epoch increment forces the writer to either finish first or observe the change — is sound.
But the realization is unspecified, and the details are where TOCTOU bugs live:
- Is the lock a PostgreSQL advisory lock (which lock id?), a row lock on an epoch table, or a table-level lock? Advisory vs row locks have different auto-release-on-commit semantics.
- Does the writer hold the shared lock for the entire transaction (read manifest → evaluate readiness → write), or only while reading? If released early, the TOCTOU reopens.
- The pre-commit check (J.3) — re-reading the epoch and aborting if it changed — is the actual TOCTOU closure and is not written.
- "Fresh post-activation evidence" (J.7) must be epoch-bound (evidence stamped with the epoch it was produced under, rejected if epoch advanced), otherwise stale evidence can satisfy a new set.
If T1 implements these by guessing, two reasonable engineers could produce a design that still races (e.g. advisory lock released between read and commit). This is a semantic decision, not surgical drift.
TOCTOU verdict
No writer/activation race in the design intent — the shared/exclusive hash-bound epoch is the correct primitive. But the lock object, the lock-hold scope, the writer pre-commit staleness predicate, and the epoch-binding of post-activation evidence are unspecified. → INTENT-OK + NEEDS_CLARIFICATION. Required correction: a concrete concurrency spec (lock type + id, hold scope = whole writer txn, pre-commit epoch_start == epoch_now else abort, evidence epoch-stamping).