KB-2765

T1 FIX7 Design Verification - 10 TOCTOU Control Epoch (SUPERTRACK J)

4 min read Revision 1

QT001FIX7T1toctoucontrol-epochsupertrack-j

10 — TOCTOU / control_epoch Review (SUPERTRACK J)

Design statement (02): "Writer and activation serialize on one hash-bound control_epoch: writer holds shared transaction lock; activation takes exclusive lock and increments epoch, closing TOCTOU."

#	Question	Design answer	Verdict
J.1	What increments control_epoch?	manifest activation (exclusive lock + increment)	INTENT-OK; is epoch incremented for every control mutation (owner/ACL/manifest) or only manifest activation? ambiguous
J.2	What does the writer read at start?	holds a shared lock; reads the epoch (implied)	INTENT-OK; the read-and-pin step not concretely specified
J.3	What does the writer check before commit?	epoch must be unchanged / hash-bound (implied by "hash-bound")	NEEDS_CLARIFICATION — the pre-commit predicate ("epoch read at start == epoch at commit, else abort") is not written
J.4	What does activation read/write?	exclusive lock; binds old/new hashes; increments epoch; read-back (`02`)	INTENT-OK; envelope schema not specified
J.5	How does a stale epoch cause fail?	shared/exclusive lock ordering should force the writer to either complete before activation or see the new epoch	NEEDS_CLARIFICATION — depends on J.3 predicate, which is unspecified
J.6	How can activation and writer not race?	shared (writer) vs exclusive (activation) lock on one epoch object	INTENT-OK if the lock is held across the writer's whole read→decide→commit; that scope is not stated
J.7	How is post-activation fresh evidence required?	"makes readiness fail until fresh evidence binds the new set" (`00`); "fresh post-activation evidence"	INTENT-OK; the freshness window (epoch-stamped? time-stamped?) not specified

Analysis

The shared/exclusive lock on a single hash-bound epoch is the correct concurrency primitive and directly answers the writer/activation TOCTOU that motivated FIX7. The design's intent — a writer cannot commit a decision based on a manifest set that activation has since replaced, because activation's exclusive lock + epoch increment forces the writer to either finish first or observe the change — is sound.

But the realization is unspecified, and the details are where TOCTOU bugs live:

Is the lock a PostgreSQL advisory lock (which lock id?), a row lock on an epoch table, or a table-level lock? Advisory vs row locks have different auto-release-on-commit semantics.
Does the writer hold the shared lock for the entire transaction (read manifest → evaluate readiness → write), or only while reading? If released early, the TOCTOU reopens.
The pre-commit check (J.3) — re-reading the epoch and aborting if it changed — is the actual TOCTOU closure and is not written.
"Fresh post-activation evidence" (J.7) must be epoch-bound (evidence stamped with the epoch it was produced under, rejected if epoch advanced), otherwise stale evidence can satisfy a new set.

If T1 implements these by guessing, two reasonable engineers could produce a design that still races (e.g. advisory lock released between read and commit). This is a semantic decision, not surgical drift.

TOCTOU verdict

No writer/activation race in the design intent — the shared/exclusive hash-bound epoch is the correct primitive. But the lock object, the lock-hold scope, the writer pre-commit staleness predicate, and the epoch-binding of post-activation evidence are unspecified. → INTENT-OK + NEEDS_CLARIFICATION. Required correction: a concrete concurrency spec (lock type + id, hold scope = whole writer txn, pre-commit epoch_start == epoch_now else abort, evidence epoch-stamping).