KB-2765

T1 FIX7 Design Verification - 10 TOCTOU Control Epoch (SUPERTRACK J)

Revision 1

10 — TOCTOU / control_epoch Review (SUPERTRACK J)

Design statement (02): "Writer and activation serialize on one hash-bound control_epoch: writer holds shared transaction lock; activation takes exclusive lock and increments epoch, closing TOCTOU."

| # | Question | Design answer | Verdict |
|---|---|---|---|
| J.1 | What increments control_epoch? | manifest activation (exclusive lock + increment) | INTENT-OK; ambiguous: is epoch incremented for every control mutation (owner/ACL/manifest) or only manifest activation? |
| J.2 | What does the writer read at start? | holds a shared lock; reads the epoch (implied) | INTENT-OK; the read-and-pin step not concretely specified |
| J.3 | What does the writer check before commit? | epoch must be unchanged / hash-bound (implied by "hash-bound") | NEEDS_CLARIFICATION: the pre-commit predicate ("epoch read at start == epoch at commit, else abort") is not written |
| J.4 | What does activation read/write? | exclusive lock; binds old/new hashes; increments epoch; read-back (02) | INTENT-OK; envelope schema not specified |
| J.5 | How does a stale epoch cause fail? | shared/exclusive lock ordering should force the writer to either complete before activation or see the new epoch | NEEDS_CLARIFICATION: depends on J.3 predicate, which is unspecified |
| J.6 | How can activation and writer not race? | shared (writer) vs exclusive (activation) lock on one epoch object | INTENT-OK if the lock is held across the writer's whole read→decide→commit; that scope is not stated |
| J.7 | How is post-activation fresh evidence required? | "makes readiness fail until fresh evidence binds the new set" (00); "fresh post-activation evidence" | INTENT-OK; the freshness window (epoch-stamped? time-stamped?) not specified |
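
The pin-and-compare steps behind J.2 to J.5 can be sketched as below. This is a minimal in-memory model under stated assumptions, not the FIX7 implementation: `ControlEpoch`, `Pin`, `StaleEpochError`, `pin`, `activate` and `commit_if_current` are hypothetical names, the manifest hash is assumed to be SHA-256 over the manifest bytes, and locking is deliberately left out so that only the predicate is visible.

```python
import hashlib
from dataclasses import dataclass


class StaleEpochError(Exception):
    """Raised when the epoch pinned at writer start no longer matches."""


@dataclass(frozen=True)
class Pin:
    epoch: int
    manifest_hash: str


class ControlEpoch:
    """Hypothetical in-memory stand-in for the hash-bound control_epoch."""

    def __init__(self, manifest: bytes) -> None:
        self.epoch = 0
        self.manifest_hash = hashlib.sha256(manifest).hexdigest()

    def pin(self) -> Pin:
        # J.2: the writer records the epoch and hash it is deciding against.
        return Pin(self.epoch, self.manifest_hash)

    def activate(self, expected_old_hash: str, new_manifest: bytes) -> int:
        # J.4: activation binds the old and new hashes, then increments the epoch.
        if expected_old_hash != self.manifest_hash:
            raise ValueError("activation was prepared against a different manifest")
        self.manifest_hash = hashlib.sha256(new_manifest).hexdigest()
        self.epoch += 1
        return self.epoch

    def commit_if_current(self, pin: Pin) -> None:
        # J.3: epoch read at start must equal epoch at commit, else abort.
        if (pin.epoch, pin.manifest_hash) != (self.epoch, self.manifest_hash):
            raise StaleEpochError(
                f"pinned epoch {pin.epoch}, current epoch {self.epoch}"
            )


control = ControlEpoch(b"manifest-v1")
writer_pin = control.pin()
# Activation lands between the writer's read and its commit.
control.activate(writer_pin.manifest_hash, b"manifest-v2")
try:
    control.commit_if_current(writer_pin)
    outcome = "committed"
except StaleEpochError:
    outcome = "aborted"
print(outcome)  # aborted
```

Comparing the hash as well as the counter is one reading of "hash-bound"; the design text does not say whether the predicate covers both.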

Analysis

The shared/exclusive lock on a single hash-bound epoch is the correct concurrency primitive and directly answers the writer/activation TOCTOU that motivated FIX7. The design's intent — a writer cannot commit a decision based on a manifest set that activation has since replaced, because activation's exclusive lock + epoch increment forces the writer to either finish first or observe the change — is sound.

But the realization is unspecified, and the details are where TOCTOU bugs live:

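  • Is the lock a PostgreSQL advisory lock (which lock id?), a row lock on an epoch table, or a table-level lock? The release semantics differ: a session-level advisory lock survives commit and is held until explicitly released or the session ends, whereas transaction-level advisory locks and row locks are released at transaction end.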
  • Does the writer hold the shared lock for the entire transaction (read manifest → evaluate readiness → write), or only while reading? If released early, the TOCTOU reopens.
  • The pre-commit check (J.3) — re-reading the epoch and aborting if it changed — is the actual TOCTOU closure and is not written.
  • "Fresh post-activation evidence" (J.7) must be epoch-bound (evidence stamped with the epoch it was produced under, rejected if epoch advanced), otherwise stale evidence can satisfy a new set.
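
The epoch-binding of evidence in the last bullet could look roughly like this. It is a sketch under assumptions: the `Evidence` record, the `is_ready` function and the check names `acl_scan` and `owner_review` are all hypothetical, and the rule "evidence counts only if stamped with the current epoch" is one possible reading of "fresh", not a rule taken from the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    check_name: str
    passed: bool
    epoch: int  # epoch under which the evidence was produced


def is_ready(current_epoch: int, required: set[str], evidence: list[Evidence]) -> bool:
    """Readiness holds only if every required check passed under the current epoch."""
    fresh = {e.check_name for e in evidence if e.passed and e.epoch == current_epoch}
    return required <= fresh


required_checks = {"acl_scan", "owner_review"}  # hypothetical check names
evidence_log = [
    Evidence("acl_scan", True, epoch=3),
    Evidence("owner_review", True, epoch=3),
]

ready_before = is_ready(3, required_checks, evidence_log)
# Activation advances the epoch to 4; evidence stamped with epoch 3 is now stale.
ready_after_activation = is_ready(4, required_checks, evidence_log)
evidence_log.append(Evidence("acl_scan", True, epoch=4))
evidence_log.append(Evidence("owner_review", True, epoch=4))
ready_after_fresh = is_ready(4, required_checks, evidence_log)
print(ready_before, ready_after_activation, ready_after_fresh)  # True False True
```

A time-stamped window would need a clock comparison against the activation time instead; stamping with the epoch avoids clock skew, but requires every evidence producer to read the epoch.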

If T1 implements these by guessing, two reasonable engineers could arrive at different implementations, and at least one of them could still race (e.g. advisory lock released between read and commit). This is a semantic decision, not surgical drift.
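
The "lock released between read and commit" case can be reproduced in a small deterministic model. Everything here is hypothetical: `SharedExclusiveLock` is a non-blocking toy rather than a PostgreSQL lock, and `run_writer` interleaves one writer with one activation attempt in a single thread. The writer deliberately omits the pre-commit epoch check, so the only thing protecting it is the lock-hold scope.

```python
class SharedExclusiveLock:
    """Hypothetical non-blocking shared/exclusive lock, for illustration only."""

    def __init__(self) -> None:
        self.shared_holders = 0
        self.exclusive_held = False

    def try_shared(self) -> bool:
        if self.exclusive_held:
            return False
        self.shared_holders += 1
        return True

    def release_shared(self) -> None:
        self.shared_holders -= 1

    def try_exclusive(self) -> bool:
        if self.exclusive_held or self.shared_holders > 0:
            return False
        self.exclusive_held = True
        return True

    def release_exclusive(self) -> None:
        self.exclusive_held = False


def run_writer(hold_until_commit: bool) -> str:
    """Interleave one writer with one activation attempt; report what happened."""
    lock = SharedExclusiveLock()
    state = {"epoch": 0}

    def try_activate() -> bool:
        if not lock.try_exclusive():
            return False  # a real activation would block here until writers finish
        state["epoch"] += 1
        lock.release_exclusive()
        return True

    assert lock.try_shared()
    epoch_read = state["epoch"]      # read the manifest set under the shared lock
    if not hold_until_commit:
        lock.release_shared()        # early release: the TOCTOU window reopens
    activated = try_activate()       # activation arrives between read and commit
    committed_against = epoch_read   # writer commits without re-reading the epoch
    if hold_until_commit:
        lock.release_shared()
    stale = committed_against != state["epoch"]
    return f"activated_mid_txn={activated} stale_commit={stale}"


print(run_writer(hold_until_commit=True))   # activated_mid_txn=False stale_commit=False
print(run_writer(hold_until_commit=False))  # activated_mid_txn=True stale_commit=True
```

If the lock were realized with PostgreSQL advisory locks, one possible mapping (an assumption, not something the design states) is `pg_advisory_xact_lock_shared` for the writer and `pg_advisory_xact_lock` for activation, both of which are held until the transaction ends.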

TOCTOU verdict

No writer/activation race in the design intent — the shared/exclusive hash-bound epoch is the correct primitive. But the lock object, the lock-hold scope, the writer pre-commit staleness predicate, and the epoch-binding of post-activation evidence are unspecified. → INTENT-OK + NEEDS_CLARIFICATION. Required correction: a concrete concurrency spec (lock type + id, hold scope = whole writer txn, pre-commit epoch_start == epoch_now else abort, evidence epoch-stamping).

knowledge/dev/reports/architecture/t1-fix7-design-deep-verification-before-implementation-2026-06-07/10-toctou-control-epoch-review.md