Codex Review — Gap-only Spec, FIX7 Pilot, MVP Readiness

Date: 2026-06-09
Nature: narrow adversarial design checkpoint
Production mutation: NO
Final verdict: BLOCKED_BY_AUTHORITY_OR_ARTICLE14_RISK

1. Final verdict

BLOCKED_BY_AUTHORITY_OR_ARTICLE14_RISK

The packet preserves the intended no-run/no-production-mutation boundary in prose and correctly reuses many named PG/read surfaces. However, it is not sealed, and MVP implementation may not start.

The decisive defect is Article 14: the design equates “a referenced evidence artifact resolves” with sufficient read-level evidence. A resolvable artifact can itself be prose-only, unrelated, self-referential, stale, too narrow, or contradicted by the actual executable artifact. Therefore the current READ_REPORT_PASS can be emitted while the load-bearing executable claim remains unproven.

This is exactly the Recheck-8 risk. Recheck-8 had resolvable documents and a selftest claim, but the declared .py SSOT did not exist and the exact invocation exited with code 2. The proposed no-run pilot cannot detect that failure merely by resolving evidence-document references: it can catch missing or ambiguous references, but not the full Recheck-8 / Article-14 class.

2. Gate table

Gate	Verdict	Evidence inspected	Issue found	Required correction
1 — Authority Contract compliance	`PARTIAL`	Authority Contract, Gap-only Spec, JSON mirror, MVP plan	No-run/no-mutation/no-new-authority boundaries are stated. However the spec calls a contract with status `READY_FOR_GPT_REVIEW` “binding”; static counts and fixed denominator expectations appear in machine/design contracts; implementation authority for claim discovery/evidence binding is undefined.	Normalize the Authority Contract to one unambiguous adopted status. Remove literal current counts from normative/runtime acceptance logic. Define claim/evidence binding authority before build.
2 — Constitution / Article 14	`FAIL`	Spec §§6–9, pilot §§2–8, acceptance tests 4–8/13/20, Recheck-8 evidence	`READ_REPORT_PASS` needs only resolvable references, not evidence-to-claim validity, scope coverage, independence, or executable existence. Pilot incorrectly claims sufficiency for Recheck-8.	Make PASS unavailable in v0.1; add explicit `ARTICLE14_NOT_PROVEN`/`EXECUTION_UNVERIFIED`; require structural claim↔evidence binding; narrow pilot claim to missing/ambiguous-evidence detection only.
3 — PG-first / native / driven	`PARTIAL`	Named read surfaces, denominator rules, adapters, MVP modules	Reads consume existing PG surfaces, but policy/control is a fixed file design: static surface lists, claim kinds, adapter list, C1–C7 rules, query expectations, and numeric acceptance examples. No PG-native/data-driven authority controls these rules.	Separate policy from code; consume governed metadata/contract entries or mark coverage incomplete. Enforce read-only capability through read-only role/transaction and absent execution/write capabilities, not module-name assertions.
4 — Hardcode / fake-green	`FAIL`	Spec JSON, acceptance matrix, MVP gates/exit semantics, index readiness language	Hardcoded current counts are repeated as machine inputs/expected outputs. G2 requires `>=2` denominators regardless of dossier relevance. G7/test 11 compare literal 41/4; test 10 expects literal 219/102. Exit `0` is allowed for `FLAG`, enabling downstream fake-green. Index says `PROGRAM_MACRO_READY` and “no engineering omissions remain.”	Replace numeric expectations with role/query/provenance comparisons; require all relevant discovered denominators, not `>=2`; FLAG/BLOCKED must be machine-distinguishable and cannot share green exit semantics; remove over-strong readiness language.
5 — FIX7 read/report pilot sufficiency	`FAIL`	Pilot C1–C7, Fixture A/B, Recheck-8 report	Fixture B only removes references. It does not test the decisive case: evidence documents resolve while the declared executable is missing/non-runnable or evidence scope is insufficient.	Add a non-executing counter-fixture with resolvable but insufficient/contradictory evidence. Expected outcome must be `ARTICLE14_NOT_PROVEN`, not `EVIDENCE_PRESENT` or PASS. State that full Recheck-8 detection requires the future execution contract.
6 — MVP build readiness	`FAIL`	MVP phases, validation gates, manual gates, negative tests, exit semantics	Claim extraction completeness, evidence binding, capability enforcement, and anti-fake-green outcome semantics are not specified sufficiently for deterministic implementation.	Patch spec/pilot/plan/tests first; no MVP code until re-review seals them.

3. Hardcode / fake-green findings

H1 — Current observations are embedded as normative machine inputs

The machine spec and module/acceptance wording repeatedly embed 309, 214, 186, 163, 54, 41/4, 128/36, 219/102, and graph counts. The documents say these are not invariants, but acceptance rules still require literal examples such as “canonical (4) + diagnostic (41) both shown” and “219/102 shown separately.” This is disguised hardcode because a correct future runtime value would conflict with the fixed expected wording.

Required: counts may appear only as dated examples/evidence. Normative checks compare surface role, query provenance, match key, population, observation timestamp, and separation behavior, never literal values.
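A minimal sketch of a value-independent check, assuming each observed count carries the provenance fields named above. The `DenominatorObservation` type, its field names, and `check_separation` are hypothetical illustrations, not part of the reviewed spec.

```python
from dataclasses import dataclass
from datetime import datetime

# Provenance fields every observation must carry (names are assumptions).
PROVENANCE_FIELDS = ("surface_role", "query_provenance", "match_key", "population")


@dataclass(frozen=True)
class DenominatorObservation:
    """One observed count plus the provenance needed to judge it."""
    surface_role: str       # e.g. "canonical" or "diagnostic"
    query_provenance: str   # identifier of the query that produced the count
    match_key: str
    population: str
    observed_at: datetime
    value: int              # reported as dated evidence, never compared to a literal


def check_separation(observations):
    """Return a list of problems; an empty list means the structural check passed."""
    problems = []
    seen = set()
    for index, obs in enumerate(observations):
        for name in PROVENANCE_FIELDS:
            if not getattr(obs, name):
                problems.append(f"observation {index}: missing {name}")
        identity = (obs.surface_role, obs.match_key, obs.population)
        if identity in seen:
            # Two observations with the same identity suggest collapsed counts.
            problems.append(f"observation {index}: repeats identity {identity}")
        seen.add(identity)
    return problems
```

The check never reads `value`, so a correct future runtime count cannot conflict with the acceptance rule.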

H2 — `>=2 denominators` is a fabricated invariant

MVP gate G2 requires at least two denominators. A dossier may legitimately involve one relevant denominator; another may involve more than seven. This rule encourages adding irrelevant counts to appear compliant and creates false failures when only one denominator is relevant.

Required: enumerate all denominators relevant to the inspected claims/surfaces and prove that none were collapsed. No numeric minimum or fixed maximum.
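As a rough sketch, the requirement can be expressed as set coverage rather than a count threshold. The function names and the string denominator identifiers are assumptions made for illustration; how relevance is discovered is left to the corrected spec.

```python
def denominator_coverage(relevant, reported):
    """Compare denominators relevant to the inspected claims with those reported."""
    relevant, reported = set(relevant), set(reported)
    return {
        "missing": sorted(relevant - reported),    # relevant but collapsed or omitted
        "unrelated": sorted(reported - relevant),  # padding added to look compliant
    }


def coverage_ok(relevant, reported):
    """True only when exactly the relevant denominators are reported."""
    result = denominator_coverage(relevant, reported)
    return not result["missing"] and not result["unrelated"]
```

Under this sketch a dossier with one relevant denominator passes, and padding the report with an unrelated count fails.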

H3 — Static claim kinds and C1–C7 are treated as complete coverage

The fixed claim-kind set and FIX7 reason list are useful pilot taxonomy, but no authority or extension mechanism proves they cover all load-bearing prose claims. A new phrase or claim type can be missed and the dossier can still pass.

Required: either consume a governed claim declaration/binding contract, or emit claim_inventory_completeness=UNVERIFIED and prohibit a positive dossier verdict. Free-form prose extraction may be advisory only.

H4 — Exit `0` for `FLAG` is fake-green

The MVP plan permits exit 0 for PASS, FLAG, and NOT_APPLICABLE. Downstream automation or an operator can treat a flagged inspection as green without reading the report.

Required: machine outcome must distinguish clean completion from policy acceptance. FLAG/BLOCKED must never be consumable as green; define non-zero policy outcome or a mandatory separate machine gate that fails closed.
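One possible shape for a fail-closed exit mapping is sketched below. The specific exit codes and the `GREEN_OUTCOMES` placeholder are assumptions; which outcomes, if any, may count as green is for the corrected spec to decide.

```python
# Placeholder: the set of outcomes allowed to exit 0 (an assumption, not spec).
GREEN_OUTCOMES = frozenset({"PASS"})

# Hypothetical non-zero codes so FLAG and BLOCKED stay machine-distinguishable.
EXIT_BY_OUTCOME = {"FLAG": 3, "BLOCKED": 4}

# Anything unlisted or unrecognized fails closed.
EXIT_UNKNOWN = 5


def exit_code_for(outcome: str) -> int:
    """Map a policy outcome to a process exit code, failing closed by default."""
    if outcome in GREEN_OUTCOMES:
        return 0
    return EXIT_BY_OUTCOME.get(outcome, EXIT_UNKNOWN)
```

In this sketch any outcome not explicitly listed, including NOT_APPLICABLE, falls through to the fail-closed code, so a downstream consumer checking only the exit status cannot read FLAG as green.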

H5 — Module-name assertions are not enforcement

“No module named runner/dispatcher” and “only one write target” do not prevent hidden subprocess, shell, network, DB-write, or filesystem-write capability inside another module.

Required: implementation design must enforce capabilities: read-only PG role/transaction; no Directus write credential; no shell/subprocess/command capability; no filesystem mutation capability; report output restricted to the approved KB path; negative capability tests.
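To illustrate one of these boundaries, the sketch below confines report output to an approved root and rejects anything that resolves outside it. The function name and the example root are hypothetical, and it covers only the report-path restriction; the read-only PG role, credential, and subprocess controls need their own enforcement and negative tests.

```python
from pathlib import Path


def confined_report_path(approved_root, requested):
    """Resolve a requested report path and refuse anything outside approved_root.

    Requires Python 3.9+ for Path.is_relative_to.
    """
    root = Path(approved_root).resolve()
    candidate = (root / requested).resolve()
    # Reject the root itself, parent traversal, and absolute paths elsewhere.
    if candidate == root or not candidate.is_relative_to(root):
        raise PermissionError(f"report path escapes approved root: {requested}")
    return candidate
```

A negative capability test would then assert that inputs such as `../escape.md` or an absolute path outside the root raise `PermissionError`.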

H6 — Over-strong readiness language

PROGRAM_MACRO_READY, “no engineering omissions remain,” and the pilot’s “fully designed/sufficient” claims are stronger than the evidence.

Required: downgrade to NEEDS_T1_FIX until Article-14 and hardcode blockers are closed.

4. Article 14 findings

Article 14 verdict: FAIL — STRUCTURAL FAKE-GREEN REMAINS

The design does prevent one narrow class: a claim with no referenced artifact is flagged. It does not structurally prevent prose-only/fake evidence from producing a positive dossier result.

Missing structural bindings include:

stable claim identity;
evidence identity and immutable revision/content binding;
evidence kind required by claim kind;
subject/command/artifact identity binding;
scope/coverage binding;
producer and observation time;
independence/non-self-reference;
contradiction detection across evidence;
distinction between “artifact exists” and “artifact proves this claim.”
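A minimal sketch of what such a binding record might look like, assuming one record per claim and evidence pair. All field names and gap labels are hypothetical, and contradiction detection across multiple evidence items is not covered here.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ClaimEvidenceBinding:
    """Structural link between one claim and one evidence artifact."""
    claim_id: Optional[str] = None
    evidence_id: Optional[str] = None
    evidence_revision: Optional[str] = None       # immutable revision or content hash
    required_evidence_kind: Optional[str] = None  # kind demanded by the claim kind
    actual_evidence_kind: Optional[str] = None
    claim_subject: Optional[str] = None           # command or artifact the claim is about
    evidence_subject: Optional[str] = None        # command or artifact the evidence covers
    scope_covered: Optional[bool] = None
    producer: Optional[str] = None
    observed_at: Optional[str] = None
    self_referential: Optional[bool] = None


def binding_gaps(b: ClaimEvidenceBinding) -> list:
    """List what is missing or inconsistent; empty means structurally bound."""
    gaps = [f.name for f in fields(b) if getattr(b, f.name) is None]
    if b.required_evidence_kind and b.actual_evidence_kind \
            and b.required_evidence_kind != b.actual_evidence_kind:
        gaps.append("evidence_kind_mismatch")
    if b.claim_subject and b.evidence_subject and b.claim_subject != b.evidence_subject:
        gaps.append("subject_mismatch")
    if b.scope_covered is False:
        gaps.append("scope_not_covered")
    if b.self_referential is True:
        gaps.append("self_referential")
    return gaps
```

An artifact that merely resolves would populate `evidence_id` and little else, so it reports gaps instead of counting as proof of the claim.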

is_proof_of_run:false is honest but insufficient. A verdict named READ_REPORT_PASS still communicates green and is allowed when executable claims remain unverified.

Required Article-14 wording

For any dossier containing executable/run/selftest/hash/exit-code claims, v0.1 may report only structural evidence presence and must emit an explicit overall status equivalent to:

ARTICLE14_NOT_PROVEN_EXECUTION_UNVERIFIED

A positive PASS verdict is unavailable until a separately sealed execution/evidence-binding contract proves the load-bearing claim.
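The wording above could be enforced structurally along these lines. The claim-kind labels and the neutral status name `STRUCTURAL_EVIDENCE_REPORTED` are assumptions; only `ARTICLE14_NOT_PROVEN_EXECUTION_UNVERIFIED` comes from this review.

```python
# Hypothetical labels for claim kinds that assert something was executed.
EXECUTABLE_CLAIM_KINDS = frozenset({"executable", "run", "selftest", "hash", "exit_code"})


def overall_status(claim_kinds) -> str:
    """Derive the v0.1 overall status; no code path returns a PASS value."""
    if EXECUTABLE_CLAIM_KINDS & set(claim_kinds):
        return "ARTICLE14_NOT_PROVEN_EXECUTION_UNVERIFIED"
    # Neutral completion only: structural evidence presence was reported.
    return "STRUCTURAL_EVIDENCE_REPORTED"
```

Because the function has no branch that yields a positive verdict, a PASS cannot be emitted by mistake while execution remains unverified.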

5. Parallel-authority findings

Risk	Verdict	Reason
New runner authority	`NO CURRENT CREATION / ENFORCEMENT GAP`	No runner is designed, but absence is asserted by module names rather than capability controls.
New logger authority	`NO`	KB report output is within adopted file-report-only boundary; no `system_issues` write.
New registry authority	`RISK`	File-only fixed surface/denominator policy can become a shadow registry if implementation consumes it as truth.
New graph/duplicate authority	`RISK`	`readonly_dead_link_reporter` and existence resolver have undefined matching/completeness semantics; they must remain advisory and cannot claim canonical-id coverage.
New TAC/IU corpus authority	`NO CURRENT CREATION`	Dual-report prohibition is clear, but literal corpus counts must be removed from normative expectations.
New claim/evidence authority	`UNRESOLVED / BLOCKING`	The proposed claim extractor and verdict engine would define claim/evidence truth without a sealed declaration/binding authority.

6. MVP readiness decision

MVP implementation allowed now: NO

No implementation, schema, runner, Directus integration, system_issues write, TAC/IU bridge, command invocation, detector execution, or PG/registry/filesystem mutation is allowed.

The read/report concept remains viable after correction. The blocker is not the no-run scope; it is the unsupported positive verdict and undefined claim/evidence authority inside that scope.

7. Required fixes before MVP

Remove READ_REPORT_PASS from v0.1. Use a neutral completion verdict and a separate fail-closed Article-14 status; executable claims force ARTICLE14_NOT_PROVEN_EXECUTION_UNVERIFIED.
Define structural claim↔evidence binding fields and validation rules. Reference resolution alone must never produce a positive claim verdict.
State that free-form prose claim discovery is incomplete/advisory unless backed by a governed declaration contract. Missing completeness proof blocks positive dossier status.
Correct the FIX7 pilot scope: it catches missing/ambiguous evidence bindings only, not full Recheck-8. Add the resolvable-but-insufficient-evidence counter-fixture.
Remove literal current counts from normative JSON, gates, module responsibilities, and acceptance outcomes. Keep them only as dated examples.
Replace >=2 denominators with “all relevant discovered denominators remain distinct and fully provenanced.”
Replace literal 41/4 and 219/102 acceptance checks with role/key/population/provenance/separation checks.
Make FLAG/BLOCKED machine-failing; do not allow exit 0 to be interpreted as green.
Replace design assertions about no-run/no-write with enforceable capability boundaries and negative capability tests.
Mark dead-link/doc-reference coverage advisory/unverified until existing graph authority proves coverage; do not imply resolver completeness.
Normalize the Authority Contract status and remove PROGRAM_MACRO_READY/“no engineering omissions remain” until re-sealed.
Add explicit output fields for writes_performed and approved KB report writes, so that “Production mutation: NO” cannot hide evidence-output mutation.
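The resolvable-but-insufficient counter-fixture named in the fixes above could be sketched as follows. The file names, status labels other than ARTICLE14_NOT_PROVEN, and reason strings are hypothetical. The check only tests file existence and never invokes anything, so it stays inside the no-run boundary.

```python
import tempfile
from pathlib import Path


def build_counter_fixture(root: Path):
    """Create an evidence document that resolves; never create the executable."""
    evidence_doc = root / "evidence" / "selftest_report.md"
    evidence_doc.parent.mkdir(parents=True)
    evidence_doc.write_text("Selftest passed.\n", encoding="utf-8")
    declared_executable = root / "tools" / "detector.py"  # intentionally absent
    return evidence_doc, declared_executable


def classify_executable_claim(evidence_doc: Path, declared_executable: Path) -> dict:
    """Classify an executable claim without running anything."""
    if not evidence_doc.is_file():
        return {"status": "EVIDENCE_REFERENCE_UNRESOLVED",
                "reason": "evidence_document_missing"}
    if not declared_executable.is_file():
        return {"status": "ARTICLE14_NOT_PROVEN",
                "reason": "declared_executable_missing"}
    # Both files exist, but nothing was executed, so the claim stays unproven.
    return {"status": "ARTICLE14_NOT_PROVEN", "reason": "execution_unverified"}


def run_counter_fixture() -> dict:
    """Build the fixture in a temporary directory and classify it."""
    with tempfile.TemporaryDirectory() as tmp:
        doc, exe = build_counter_fixture(Path(tmp))
        return classify_executable_claim(doc, exe)
```

The fixture's evidence document resolves, yet the outcome is ARTICLE14_NOT_PROVEN rather than EVIDENCE_PRESENT or PASS. Detecting a non-zero exit from the exact invocation would still require the future execution contract.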

8. Minimal safe next step

Return to Claude for correction of the spec, JSON mirror, FIX7 pilot, MVP plan, acceptance matrix, checkpoint packet, and index using the exact fixes above. Do not implement the MVP before Codex re-review.

9. Three declarations

Permanent (Vĩnh viễn): rules must be driven by governed claim/evidence metadata and live surface provenance, not today’s counts or a fixed phrase list.
Can it be gotten wrong? (Nhầm được không): PASS must be structurally unavailable when executable evidence is unresolved; runtime capabilities must make invoke/write impossible.
100% automatic (100% tự động): no recurring manual interpretation of reports; machine outcome must fail closed and cannot encode FLAG as green.

Evidence inspected

Gap-only Scope Spec .md and .json
FIX7 read/report pilot design
MVP no-code implementation plan
Acceptance test matrix
Authority Contract v0.1
Codex checkpoint packet and index
Codex FIX7 Recheck-8 Article-14/executable evidence
Constitution v4.6.3 / NT14 and PG-first/native/driven evidence via direct main-process KB reads