KB-5340

Codex audit — Question Catalog + Dependency Map + Operational Risk Gates — 2026-06-15

Revision 1

Date: 2026-06-15
Scope: Read-only audit of knowledge/dev/laws-new/cau-hoi-khi-tai-cau-truc.md and related drafts.
Status: HOLD
Mutation boundary: No production/DB mutation; no checker/pilot/registry/table/collection/DOT/index/worker creation; no edit to knowledge/dev/laws/ or knowledge/dev/laws-new/.

Progress

  • Step 0: Read .claude/skills/incomex-rules.md.
  • Step 1: Read task and mandatory KB laws in main process.
  • Step 2: Read Question Catalog and related laws-new drafts from Agent Data KB.
  • Step 3: Compare against current checkout evidence and identify dependency/gate risks.
  • Step 4: Complete coverage, reuse, modularity, implementation, operation, and gate audits.
  • Step 5: Verify report evidence and finalize.

Mandatory Sources Read

  • .claude/skills/incomex-rules.md — 36 rules / 8 steps.
  • knowledge/dev/ssot/operating-rules.md — v7.58, via search_knowledge("operating rules SSOT").
  • knowledge/dev/laws/constitution.md — current KB result v4.6.3, via search_knowledge("hiến pháp v4.0 constitution").
  • knowledge/dev/laws-new/cau-hoi-khi-tai-cau-truc.md — KB revision 59.
  • knowledge/dev/laws-new/de-bai-cai-tien.md — KB revision 33.
  • knowledge/dev/laws-new/required-stamps.v0.1.json — KB revision 6.
  • knowledge/dev/laws-new/promote-checker-v0.1-spec.md — KB revision 11.
  • knowledge/dev/laws-new/matrix-stamp-governance-addendum.md — KB revision 14.
  • knowledge/dev/laws-new/matrix-refactor-quick-rules.md — KB revision 8.
  • knowledge/dev/laws-new/matrix-refactor-implementation-plan.md — KB revision 5.
  • knowledge/dev/laws-new/roadmap-cai-tien.md — KB revision 1.

Evidence Snapshot

  • Current checkout does not contain the referenced knowledge/dev/laws-new/* files. Running wc and rg against those paths returned “No such file or directory”; the drafts were read from Agent Data KB instead.
  • Current checkout contains substantial reusable substrate evidence: meta_catalog, collection_registry, dot_tools, birth_registry, universal_edges, system_issues, counting/integrity scanners, Directus snapshot, and smoke tests.
  • The draft set contains internal inconsistencies that the survey must resolve before design:
    • Quick Rules S3 says every stamp is an inspect_* column on birth_registry, while Addendum §2b and required-stamps split pre-promote stamps into staging metadata.
    • Quick Rules rule 17 includes DOT check inside IO Contract, while de-bai-cai-tien.md defines the minimal five-field IO boundary and explicitly keeps DOT/evidence outside when they cause bloat.
    • Drafts use a six-level assembly model while the mandatory skill/constitution baseline describes seven composition levels; the catalog recognizes this as a conflict but does not make document-authority resolution an explicit pre-survey gate.

Three Declarations

  1. Permanent: The audit recommends fixing decision/gate wording, source authority, and per-slice dependencies so future surveys cannot recreate a parallel architecture.
  2. Cannot be mistaken: New-creation and phase-transition decisions must be blocked by explicit evidence plus Owner decision records, not inferred from ANSWERED status alone.
  3. 100% automatic: Not applicable to this read-only audit. For future operation, the catalog must require machine-detectable freshness, liveness, concurrency, and cleanup evidence before pilot.

STATUS: HOLD

EXECUTIVE SUMMARY

  • The Question Catalog has strong coverage and the right stated philosophy: reuse-first, workspace-first, delete-fast, minimal IO boundary, checker verdict-only, atomic promote, scanner/report separated from auto-fix, and explicit operational risks.
  • It is not ready for Owner approval yet because its current gate model can itself create the next monster: approximately 318 questions and about 47 blockers are treated too globally. L0 can block all later work rather than only blocking new design/creation for the selected pilot slice.
  • It is not ready for an uncontrolled next survey. A tightly scoped read-only survey can start only after five wording-level fixes: source authority/version pin, vertical-slice gate scope, explicit Owner decision checkpoints, earlier placement of operational gates, and runtime liveness/config-delivery questions.
  • The most serious implementation risk is not missing architecture. It is agents building from a draft revision that differs from the Owner-approved revision. The referenced laws-new files exist in Agent Data KB but are absent from this checkout.
  • The most serious operational risk is not merely a long transaction. It is a system that produces a valid-looking stale verdict while the config source, scheduler, scanner, or cleanup path is unavailable or drifted.
  • Existing assets should be treated as the default path. Checkout evidence shows mature reusable patterns around meta_catalog, collection_registry, dot_tools, birth_registry, universal_edges, and system_issues. The survey must prove the exact live capability before proposing anything new.

COVERAGE AUDIT

| Area | Verdict | Missing / issue | Minimal patch if needed |
| --- | --- | --- | --- |
| Reuse Baseline | covered | L0 is globally blocking and can force a full-system survey before a small pilot | Add: “L0 blocks new design/creation only for the selected pilot slice and its direct dependencies; unrelated groups may remain DEFER.” |
| Birth / minimal canonical identity | covered | Birth-fold gates are not scoped to atomic promote | State that birth-fold blocks atomic promote/pilot promote, not read-only checker design or workspace-only pilot. |
| Governance as relationship/state information | covered | Drafts still contain wording that can make each governance fragment a DOT+stamp subsystem | Add one question: “Can this governance fact remain metadata/edge/state without a new DOT or stamp?” |
| Registries / existing SSOT reuse | covered | No explicit authority check for KB draft versus checkout/runtime source | Add source-authority/revision/hash question before survey. |
| Staging / candidate store | covered | Actual staging substrate is still unverified; local checkout has no iu_staging_* evidence | Keep as survey priority; add proof of authoritative environment/source. |
| Collections / layer stores | covered | Risk of standardizing every layer store before the first slice | Add question asking which existing store is sufficient for the selected pilot slice and what can be deferred. |
| Matrix cell / cell_id | covered | Six-level draft model conflicts with the seven composition levels in the mandatory skill/constitution baseline | Add explicit authority-resolution question before accepting cell_id. |
| Formula / procedure | covered | Formula work can drift into a new formula registry | Add explicit “existing object/metadata/plain artifact first; no formula registry by default.” |
| IO Contract minimal v0.1 | partially covered | Cross-draft conflict: minimal five-field boundary versus Quick Rules rule 17 / plan schema adding DOT/check/owner fields | Add one invariant question fixing the minimal boundary and separating optional evidence/check fields. |
| Stamp lifecycle | covered | Pre/post store split is good, but Quick Rules S3 contradicts it | Add cross-document consistency blocker. |
| Stamp governance declaration without bloat | partially covered | “one stamp/one inspector/one DOT” language can create column and DOT explosion | Add default: no new core stamp/column/DOT unless an existing core stamp cannot carry the evidence. |
| DOT capability without bloat | partially covered | DOT-CAP can become a new capability-governance system | Ask only for the minimum contract of each reused/target DOT; explicitly forbid a new DOT-capability registry/system in v0.1. |
| Atomic Promote | partially covered | Covers all-or-nothing and negative states, but not enough on crash recovery, retries, bypass privileges, and outbox consistency | Add four operational questions listed below. |
| Scanner/report | partially covered | Correctly separated from auto-fix, but L7 is not a hard pilot gate and liveness/freshness is missing | Require a minimal scoped detector plus heartbeat/freshness evidence before pilot. |
| Pilot cell | covered | Forbidden surfaces are present; missing hard scope budget and stop/abort criteria | Add pilot budget question: touched surfaces, maximum new artifacts, duration, rollback trigger. |
| Order / dependency | partially covered | L5b is too late and too monolithic; L0 is too global | Attach each risk question to the earliest affected layer and the selected pilot slice. |
| Operational Risk Gates | partially covered | Strong on locks/index/stale/cleanup/cell context; missing runtime availability, bypass, crash/outbox, liveness, retention/backpressure | Add the small question set in Operational Risk Audit. |

DEPENDENCY MAP AUDIT

  • Verdict: HOLD. The conceptual order is mostly sound, but the gate topology is too coarse for the stated fastest-path objective.
  • Hidden dependencies:
    • Exact authoritative draft revision/hash and how runtime/config consumers receive it.
    • Resolution of cross-document contradictions before survey answers can be trusted.
    • Runtime authorization/bypass surface for checker and atomic promote.
    • Coexistence/disable path with existing production behavior.
    • Heartbeat/freshness of scanner, scheduler, cleanup, and config source.
    • Crash/retry/outbox behavior after or around transaction commit.
  • Ordering concerns:
    • RISK-CELL-* belongs with L2/L3, before IO/validation/stamp conclusions.
    • Stale/config binding belongs with L3/L5, before a checker verdict can be meaningful.
    • Query-path/index and transaction-boundary questions belong during L4/L5 design, not only after checker.
    • Cleanup/external artifact questions belong with L1 staging and before workspace delete-fast proof.
    • A minimal scoped scanner/heartbeat must block L8 pilot; it cannot remain merely a “should have” (“nên có”).
  • L5b assessment: Correctly blocks atomic promote rehearsal and pilot, but it is positioned as one late checkpoint. Convert it into risk gates attached to their earliest dependency, with a final L5b roll-up before rehearsal/pilot.
  • Jump risk: The roadmap says “docs fixed → system check” and the implementation plan already proposes pilots, while the new Group R and catalog are not approved. Add an explicit rule that roadmap phase labels do not authorize survey/design/pilot transitions.
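
The per-slice gate scope recommended above can be sketched as a small filter. This is a minimal illustration, not part of the catalog: the `Gate` type, the `DEPENDS_ON` layer graph, and the layer each gate is attached to are assumptions made for the example. Only the gate IDs come from the catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    gate_id: str    # catalog question ID, e.g. "RISK-CELL-001"
    layer: str      # earliest layer the gate is attached to (assumed placement)
    resolved: bool  # True once evidence and an Owner decision exist


def slice_scope(slice_layers, depends_on):
    """Layers of the pilot slice plus their direct dependencies only."""
    scope = set(slice_layers)
    for layer in slice_layers:
        scope.update(depends_on.get(layer, ()))
    return scope


def blocking_gates(gates, slice_layers, depends_on):
    """Unresolved gates inside the slice scope; all other gates stay deferred."""
    scope = slice_scope(slice_layers, depends_on)
    return sorted(g.gate_id for g in gates if g.layer in scope and not g.resolved)


# Hypothetical layer graph and gate placement, for illustration only.
DEPENDS_ON = {"L3": ["L2"], "L2": ["L1"], "L5": ["L3", "L4"]}
GATES = [
    Gate("RISK-GC-001", "L1", resolved=False),
    Gate("RISK-CELL-001", "L2", resolved=False),
    Gate("RISK-STL-001", "L3", resolved=False),
    Gate("RISK-IDX-001", "L4", resolved=False),
]

# A slice at L3 is blocked only by gates on L3 and its direct dependency L2.
print(blocking_gates(GATES, ["L3"], DEPENDS_ON))
```

With this shape, a final L5b roll-up would be the same call made over the full rehearsal scope, so early attachment and the late checkpoint share one mechanism.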

REUSE-FIRST AUDIT

  • Verdict: Strong intent, HOLD on enforcement shape.
  • Missing reuse questions:
    • What is the authoritative live/runtime location and revision of the reused asset?
    • Is the asset actually runnable/owned/maintained, or merely present/design-only?
    • What existing generic DOT/check can serve multiple pieces without creating one DOT per piece?
    • What existing disable/rollback path allows reuse without coupling the pilot to production?
    • What is the lifecycle cost of new versus reuse, not only the initial build speed?
  • New-creation guardrail assessment: The rule is strong but has two weaknesses:
    • Allowing new creation once “reuse is slower than new” is proven can justify short-term duplication that becomes long-term debt.
    • Requiring all five generic insufficiency proofs for every artifact is not context-sensitive and encourages ceremonial evidence.
  • Minimal wording fix: “New is allowed only when the selected pilot slice cannot satisfy a named invariant using an existing live asset, metadata/config, or a thin wrapper; the exception must name owner, retirement/merge path, lifecycle cost, and Owner decision.”
  • Reality check from checkout: Reuse candidates are concrete, not theoretical. Local evidence found references across code/scripts/docs for meta_catalog (23 files), collection_registry (6), dot_tools (8), birth_registry (4), universal_edges (2), and system_issues (8). No local evidence was found for iu_staging_record, iu_staging_payload, governance_candidate_state, or sandbox_tac; these must not be assumed live from draft prose.
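
A reference count like the one in the reality check can be reproduced with a short script. The sketch below is an assumption about method, not the command the audit ran: it counts files under a root directory that mention each identifier, and the file suffixes it scans are chosen only for illustration.

```python
from pathlib import Path


def count_referencing_files(root, identifiers,
                            suffixes=(".py", ".sh", ".md", ".sql", ".json")):
    """Count, per identifier, how many text files under root mention it."""
    counts = {name: 0 for name in identifiers}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name in identifiers:
            if name in text:
                counts[name] += 1
    return counts


def classify(counts):
    """A non-zero count is only a reuse candidate, never proof of live capability."""
    return {
        name: "reuse candidate" if n else "no local evidence"
        for name, n in counts.items()
    }
```

A usage example would be `classify(count_referencing_files(".", ["meta_catalog", "iu_staging_record"]))`. Either label still needs confirmation against the authoritative runtime before it feeds a design decision.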

LEGO / MODULARITY AUDIT

  • Verdict: PASS_WITH_MINOR_FIXES in philosophy; HOLD in gate execution.
  • Remaining monster-system risks:
    • The catalog itself becomes a full-system prerequisite rather than a per-slice decision tool.
    • “Every piece has DOT/check/stamp” becomes one DOT, one stamp, or one column per piece.
    • “100% IO Contract” becomes a contract registry or Module Contract First under another name.
    • DOT-CAP becomes a DOT governance/capability subsystem.
    • Candidate packet becomes a persistent packet store rather than a projection over existing staging.
    • Scanner becomes full-system coverage before a scoped pilot.
  • Suggested wording fixes:
    • Define “100% IO Contract” as communication across independent piece boundaries only; internal implementation does not require a new contract artifact.
    • State “one generic reused DOT may validate many pieces through config; no DOT-per-piece default.”
    • State “candidate packet is a view/projection by default, not a new persisted store.”
    • State “scanner scope is the pilot slice and direct dependencies first.”
    • Add a hard cap for the first slice: no new registry/ledger/workflow; any new table/DOT/stamp requires Owner exception.
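
To make “one generic reused DOT may validate many pieces through config” concrete, here is a minimal sketch of a single check function driven by per-type configuration. The piece types, field names, and `CHECK_CONFIG` structure are hypothetical; the drafts do not define them in this form.

```python
# Hypothetical per-type rules: adding a piece type means adding config, not a new check.
CHECK_CONFIG = {
    "formula": {"required": ["piece_id", "cell_id", "inputs", "outputs"]},
    "procedure": {"required": ["piece_id", "cell_id", "steps"]},
}


def check_piece(piece, config=CHECK_CONFIG):
    """Verdict-only check: reports findings and never mutates the piece."""
    rule = config.get(piece.get("piece_type"))
    if rule is None:
        # Fail closed: an unconfigured type is a finding, not a silent pass.
        return {"ok": False, "reasons": ["no rule configured for piece_type"]}
    missing = [f for f in rule["required"] if piece.get(f) in (None, "", [])]
    return {"ok": not missing, "reasons": [f"missing field: {f}" for f in missing]}


print(check_piece({"piece_type": "procedure", "piece_id": "p2", "cell_id": "c1"}))
```

The design point is that the number of checks stays constant while the number of piece types grows, which is the opposite of a DOT-per-piece default.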

IMPLEMENTATION-RISK AUDIT

  • Covered risks: building too much before pilot; new DOTs before reuse proof; Module Contract First regression; premature registries/ledgers/workflows; canonical governance in workspace; checker selftest; atomic promote rehearsal; pilot forbidden surfaces; rollback/delete-fast; broad ordering.
  • Missing risks:
    • SRC-AUTH-001: Which exact document revision/hash is authoritative for survey and later runtime? How is mismatch between Agent Data KB, checkout, and deployed/runtime copies detected?
    • SLICE-001: What is the single pilot slice, its direct dependency closure, and what questions are explicitly out of scope/deferred?
    • DECISION-001: Which transitions require an explicit Owner decision record? ANSWERED + Level 3 (Mức 3) must not self-authorize checker/pilot/new creation.
    • COEXIST-001: Can the new path be disabled without changing or breaking the existing path, and is the existing path still authoritative during the pilot?
    • BUDGET-001: What is the maximum touched-surface/new-artifact/time budget that triggers stop and simplification?
  • Blockers to keep:
    • Packet binding/hash before checker verdict.
    • Resolved cell context before IO/validation stamps.
    • Checker fail-closed selftest before lane.
    • Atomic promote negative-state/concurrency rehearsal before pilot promote.
    • Pilot forbidden surfaces and delete-fast proof.
  • Critical gate correction: The current global rule allows progression when blockers reach ANSWERED + Level 3 (Mức 3). Require an explicit Owner checkpoint for: selecting the pilot slice, approving any new artifact exception, approving checker design start, and opening pilot.
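
The gate correction can be expressed as a rule in which evidence status is necessary but never sufficient. The sketch below is illustrative only: the `Blocker` and `OwnerDecision` types, the transition names, and the integer encoding of the evidence level are assumptions, not structures defined by the catalog.

```python
from dataclasses import dataclass
from typing import Optional

# The four transitions the audit says need an explicit Owner checkpoint (names assumed).
OWNER_CHECKPOINTS = {
    "select_pilot_slice",
    "approve_new_artifact_exception",
    "start_checker_design",
    "open_pilot",
}


@dataclass(frozen=True)
class Blocker:
    blocker_id: str
    status: str          # e.g. "ANSWERED"
    evidence_level: int  # assumed encoding; 3 stands for the catalog's level 3


@dataclass(frozen=True)
class OwnerDecision:
    transition: str
    decided_by: str
    record_ref: str


def may_transition(transition, blockers, decision: Optional[OwnerDecision]):
    """Return (allowed, reason). Answered blockers alone never authorize a checkpoint."""
    unresolved = [b.blocker_id for b in blockers
                  if b.status != "ANSWERED" or b.evidence_level < 3]
    if unresolved:
        return False, "unresolved blockers: " + ", ".join(unresolved)
    if transition in OWNER_CHECKPOINTS:
        if decision is None or decision.transition != transition:
            return False, "missing Owner decision record"
    return True, "ok"
```

For example, `may_transition("open_pilot", [Blocker("STG-015", "ANSWERED", 3)], None)` is refused until a matching `OwnerDecision` is supplied.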

OPERATIONAL-RISK AUDIT

  • Covered risks:
    • Long transaction / locks / deadlocks and read-only pre-check versus short commit transaction.
    • JSONB query paths, missing indexes, and scanner scope.
    • Stale verdict/stamp/config drift; rule_version, config_hash, IO Contract hash.
    • Cleanup of blob_ref, external artifacts/cache/vector artifacts, event/report decision.
    • UNKNOWN/PENDING context, CELL before IO/VALIDATION, IO binding to resolved cell_id.
  • Missing risks:
    • Runtime config/source availability and parse/load failure.
    • Privilege/bypass path that can write canonical without checker/atomic promote.
    • Crash/retry boundary and consistency of canonical write, consumed status, and outbox/audit.
    • Heartbeat/freshness alert when scanner/scheduler/cleanup/checker support is down or stale.
    • Retention/cardinality/backpressure limits for packets, verdicts, issues, and outbox.
    • Clock/TTL semantics and expiry comparison source.
  • Group R assessment: Good foundation, but its 32 questions are not sufficient for operational approval and are attached too late. They are sufficient to start a scoped read-only survey after the five catalog fixes.
  • Additional questions, if any:
    • RISK-RUN-001: If required-stamps/config cannot be loaded, parsed, or version-pinned, does checker fail closed and expose a machine-detectable reason?
    • RISK-AUTH-001: Which roles/functions can bypass checker/atomic promote and write canonical or consumed state directly?
    • RISK-CRASH-001: After crash/retry at every commit boundary, can the system produce double-promote, canonical-without-audit, or consumed-without-canonical?
    • RISK-LIVE-001: How is scanner/scheduler/cleanup/config-source freshness measured, and what blocks pilot if it is stale/down?
    • RISK-CAP-001: What retention/cardinality/backpressure thresholds bound packets, verdicts, issues, and outbox for the pilot?
    • RISK-TIME-001: Which clock/source defines TTL and expiry, and how is clock skew handled?
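
RISK-RUN-001 and RISK-LIVE-001 both ask for fail-closed behavior with a machine-detectable reason. A minimal sketch of that behavior follows, assuming the config is a JSON document pinned by SHA-256 and that each support component reports a last-heartbeat timestamp. The reason codes, the `HOLD`/`READY` labels, and the 15-minute threshold are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def load_pinned_config(raw_bytes, expected_sha256):
    """Return (config, reason); config is None unless loading is provably safe."""
    if raw_bytes is None:
        return None, "CONFIG_UNAVAILABLE"
    if hashlib.sha256(raw_bytes).hexdigest() != expected_sha256:
        return None, "CONFIG_HASH_MISMATCH"
    try:
        return json.loads(raw_bytes.decode("utf-8")), "OK"
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None, "CONFIG_PARSE_ERROR"


def stale_components(heartbeats, now, max_age):
    """Components whose last heartbeat is missing or older than max_age."""
    return sorted(name for name, seen in heartbeats.items()
                  if seen is None or now - seen > max_age)


def readiness(raw_bytes, expected_sha256, heartbeats, now,
              max_age=timedelta(minutes=15)):
    """Fail closed: any config or liveness problem yields HOLD plus a reason code."""
    config, reason = load_pinned_config(raw_bytes, expected_sha256)
    if config is None:
        return "HOLD", reason
    stale = stale_components(heartbeats, now, max_age)
    if stale:
        return "HOLD", "STALE:" + ",".join(stale)
    return "READY", "OK"
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the expiry comparison source visible, which is the concern RISK-TIME-001 raises.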

GATE IMPACT AUDIT

| Question / group | Current classification | Codex view | Reason |
| --- | --- | --- | --- |
| L0 Reuse Baseline | Blocks all L1-L8 | Overclassified globally | Must block new design/creation for the selected slice, not unrelated read-only survey work. |
| Birth fold BIRTH-013/014 | BLOCKER | Keep, narrow scope | Blocks atomic promote/pilot promote; should not block workspace-only pilot or read-only checker design. |
| STG-012 cleanup scheduler | BLOCKER | Move/narrow | Blocker before delete-fast pilot, not before checker design/selftest. |
| STG-015 packet_hash | BLOCKER | Keep | Verdict is unsafe without stable binding. |
| CELL species/cell_id blockers | BLOCKER | Keep per slice | IO/validation context is invalid until resolved; do not require global taxonomy completion. |
| STAMP-GOV blockers | BLOCKER | Narrow to high-risk/canonical | General workspace/pilot should not inherit heavy governance. |
| DOT-CAP blockers | BLOCKER | Reduce to target-DOT contract | A generic capability system must not become a prerequisite. |
| Atomic Promote blockers | BLOCKER | Keep | Required before pilot promote; include concurrency/crash/bypass evidence. |
| PILOT-018 forbidden surfaces | BLOCKER | Keep | Directly prevents accidental production/canonical mutation. |
| RISK-AP blockers | BLOCKER | Keep, move earlier | Transaction risks must shape design before rehearsal. |
| RISK-IDX-001..004 | BLOCKER | Keep only for actual pilot query paths | Global index proof would overbuild; scoped query-budget proof is necessary. |
| RISK-STL-001/005/006/007 | BLOCKER | Keep, move earlier | A stale PROMOTE_OK invalidates checker safety. |
| RISK-GC-001..004 | REQUIRED | Conditional BLOCKER before delete-fast pilot | If external artifacts exist, delete-fast is unproven without cleanup behavior. |
| RISK-CELL-001..004 | BLOCKER | Keep, move to L2/L3 | Context must be valid before IO/validation/checker. |
| Scanner/report liveness | Not a hard gate | Underclassified | Pilot operation is unsafe if the detector can silently stop. |
| Owner transition decision | Missing | Add BLOCKER | ANSWERED is evidence status, not authorization. |
| Source authority/revision pin | Missing | Add BLOCKER before survey | Surveying or implementing against different draft revisions creates immediate drift. |
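
The STG-015 and RISK-STL rows both rest on one idea: a verdict is only meaningful for the exact packet, config, and rule version it was computed against. A minimal sketch of that binding follows. The record layout and the canonical-JSON hashing scheme are assumptions, since the drafts name packet_hash, config_hash, and rule_version but the audit does not quote how they are computed.

```python
import hashlib
import json


def stable_hash(obj):
    """SHA-256 over canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def issue_verdict(packet, config, rule_version, verdict):
    """Bind a verdict to the inputs it was computed from."""
    return {
        "verdict": verdict,
        "packet_hash": stable_hash(packet),
        "config_hash": stable_hash(config),
        "rule_version": rule_version,
    }


def verdict_is_current(record, packet, config, rule_version):
    """A verdict is stale as soon as any bound input differs from the live one."""
    return (
        record["packet_hash"] == stable_hash(packet)
        and record["config_hash"] == stable_hash(config)
        and record["rule_version"] == rule_version
    )
```

Under this sketch, atomic promote would re-check `verdict_is_current` inside its short commit transaction and refuse a PROMOTE_OK whose bindings no longer match.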

ANTI-BLOAT AUDIT

| Location | Risk | Suggested wording-level fix |
| --- | --- | --- |
| Dependency Map L0 | Full catalog becomes prerequisite monster | Scope L0 and blockers to selected pilot slice + direct dependencies. |
| §2c new-creation rule | “New is faster” rationalizes duplicate architecture | Compare lifecycle cost and require owner exception + retirement/merge path. |
| De-bai Lego Protocol | Every piece gets its own DOT/stamp | “Reuse generic DOT + config first; no DOT/stamp per piece by default.” |
| Quick Rules S3 / Addendum stamp wording | inspect_* column and DOT explosion; contradicts pre-promote store split | Limit inspect_* to proven canonical/post-promote needs; pre-promote uses existing staging metadata. |
| IO Contract wording across drafts | Minimal IO becomes Module Contract First | Freeze the five-field boundary; checks/evidence/owner are references or optional overlays, not mandatory contract expansion. |
| Candidate packet spec | Packet store/ledger appears | State packet is projection/view over existing staging by default; persistence requires reuse-insufficiency proof. |
| Formula group | Formula registry appears | State formulas remain ordinary governed objects/metadata unless reuse survey proves otherwise. |
| Governance modules | Registry/workflow per governance fact | State modules are information fields/edges/states, not deployable subsystems by default. |
| DOT-CAP | DOT capability/governance system appears | Limit to per-target-DOT minimum declaration; prohibit new registry/system in v0.1. |
| Scanner group | Full-system scanner/auto-fix appears | Scope to report-only pilot slice first; no auto-fix; reuse existing detector/query. |
| Domain/cell work | Ontology/domain tree appears | Keep domain as controlled tag for v0.1; unresolved items remain workspace-only. |

FINAL RECOMMENDATION

  • Ready for Owner approval? no
  • Ready for read-only survey? conditional — only a tightly scoped survey after the five wording fixes below
  • Need another Codex audit after Owner approval? yes — after the L0/L1 scoped survey and before checker design/start
  • Top 5 fixes before survey:
    1. Add source-authority/revision/hash gate covering Agent Data KB, checkout, and runtime/config location.
    2. Change global L0/L5b blocking into selected-pilot-slice + direct-dependency blocking.
    3. Add explicit Owner decision checkpoints; ANSWERED + Level 3 (Mức 3) is not authorization.
    4. Move risk questions to their earliest dependency and make minimal scanner/heartbeat a pilot gate.
    5. Resolve/flag cross-draft contradictions: pre-promote stamp store, IO Contract boundary, and six-versus-seven composition levels.
  • Top 5 risks during survey:
    1. Treating draft/KB claims as live production facts.
    2. Surveying the whole system instead of one pilot dependency closure.
    3. Turning each governance fact into a new DOT/stamp/column/registry.
    4. Using short-term “faster to create new” as justification for permanent duplicate architecture.
    5. Declaring operational readiness without liveness, bypass, crash/retry, and config-delivery evidence.
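
The first fix, a source-authority gate, amounts to comparing a pinned revision per document against what each location actually holds. The sketch below assumes revisions are available as plain numbers per location; the function name and the location labels are illustrative. The two revision numbers are the ones recorded in this report, and the empty checkout mirrors the finding that the drafts are absent there.

```python
def check_source_authority(pinned, observed):
    """Return (location, document, problem) for every deviation from the pin."""
    mismatches = []
    for location, docs in observed.items():
        for doc, revision in pinned.items():
            found = docs.get(doc)
            if found is None:
                mismatches.append((location, doc, "MISSING"))
            elif found != revision:
                mismatches.append((location, doc, f"REVISION {found} != {revision}"))
    return mismatches


PINNED = {"cau-hoi-khi-tai-cau-truc.md": 59, "de-bai-cai-tien.md": 33}
OBSERVED = {
    "agent_data_kb": {"cau-hoi-khi-tai-cau-truc.md": 59, "de-bai-cai-tien.md": 33},
    "checkout": {},  # the drafts are absent from the current checkout
}

for problem in check_source_authority(PINNED, OBSERVED):
    print(problem)
```

A survey gate built on this would stay closed while the returned list is non-empty, or until the Owner explicitly records which location is authoritative.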

DO NOT IMPLEMENT

  • Confirmed: no production mutation, no DB mutation, no checker implementation, no pilot, no registry/table/collection/DOT/index/worker creation, and no edits to knowledge/dev/laws/ or knowledge/dev/laws-new/.

Process Evidence (Steps 0–6)

  • Step 0 — Foundation: Read skill, OR v7.58, Constitution current KB v4.6.3, Assembly First/DOT/scanner rules.
  • Step 1 — Receive: One mission only: audit catalog quality and two risk classes.
  • Step 2 — Think/design: N/A for feature design; this is a read-only audit. Evaluated objective, method, prerequisites, and dependency roadmap.
  • Step 3 — Code: No implementation. Only this required audit report was created.
  • Step 4 — Two hats: N/A; no code/deployment. Self-review performed against the requested ten audit sections.
  • Step 5 — Verify: No production verification permitted by task. Read-only evidence pasted below.
  • Step 6 — Report/update: Report created here. OR/TD/handoff not updated because this audit does not change enacted operations, technical debt state, or implementation state.

Read-only output evidence

$ wc -l knowledge/dev/laws-new/cau-hoi-khi-tai-cau-truc.md ...
wc: knowledge/dev/laws-new/cau-hoi-khi-tai-cau-truc.md: open: No such file or directory
...
0 total
checkout reference counts:
birth_registry                      4
collection_registry                 6
meta_catalog                       23
dot_tools                           8
universal_edges                     2
canonical_fields                    0
system_issues                       8
event_outbox                        0
registry_changelog                  0
iu_staging_record                   0
iu_staging_payload                  0
governance_candidate_state          0
sandbox_tac                         0
Agent Data full-document revisions read:
cau-hoi-khi-tai-cau-truc.md          revision 59
de-bai-cai-tien.md                    revision 33
required-stamps.v0.1.json             revision 6
promote-checker-v0.1-spec.md          revision 11
matrix-stamp-governance-addendum.md   revision 14
matrix-refactor-quick-rules.md         revision 8
matrix-refactor-implementation-plan.md revision 5
roadmap-cai-tien.md                    revision 1