KB-1B32

dot-iu-cutter v0.5 — Constitution Snapshot-source MARK: Matcher Internals + Status Inheritance Design (deterministic; snapshot-evidenced; OD-MC1/group-vs-row resolved)

Phase: v0_5_constitution_snapshot_source_MARK_dryrun_entrypoint_design · Nature: deterministic_matcher_and_status_design_only__no_parser_run · Date: 2026-05-18 · doc 2 of 5

nothing_executed: true ; no parser run ; matchers DESIGNED, not run
implements: GPT ruling OD-MC1 (DESIGN_BEFORE_EXECUTION), OD-G2 (LEAF_IU_IS_DIEU),
            group-vs-row (GROUP_HEADER_STATUS_INHERITS_TO_CHILD_DIEU…)
evidence_base: the pinned snapshot region of artifact 17660443e0f23e99 (read-only)
decision_authority: GPT / User ONLY ; self_advance: PROHIBITED

All line/heading examples below are verbatim from the BEGIN/END region of the pinned snapshot (read-only). Matchers operate on the normalized content lines produced by refimpl.r1 (R-RI3 N8: all empty lines dropped; exactly one \n between content lines; no blank-line runs). Each content line is therefore a non-empty string; the parser sees an ordered list L[0..N-1] of such lines spanning byte offsets within the region.
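
As a minimal sketch of the line model described above (not refimpl.r1 itself), the following shows how a region could be reduced to non-empty content lines with byte offsets. The function name `normalize_region`, the treatment of whitespace-only lines as empty, and the choice to express offsets relative to the normalized text are all assumptions made for illustration.

```python
def normalize_region(region: str) -> list[tuple[str, int, int]]:
    """Return (line, byte_start, byte_end) for every non-empty line.

    Offsets are UTF-8 byte offsets into the normalized text, where content
    lines are joined by exactly one newline (assumption, see lead-in).
    """
    # Drop empty lines; whitespace-only lines are treated as empty (assumption).
    lines = [ln for ln in region.split("\n") if ln.strip()]
    out = []
    pos = 0
    for ln in lines:
        size = len(ln.encode("utf-8"))
        out.append((ln, pos, pos + size))  # half-open span [start, end)
        pos += size + 1                    # skip the single separator byte
    return out
```

Byte offsets and code-point offsets diverge as soon as a line contains Vietnamese diacritics or a status emoji, which is why the sketch measures the UTF-8 encoding rather than `len(ln)`.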


1. Document model the matchers assume (from the snapshot)

The normalized region is a flat line stream with six macro-zones, in order:

Z1 promulgation_preamble : L0 H1 title + 3 preamble lines ("Văn bản tối cao…",
                           "Ban hành: S148…", "v4.6.3 BAN HÀNH. Giữ nguyên…")
Z2 NGUYEN_TAC_block      : header "15 NGUYÊN TẮC NỀN TẢNG — CẤM VI PHẠM",
                           4 column-header lines (# / Nguyên tắc / Nghĩa / Hệ quả),
                           15 principle records (id 1..15, 3 lines each: name/nghĩa/hệ quả),
                           2 pointer lines ("→ Chi tiết:…", "→ NT3 ngoại lệ:…")
Z3 KIEN_TRUC_block       : header "KIẾN TRÚC HẠ TẦNG DỮ LIỆU — 4 DATABASE + 3 LỚP…",
                           lesson lines, sub-sections "A. CỤM POSTGRESQL…",
                           "B. 3 LỚP KIẾN TRÚC…", "C. NGUYÊN TẮC ĐỌC KIẾN TRÚC"
Z4 bridging_pointers     : "2 CHIỀU QUẢN LÝ" block, "TUYÊN NGÔN | … | HẠ TẦNG"
                           pointer, "THUẬT NGỮ" pointer  (non-content)
Z5 MUC_LUC_LUAT_catalog  : header "MỤC LỤC LUẬT" then status-grouped catalog:
                           "Nền tảng — ✅", "Registry & Governance — ✅",
                           "Vận hành — ✅", "Quản trị — ✅ BAN HÀNH",
                           "Dự thảo — 📝", "Lỗi thời — ⛔"
Z6 CHANGELOG             : header "CHANGELOG", Version/Nội dung column headers,
                           version rows v4.3.0..v4.6.3, final summary line
                           "HP v4.6.3 BAN HÀNH | 15 NT …"  (non-content boilerplate)

Candidate IU levels come from Z2 (NGUYEN_TAC), Z3 (KIEN_TRUC_SECTION), Z5 (DIEU). Z1/Z4/Z6 and all column-header / table-scaffold lines are non-content (boilerplate or container heading) — classified, never turned into IUs, never silently dropped.

2. Required matchers — deterministic rules (OD-MC1: DESIGN_BEFORE_EXECUTION)

Notation: a matcher is a pure function match(line, ctx) -> {hit, level, captures} evaluated in priority order; first hit wins; ctx carries the current zone and the current catalog group. All comparisons are on NFC-normalized code points (never rendered-string equality — C-07 lesson). Diacritics and the 4 status tokens ✅ 📋 📝 ⛔ are significant and preserved by refimpl.r1.

2.1 mc.icx.zone_router (prerequisite, structural)

Deterministically assigns each line a zone by matching the exact zone-entry header strings, in document order, fail-closed:

zone_entry_headers (verbatim, exact-match, anchored = whole line):
  Z2 := "15 NGUYÊN TẮC NỀN TẢNG — CẤM VI PHẠM"
  Z3 := "KIẾN TRÚC HẠ TẦNG DỮ LIỆU — 4 DATABASE + 3 LỚP NÃO-KHO-CỔNG (BỔ SUNG S176)"
  Z4 := "2 CHIỀU QUẢN LÝ"
  Z5 := "MỤC LỤC LUẬT"
  Z6 := "CHANGELOG"
rule:
  - lines before Z2-entry => zone Z1
  - on an exact zone-entry header, switch current zone; that header line itself is
    classified CONTAINER_HEADING (not an IU)
  - unknown line while no zone open AND not a zone-entry => Z1 boilerplate
  - if the 5 zone-entry headers are not all found exactly once in order => FAIL-CLOSED
    (malformed document) -> BLOCKED, no manifest
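
The routing rule above can be sketched as follows. This is an illustrative sketch only: `route_zones`, `MalformedDocument`, and the `"BODY"` label for non-header lines are hypothetical names, and the header strings are copied from the zone_entry_headers list.

```python
import unicodedata

ZONE_ENTRY_HEADERS = [
    ("Z2", "15 NGUYÊN TẮC NỀN TẢNG — CẤM VI PHẠM"),
    ("Z3", "KIẾN TRÚC HẠ TẦNG DỮ LIỆU — 4 DATABASE + 3 LỚP NÃO-KHO-CỔNG (BỔ SUNG S176)"),
    ("Z4", "2 CHIỀU QUẢN LÝ"),
    ("Z5", "MỤC LỤC LUẬT"),
    ("Z6", "CHANGELOG"),
]


class MalformedDocument(Exception):
    """Raised when the zone-entry headers are not found exactly once, in order."""


def _nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def route_zones(lines: list[str]) -> list[tuple[str, str]]:
    """Assign (zone, kind) to each line; kind is CONTAINER_HEADING or BODY."""
    headers = {_nfc(text): zone for zone, text in ZONE_ENTRY_HEADERS}
    expected = [zone for zone, _ in ZONE_ENTRY_HEADERS]
    seen = []
    zone = "Z1"  # everything before the Z2 entry header
    routed = []
    for ln in lines:
        hit = headers.get(_nfc(ln))  # whole-line, exact match on NFC code points
        if hit is not None:
            seen.append(hit)
            zone = hit
            routed.append((zone, "CONTAINER_HEADING"))
        else:
            routed.append((zone, "BODY"))
    # Fail closed: a missing, repeated, or out-of-order header blocks the run.
    if seen != expected:
        raise MalformedDocument(f"zone headers seen: {seen}")
    return routed
```

Comparing the full `seen` list against `expected` covers the three failure cases (missing, duplicated, reordered) with a single check, so no partial result is ever returned for a malformed document.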

2.2 mc.icx.nguyen_tac (level NGUYEN_TAC, heading-rule, arabic, leaf)

Detects the 15 principle records inside Z2. After the 4 column-header lines (#, Nguyên tắc, Nghĩa, Hệ quả; exact, classified CONTAINER_HEADING), the block is a strictly repeating 4-line record: a bare integer id line followed by three content lines (name, nghĩa, hệ quả):

record_shape (within Z2, after column headers, before the two "→ " pointer lines):
  line k     : ^(?P<id>([1-9]|1[0-5]))$            # id line: bare 1..15
  line k+1   : <principle_name>      e.g. "LÀM MỘT LẦN, DÙNG MÃI"
  line k+2   : <nghĩa>               e.g. "SSOT duy nhất"
  line k+3   : <hệ quả>              e.g. "Sửa 1 chỗ = thay đổi mọi nơi"
emit:
  level=NGUYEN_TAC, leaf=true, number=<id> (arabic),
  title=<principle_name>, normalized_text = name + "\n" + nghĩa + "\n" + hệ quả,
  source_span = [byte_start(id line) , byte_end(line k+3)]
verbatim_examples_from_snapshot:
  id "1"  -> "LÀM MỘT LẦN, DÙNG MÃI" / "SSOT duy nhất" / "Sửa 1 chỗ = thay đổi mọi nơi"
  id "12" -> "DOT THEO CẶP (2 CHIỀU)" / "Động cơ chính thực hiện + động cơ phụ phát
             hiện lệch, xử lý, báo cáo (…)" / "Không có cặp = không có tự kiểm = thiết kế sai. …"
  id "15" -> "THIẾT KẾ TRƯỚC TRIỂN KHAI" / "Với thay đổi kỹ thuật có rủi ro, …" / "Không
             code/DDL trước bản vẽ. Điều 20 quyết cách triển khai đúng; …"
boundary:
  the two trailing pointer lines "→ Chi tiết: law-01-foundation-principles.md v3.3 …"
  and "→ NT3 ngoại lệ: dieu33-postgresql-law.md §13" are classified
  EXCLUDED_BOILERPLATE (cross-reference pointers), terminating Z2.
note: id set must be exactly {1..15} contiguous; a gap/duplicate/out-of-order id =>
  FAIL-CLOSED malformed-heading.
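
A minimal sketch of the record rule above, assuming the caller has already stripped the column headers and the two trailing pointer lines. `parse_principles` and `MalformedHeading` are hypothetical names; byte spans are omitted to keep the example short.

```python
import re

ID_LINE = re.compile(r"^(?P<id>([1-9]|1[0-5]))$")  # bare 1..15, as in record_shape


class MalformedHeading(Exception):
    """Raised on a gap, duplicate, or out-of-order id, or a truncated record."""


def parse_principles(lines: list[str]) -> list[dict]:
    """Parse the 15 four-line records (id, name, nghĩa, hệ quả) of Z2."""
    if len(lines) != 15 * 4:
        raise MalformedHeading(f"expected 60 lines, got {len(lines)}")
    records = []
    for k in range(0, len(lines), 4):
        expected_id = k // 4 + 1  # ids must be exactly 1..15, contiguous
        m = ID_LINE.fullmatch(lines[k])
        if m is None or int(m.group("id")) != expected_id:
            raise MalformedHeading(f"line {k}: expected id {expected_id}")
        name, nghia, he_qua = lines[k + 1 : k + 4]
        records.append({
            "level": "NGUYEN_TAC",
            "leaf": True,
            "number": expected_id,
            "title": name,
            "normalized_text": "\n".join((name, nghia, he_qua)),
        })
    return records
```

Reading records positionally, rather than scanning for any line that looks like an id, means a content line that happens to be a bare number cannot be mistaken for a record boundary.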

2.3 mc.icx.kien_truc_section (level KIEN_TRUC_SECTION, heading-rule, letter, leaf)

Detects the lettered architecture sub-sections inside Z3:

section_header_pattern (anchored whole line, within Z3):
  ^(?P<sec>[A-C])\.\s+(?P<sec_title>.+)$
verbatim_examples_from_snapshot:
  "A. CỤM POSTGRESQL — 4 DATABASE TRONG 1 CLUSTER"
  "B. 3 LỚP KIẾN TRÚC — NÃO, KHO, CỔNG (kế thừa data-connection-law v1.1)"
  "C. NGUYÊN TẮC ĐỌC KIẾN TRÚC"
emit:
  level=KIEN_TRUC_SECTION, leaf=true, number=<sec> (letter A/B/C),
  title=<sec_title>, parent = Z3 container heading,
  normalized_text = all lines from the section header (inclusive) up to the next
                     section header OR the Z4 zone entry (exclusive)
boundary:
  Z3 lines before "A." (the lesson lines "Mọi thiết kế…", "Bài học S176…") =>
  classified CONTAINER_HEADING/EXCLUDED_BOILERPLATE under the Z3 parent
note: section ids observed = {A,B,C} contiguous; the embedded 4-DB table lines
  (#/Database/Owner/Vai trò/Lớp/Gateway… rows) are table-scaffold => CONTAINER body
  of section A, not separate IUs (DIEU is the segmentation floor — §4 OD-G2).
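
The section split above could look like the sketch below, which takes the Z3 lines between the container heading and the Z4 entry (both exclusive). `split_sections` and `MalformedHeading` are hypothetical names; the pattern is the section_header_pattern from the spec.

```python
import re

SECTION_HEADER = re.compile(r"^(?P<sec>[A-C])\.\s+(?P<sec_title>.+)$")


class MalformedHeading(Exception):
    """Raised when the section ids are not exactly A, B, C in order."""


def split_sections(z3_lines: list[str]) -> tuple[list[str], list[dict]]:
    """Return (lines before section A, list of section units)."""
    preamble: list[str] = []
    sections: list[dict] = []
    for ln in z3_lines:
        m = SECTION_HEADER.match(ln)
        if m:
            sections.append({
                "level": "KIEN_TRUC_SECTION",
                "number": m["sec"],
                "title": m["sec_title"],
                "lines": [ln],  # header line is part of the unit body
            })
        elif sections:
            sections[-1]["lines"].append(ln)  # table rows stay inside the section
        else:
            preamble.append(ln)  # lesson lines before "A."
    if [s["number"] for s in sections] != ["A", "B", "C"]:
        raise MalformedHeading("section ids must be exactly A, B, C in order")
    for s in sections:
        s["normalized_text"] = "\n".join(s.pop("lines"))
    return preamble, sections
```

One caveat of this sketch: any body line that itself starts with `A. `, `B. `, or `C. ` would open a new section, and the contiguity check would then block the run rather than guess.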

2.4 mc.icx.dieu (level DIEU, structural, arabic, leaf — the segmentation floor)

Detects catalog rows inside Z5. Z5 is organised as status-group sub-blocks; each group has a set of column-header lines followed by its DIEU rows. A DIEU row is keyed by a Điều id token on its own line:

dieu_id_token (anchored whole line, within Z5, after a group's column headers):
  ^(?P<dieu_id>—|\d+(?:-[A-Z])?|0-S/M/L)$
  observed id domain (verbatim): — 0 0-B 0-G 0-H 0-S/M/L 1 2 3 4 5 6 7 8 9
                                 10 11 12 13 14 15 16 17 18 19 20 22
                                 24 26 28 29 30 31 32 33 35 36 37 38 39 41 43 44 34
row_shape (group-dependent column arity, from the group's own column-header line):
  2-col groups ("Điều","Tên")            -> id line, then <Tên> line
  3-col groups ("Điều","Tên","File")     -> id line, <Tên>, <File>           (Nền tảng)
  3-col groups ("Điều","Tên","Ghi chú")  -> id line, <Tên>, <Ghi chú>        (Quản trị, Dự thảo)
  obsolete group ("Tên","Lý do")         -> NO Điều id; <Tên> line, <Lý do> line
emit:
  level=DIEU, leaf=true, number=<dieu_id> (arabic/compound id token; "—" = pointer row),
  title=<Tên>, normalized_text = the row's content cells joined by "\n",
  parent = the catalog group container, status_marker_observed per §3,
  source_span = [byte_start(id line) , byte_end(last cell line)]
verbatim_examples_from_snapshot:
  group "Nền tảng — ✅":
    "0"   / "Luật Thực thể + Bảo toàn"            / "law-00-entity.md"
    "0-B" / "7 Lớp Cấu tạo (33 species)"          / "law-00b-composition.md"
    "1"   / "13 Nguyên tắc Nền tảng"              / "law-01-foundation-principles.md v3.0 (cần update +NT12+NT13)"
  group "Quản trị — ✅ BAN HÀNH" (3-col, Ghi chú carries a per-row ✅):
    "24"  / "Luật Nhãn (Label)"                   / "✅ 6 facets × 6 layers."
    "32"  / "Luật Phê duyệt v1.1"                 / "✅ S178 Fix 15 BAN HÀNH. PG-native …"
    "44"  / "Luật Schema Đối tượng Chuẩn (UOSL) v0.1.2 controlled DRAFT"
            / "📋 DỰ THẢO KIỂM SOÁT v0.1.2 (2026-05-01). …"
  group "Dự thảo — 📝":  "34" / "Luật Workflow" / "Chờ workflow engine active"
  group "Lỗi thời — ⛔":  "Luật Luồng DL v1.1 (data-connection-law.md)" / "SUPERSEDED — …"
                          ;  "Hiến pháp v3.9" / "SUPERSEDED"
note:
  - the "—" id ("Giải thích Từ ngữ v2.2" / "terminology-glossary.md") is a pointer
    row, not a numbered Điều: emit level=DIEU number="—" kind=pointer_row (still a
    candidate under its group's effective status; flagged confidence=lower for REVIEW)
  - id "0-S/M/L" is a single compound row (one record), not three
  - obsolete group has NO Điều id column: each (Tên,Lý do) pair is one DIEU-level
    record number=null kind=obsolete_entry
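
A sketch of the row reader implied by row_shape, assuming it receives one group's cell lines (after that group's column headers) plus the column names. `read_rows`, `MalformedRow`, and the `"dieu"` kind label are hypothetical; `pointer_row` and `obsolete_entry` come from the notes above.

```python
import re
import unicodedata

DIEU_ID = re.compile(r"^(?P<dieu_id>—|\d+(?:-[A-Z])?|0-S/M/L)$")


class MalformedRow(Exception):
    """Raised when cells do not tile into rows or an id token is invalid."""


def _nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def read_rows(cells: list[str], columns: list[str]) -> list[dict]:
    """Group cell lines into DIEU records using the group's column arity."""
    arity = len(columns)
    has_id = _nfc(columns[0]) == _nfc("Điều")  # the obsolete group has no id column
    if arity == 0 or len(cells) % arity != 0:
        raise MalformedRow("cell count is not a multiple of the column arity")
    rows = []
    for k in range(0, len(cells), arity):
        chunk = cells[k : k + arity]
        if has_id:
            if DIEU_ID.fullmatch(chunk[0]) is None:
                raise MalformedRow(f"bad id token: {chunk[0]!r}")
            number, content = chunk[0], chunk[1:]
            kind = "pointer_row" if number == "—" else "dieu"
        else:
            number, content, kind = None, chunk, "obsolete_entry"
        rows.append({
            "level": "DIEU",
            "number": number,
            "kind": kind,
            "title": content[0],
            "normalized_text": "\n".join(content),
        })
    return rows
```

For example, `read_rows(["Hiến pháp v3.9", "SUPERSEDED"], ["Tên", "Lý do"])` yields a single record with `number=None` and `kind="obsolete_entry"`, while `"0-S/M/L"` is accepted as one compound id rather than three.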

2.5 status_marker_detector (cross-cutting; 4 markers; profile-bound)

Operates on the live ratified 4-marker map (status-marker amendment CLOSED_PASS_LIVE):

marker_map (grammar_profile_status_marker, incomex-architecture-constitution-v4):
  "✅" (U+2705)        -> enacted
  "📋" (U+1F4CB)       -> controlled_draft
  "📝" (U+1F4DD)       -> draft
  "⛔" (U+26D4)        -> obsolete
detection_targets (two granularities):
  group_header_marker : a Z5 group-header line of the exact shape
      ^(?P<group_label>.+?)\s+—\s+(?P<marker>✅|📋|📝|⛔)(?:\s+BAN HÀNH)?$
      verbatim hits: "Nền tảng — ✅" ; "Registry & Governance — ✅" ;
                     "Vận hành — ✅" ; "Quản trị — ✅ BAN HÀNH" ;
                     "Dự thảo — 📝" ; "Lỗi thời — ⛔"
  row_marker          : a leading status token at the start of a DIEU "Ghi chú" cell,
      e.g. "✅ 6 facets × 6 layers." (row 24) ; "📋 DỰ THẢO KIỂM SOÁT v0.1.2 …" (row 44)
counts_must_equal: { ✅:19, 📋:1, 📝:1, ⛔:1 }  (= 4 group ✅ + 15 row ✅ ; 1 row 📋 ;
  1 group 📝 ; 1 group ⛔)  — else FAIL-CLOSED (marker census mismatch -> BLOCKED)
rule: any status code point on a candidate-bearing line that is NOT one of the 4
  mapped markers => unknown-marker FAIL-CLOSED (the profile can only exclude what it
  can name — grammar-applicability-review §3).
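
The two detection granularities and the census check can be sketched as below. The function names (`group_marker`, `row_marker`, `census`, `check_census`) and the `CensusMismatch` exception are hypothetical; the marker map, header pattern, and expected counts are taken from the spec above. The unknown-marker rule is not covered by this sketch.

```python
import re
import unicodedata
from collections import Counter

MARKER_MAP = {
    "\u2705": "enacted",               # ✅
    "\U0001F4CB": "controlled_draft",  # 📋
    "\U0001F4DD": "draft",             # 📝
    "\u26D4": "obsolete",              # ⛔
}
_ALT = "|".join(re.escape(tok) for tok in MARKER_MAP)
GROUP_HEADER = re.compile(unicodedata.normalize(
    "NFC", rf"^(?P<group_label>.+?)\s+—\s+(?P<marker>{_ALT})(?:\s+BAN HÀNH)?$"
))
EXPECTED = {"enacted": 19, "controlled_draft": 1, "draft": 1, "obsolete": 1}


class CensusMismatch(Exception):
    """Raised when detected marker counts differ from the expected census."""


def group_marker(line: str):
    """Status named by a Z5 group-header line, or None if not a group header."""
    m = GROUP_HEADER.match(unicodedata.normalize("NFC", line))
    return MARKER_MAP[m["marker"]] if m else None


def row_marker(cell: str):
    """Status named by a leading token in a Ghi chú cell, or None."""
    for token, status in MARKER_MAP.items():
        if cell.startswith(token):
            return status
    return None


def census(group_headers, ghi_chu_cells) -> Counter:
    found = [group_marker(h) for h in group_headers]
    found += [row_marker(c) for c in ghi_chu_cells]
    return Counter(s for s in found if s is not None)


def check_census(counts, expected=EXPECTED) -> None:
    if dict(counts) != expected:  # fail closed on any difference
        raise CensusMismatch(f"got {dict(counts)}, expected {expected}")
```

Only a leading token counts as a row marker here, so a status emoji that appears in the middle of a cell does not inflate the census.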

2.6 changelog_boundary_detector (cross-cutting; terminates content)

rule:
  - the exact line "CHANGELOG" (Z6 zone entry) marks the END of all candidate
    content; everything from "CHANGELOG" through the END sentinel is classified
    EXCLUDED_BOILERPLATE (version table + final "HP v4.6.3 BAN HÀNH | …" summary line)
  - symmetric START guard: lines in Z1 (before the Z2 entry header) — H1 title,
    "Văn bản tối cao…", "Ban hành: S148…", "v4.6.3 BAN HÀNH. Giữ nguyên…" — are
    EXCLUDED_BOILERPLATE / promulgation preamble (the H1 still yields the
    document-level promulgation status, §3.1)
  - Z4 bridging pointers ("2 CHIỀU QUẢN LÝ" block, "TUYÊN NGÔN | … | HẠ TẦNG",
    "THUẬT NGỮ" pointer) => EXCLUDED_BOILERPLATE
purpose: guarantees coverage closure — no content line is left unclassified, and
  CHANGELOG can never leak into a candidate IU.

3. Status inheritance design (group-vs-row ruling; QG4)

Implements GPT GROUP_HEADER_STATUS_INHERITS_TO_CHILD_DIEU_UNTIL_NEXT_STATUS_SCOPE with an added, deterministic document-level base tier (required because Z2/Z3 bear no marker yet are part of the promulgated BAN HÀNH constitution).

3.1 Three-tier deterministic status cascade (most specific wins)

tier_0_document_promulgation (base default):
  derived from Z1: H1 "… v4.6.3 BAN HÀNH" + preamble "v4.6.3 BAN HÀNH" => enacted
  applies to: NGUYEN_TAC (Z2) and KIEN_TRUC_SECTION (Z3) units, which carry NO marker
  -> effective_status(NGUYEN_TAC*, KIEN_TRUC_SECTION*) = enacted   (deterministic)
tier_1_group_header (catalog scope):
  the status marker on a Z5 group-header sets the inherited status for every DIEU row
  in that group, UNTIL the next group-header status scope:
    "Nền tảng — ✅"             -> child DIEU = enacted
    "Registry & Governance — ✅" -> child DIEU = enacted
    "Vận hành — ✅"             -> child DIEU = enacted
    "Quản trị — ✅ BAN HÀNH"    -> child DIEU = enacted
    "Dự thảo — 📝"             -> child DIEU = draft
    "Lỗi thời — ⛔"            -> child DIEU = obsolete
tier_2_explicit_row_marker (overrides tier_1 for that row only):
  a row_marker at the start of a DIEU "Ghi chú" cell overrides the inherited group
  status for THAT DIEU only:
    rows 24,26,28,29,30,31,32,33,35,36,37,38,39,41,43 carry explicit "✅" -> enacted
      (consistent with the "Quản trị — ✅" group; explicit confirms inherited)
    row 44 carries explicit "📋" -> controlled_draft  (OVERRIDES the inherited
      "Quản trị — ✅" enacted; this is the decisive Điều 44 exclusion — GPT ruling #4)
resolution: effective_status = tier_2 if a row marker present,
            else tier_1 if inside a Z5 group,
            else tier_0 (document promulgation = enacted) for Z2/Z3.
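
The resolution rule can be written as a small pure function. This is a sketch under stated assumptions: `effective_status`, `disposition`, and `OrphanRow` are hypothetical names, statuses are passed in as already-mapped strings, and the exclusion reasons are the ones used in the worked derivations of §3.2.

```python
EXCLUSION_REASON = {
    "controlled_draft": "controlled_draft_deferred",
    "draft": "draft_excluded_by_enacted_only",
    "obsolete": "obsolete_excluded",
}


class OrphanRow(Exception):
    """Raised for a DIEU row that has no enclosing Z5 group status."""


def effective_status(level, group_status=None, row_status=None):
    """Return (status, tier) using the most specific tier available."""
    if level in ("NGUYEN_TAC", "KIEN_TRUC_SECTION"):
        return "enacted", 0           # tier_0: document promulgation
    if level != "DIEU":
        raise ValueError(f"unknown level: {level}")
    if group_status is None:
        raise OrphanRow("DIEU row outside any Z5 group")
    if row_status is not None:
        return row_status, 2          # tier_2: explicit row marker wins
    return group_status, 1            # tier_1: inherited from group header


def disposition(status):
    """Map an effective status to (CANDIDATE | EXCLUDED, exclusion_reason)."""
    if status == "enacted":
        return "CANDIDATE", None
    return "EXCLUDED", EXCLUSION_REASON[status]
```

For Điều 44 this gives `effective_status("DIEU", "enacted", "controlled_draft")`, which resolves at tier 2 to `controlled_draft` and is therefore excluded with reason `controlled_draft_deferred`.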

3.2 Worked status derivations (from the snapshot — proof of determinism)

NGUYEN_TAC #1..#15                : tier_0 -> enacted        -> CANDIDATE
KIEN_TRUC_SECTION A/B/C           : tier_0 -> enacted        -> CANDIDATE
DIEU 0,0-B,0-G,0-H,0-S/M/L,1 (Nền tảng ✅)        : tier_1 enacted -> CANDIDATE
DIEU 2..9 (Registry & Governance ✅)               : tier_1 enacted -> CANDIDATE
DIEU 10..20,22 (Vận hành ✅)                       : tier_1 enacted -> CANDIDATE
DIEU 24..43 (Quản trị ✅ + explicit row ✅)        : tier_2 enacted -> CANDIDATE
DIEU 44 (Quản trị group ✅, explicit row 📋)       : tier_2 controlled_draft -> EXCLUDED
                                                     reason=controlled_draft_deferred
DIEU 34 (Dự thảo 📝)                               : tier_1 draft -> EXCLUDED
                                                     reason=draft_excluded_by_enacted_only
obsolete entries (Lỗi thời ⛔): "Luật Luồng DL v1.1", "Hiến pháp v3.9"
                                                   : tier_1 obsolete -> EXCLUDED
                                                     reason=obsolete_excluded
"—" pointer row (Nền tảng ✅)      : tier_1 enacted -> CANDIDATE (kind=pointer_row,
                                     confidence=lower; container-vs-leaf = REVIEW decision)

3.3 Exclusion rows + no-silent-drop guarantee

every non-enacted node MUST appear in the manifest as an explicit excluded row:
  { node_id, level, heading, source_span, status_marker_observed,
    effective_status ∈ {controlled_draft,draft,obsolete}, exclusion_reason }
no_silent_drop_invariant:
  union( candidate spans ∪ excluded spans ∪ classified-boilerplate spans )
  == the entire snapshot BEGIN/END region, with NO gap and NO overlap (see doc 4 V-7/V-8)
fail_closed: an unknown marker, an unclassifiable content line, an overlapping span,
  a duplicate address, or a section without a resolvable parent => BLOCKED (no manifest
  emitted), never an auto-pass and never a silent drop.
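
The no-gap, no-overlap invariant amounts to checking that the classified spans tile the region exactly. The sketch below assumes half-open integer spans `(start, end)` in which any separator between lines is attributed to one of the adjacent spans; `check_coverage` and `CoverageError` are hypothetical names, and the actual verification is defined in doc 4 (V-7/V-8).

```python
class CoverageError(Exception):
    """Raised when spans leave a gap, overlap, or fall outside the region."""


def check_coverage(spans, region_start: int, region_end: int) -> bool:
    """Verify that half-open spans tile [region_start, region_end) exactly.

    `spans` is the union of candidate, excluded, and boilerplate spans.
    """
    pos = region_start
    for start, end in sorted(spans):
        if end <= start:
            raise CoverageError(f"empty or inverted span: ({start}, {end})")
        if start < pos:
            raise CoverageError(f"overlap at offset {start}")  # double-cut
        if start > pos:
            raise CoverageError(f"gap: [{pos}, {start})")      # silent drop
        pos = end
    if pos != region_end:
        raise CoverageError(f"region not closed: stopped at {pos}")
    return True
```

Because the spans are sorted first, a single forward pass is enough: any overlap or gap shows up as a mismatch between a span's start and the running end position.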

4. Leaf granularity (OD-G2) and one residual open decision

OD-G2 applied: DIEU is the SEGMENTATION FLOOR — the parser does NOT descend below a
  DIEU row into Khoản/Điểm/sub-bullets; the embedded 4-DB table and per-cell text stay
  inside their DIEU/section unit body. This matches GPT ruling LEAF_IU_IS_DIEU and the
  profile (no Chương/Khoản/Điểm level defined).
OD-G3 (NEW, residual — route to GPT/User, NOT self-resolved):
  GPT ruling OD-G2 names DIEU explicitly but the ratified grammar profile sets
  leaf=true on ALL THREE levels (NGUYEN_TAC / KIEN_TRUC_SECTION / DIEU) and the
  planning readiness range [55,78] counts principles(15) + sections(3) + DIEU.
  Question: are NGUYEN_TAC and KIEN_TRUC_SECTION emitted as their OWN candidate IUs
  (recommended — matches the ratified 3-level profile and the [55,78] range), OR is
  ONLY the DIEU catalog emitted (would collapse the range toward its lower bound)?
  Recommendation: emit all 3 ratified levels; DIEU remains the floor. This is a
  one-line micro-ruling that can be folded into the code-package review (doc 5).

5. Failure modes (explicit, fail-closed — task §2)

Failure | Detection | Disposition
unknown marker | status code point ∉ {✅,📋,📝,⛔} on a candidate-bearing line | BLOCKED, no manifest
overlapping spans | pairwise span intersection non-empty | BLOCKED (no double-cut)
uncovered body text | a content line in no candidate/excluded/boilerplate span | BLOCKED (no silent drop)
malformed heading | zone-entry / id-sequence / column-header shape violated | BLOCKED (malformed)
duplicate address | two units resolve to the same ICX-CONST/<path> | BLOCKED (address collision)
section without parent | a DIEU row before any Z5 group header / orphan child | BLOCKED (orphan)
marker census mismatch | counts ≠ {✅19,📋1,📝1,⛔1} | BLOCKED (pre-parse abort)
snapshot drift | region sha256 ≠ 17660443…cae80c or length ≠ 17522 | BLOCKED (abort before parse)

6. Statement

  • QG3 satisfied: concrete deterministic internals for mc.icx.zone_router, mc.icx.nguyen_tac, mc.icx.kien_truc_section, mc.icx.dieu, status_marker_detector, changelog_boundary_detector — with verbatim snapshot evidence and exhaustive failure modes (OD-MC1 DESIGN_BEFORE_EXECUTION discharged).
  • QG2 satisfied: leaf/segmentation floor = DIEU (OD-G2).
  • QG4 satisfied: 3-tier status cascade implements GROUP_HEADER_STATUS_INHERITS…; explicit row marker overrides group; no silent drop; Điều 44 deterministically EXCLUDED as controlled_draft.
  • Nothing executed; no parser run. One residual micro-decision OD-G3 routed to GPT.
  • doc 2 of 5; STOP after 5 docs → route GPT/User. Self-advance PROHIBITED.

Companion docs: operational-framing (1), manifest-contract (3), command-and-verification-plan (4), entrypoint-design-report (5).
