KB-6A18

dot-iu-cutter Session Handoff to Next Chat — Constitution Hardtest and Production-Scale Information Unit Pipeline


Date: 2026-05-17
Prepared by: GPT
User goal: complete the information-unit system so a user can eventually request `Cắt hiến pháp` ("cut the Constitution") and the pipeline operates end-to-end automatically, safely, and at production scale.


0. Why this handoff exists

The current session is very long and contains many gated phases: schema design, production migrations, observability, credentials, code, real PostgreSQL dry-runs, first production single-IU CUT/VERIFY, closeout, and early v0.5 scale design.

The user has now clarified the true next goal:

large_goal: complete the information-unit system
near_goal: make the cutting pipeline robust enough that `Cắt hiến pháp` can run end-to-end automatically
constitution_source_url: https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution
production_scale_expectation: hundreds_of_thousands_to_millions_of_information_units

Important: Do not treat the Constitution as a small demo. The user explicitly wants it as a hardtest for the real production process.


1. Current production state — completed work

1.1 Schema / observability / credentials

v0_1_governance_tables: LIVE
v0_2_structural_schema: COMPLETE
v0_3_read_observability: LIVE
v0_4_credentials: LIVE
base_tables: 12
primary_keys: 12
in_schema_FKs: 19
observe_views: 12
read_role: cutter_ro
writer_roles:
  - cutter_exec
  - cutter_verify

The governance schema is live in PostgreSQL under cutter_governance.

1.2 Cutter agent code

Repo / SSOT:

code_SSOT: /opt/incomex/dot
branch: main
accepted_HEAD: e93424b5ff7fa5e4b8406131977ce4339cd0856a
VPS_is_code_SSOT: true
scoped_git_add_only: mandatory
never_git_add_A: true
commit_after_code_change: mandatory

Important accepted commits:

689e53e: initial in-memory cutter agent
56d3732: RealPostgresAdapter authored
84c52c5: LedgerWriter schema-binding to deployed schema
6060e1a: RealPostgresAdapter transaction lifecycle fix
db4aa58: UUID-safe CUT/VERIFY signing body fix
e93424b: DOT-pair signature cross-reference + lane-overlap fix

Current accepted HEAD is e93424b5ff7fa5e4b8406131977ce4339cd0856a.

1.3 PG-backed dry-run success

RERUN#4 validated the code against a restored copy of the production schema in an isolated PostgreSQL instance.

pg_backed_dry_run_RERUN4: PASS
happy_path: MARK -> SWEEP -> REVIEW -> CUT -> VERIFY
final_status: verified_complete
final_rows: 15
negative_idempotency_delta0: PASS
DOT_lane_overlap_prevention: PASS
production_touched: false

Validated row matrix for one IU:

decision_backlog_entry: 1
decision_backlog_history: 5
decision_backlog_dependency: 0
decision_backlog_sweep_log: 1
manifest_envelope: 1
manifest_unit_block: 1
review_decision: 1
dot_pair_signature: 2
cut_change_set: 1
cut_change_set_affected_row: 1
verify_result: 1
canonical_address_alias: 0
total: 15
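
The matrix above can be turned into a mechanical check. The following is a minimal sketch, assuming the per-IU counts scale linearly with the number of IUs (section 4.7 notes the invariant may be revised if the manifest strategy changes). The function names are hypothetical and are not part of the cutter agent code.

```python
# Per-IU row counts, copied from the validated matrix above.
ROWS_PER_IU = {
    "decision_backlog_entry": 1,
    "decision_backlog_history": 5,
    "decision_backlog_dependency": 0,
    "decision_backlog_sweep_log": 1,
    "manifest_envelope": 1,
    "manifest_unit_block": 1,
    "review_decision": 1,
    "dot_pair_signature": 2,
    "cut_change_set": 1,
    "cut_change_set_affected_row": 1,
    "verify_result": 1,
    "canonical_address_alias": 0,
}


def expected_delta(iu_count: int) -> dict:
    """Expected per-table row delta, ASSUMING linear scaling per IU."""
    return {table: rows * iu_count for table, rows in ROWS_PER_IU.items()}


def mismatched_tables(before: dict, after: dict, iu_count: int) -> list:
    """Return the tables whose observed delta differs from the expectation."""
    want = expected_delta(iu_count)
    return [t for t in want if after.get(t, 0) - before.get(t, 0) != want[t]]
```

`before` and `after` are per-table row counts taken around a run. An empty result means the observed delta matches the matrix for that many IUs.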

1.4 First controlled production CUT/VERIFY

A single production IU trial succeeded and was closed out.

first_controlled_production_CUT_VERIFY_trial: CLOSED_PASS
TARGET_IU: 04e0c674-2a71-53b7-8d30-9c1a78d6fd17
canonical_address: D38-DIEU28-S3-P1
entry_id: 26a8c4e8-c07c-5ff4-8854-ab55ef4fcf81
change_set_id: 7c963f27-0bc1-4dd9-91ac-4d1f82f82d53
verify_result_id: 633f2c51-9a87-4bb4-a7f6-75342bf72ac7
row_delta: +15
verify_verdict: pass
rollback_triggered: false
forward_compensation_triggered: false
production_scope_clean: true

The production tac_logical_unit row was not mutated; only append-only governance ledger rows were written.

1.5 Production trial closeout

Closeout and post-execution backup verification passed.

post_trial_backup_verification: PASS
restore_test: PASS
restored_backup_contains_plus_15_rows: true
TARGET_IU_unchanged: true
DOT_991_DOT_992_lane_refs: PASS
canonical_address_alias_rows: 0
production_sysid_stable: true
protected_dry_run_envs_untouched: true
secret_leak_check: PASS

2. Current v0.5 design state

2.1 Full-document trial design

v0.5 full-document trial design was authored and reviewed.

v0_5_full_document_trial_design: PASS_WITH_BLOCKERS
full_document_cut_allowed_now: false
hien_phap_cut_allowed_now: false
second_IU_allowed_now: false
bulk_cut_allowed_now: false

Findings:

already_cut_documents:
  DIEU_28: 27_rows
  DIEU_32: 23_rows
  DIEU_35: 36_rows
storage_SSOT:
  - public.tac_logical_unit
  - cutter_governance
hien_phap_in_system: false
hien_phap_external_source_url: https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution
estimated_leaf_IUs_clause_point: 300_to_500
estimated_governance_rows_clause_point: 5000_to_7500
single_IU_delta_invariant: 15

2.2 Pre-scale foundation design

v0.5 pre-scale index + label/metadata foundation design was reviewed and passed.

v0_5_pre_scale_foundation_design: PASS
index_DDL_execution_allowed_now: false
label_registry_creation_allowed_now: false
tier_normalization_write_allowed_now: false
dry_run_at_volume_allowed_now: false
bulk_production_allowed_now: false

Accepted sequencing:

sequence:
  1: pre_scale_index_only_DDL_authoring
  2: index_DDL_dry_run_and_production_execution_if_PASS
  3: dry_run_at_volume_for_existing_3_documents_or_synthetic_document
  4: tier_normalization_readiness_and_write_cycle_if_needed
  5: label_metadata_registry_design_cycle
  6: authoritative_Hien_phap_source_ingestion_design
  7: Hien_phap_dry_run_then_staged_production_small_batch

3. User’s new directive for next session

The user clarified:

user_directive:
  - Use the Constitution file at https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution as a hardtest.
  - The pipeline must be production-real, not a toy demo.
  - The user wants the system eventually to work from a simple request: `Cắt hiến pháp`.
  - This requires completing the entire cutting process and necessary layers: ingestion, canonicalization, labeling/metadata, scale indexes, source authority, storage/merge, and automation.
  - The system must be designed for hundreds of thousands and eventually millions of information units.

Important interpretation:

constitution_is_not_the_goal_itself: true
constitution_is_hardtest_for_general_information_unit_pipeline: true
large_goal: production_ready_information_unit_factory

4. What is missing before Cắt hiến pháp can be automated

4.1 Scale indexes

The current runtime logic is correct, but without indexes each per-IU lookup can fall back to scanning its table, so a full-document run of n IUs can grow to O(n²) total work.

Required next phase already approved:

next_phase: v0_5_pre_scale_index_only_DDL_authoring

Hot paths needing indexes:

hot_paths:
  - decision_backlog_entry(status, emitted_at, entry_id) for SWEEP cursor
  - manifest_envelope(source_doc_ref) for lineage
  - review_decision(manifest_id) for lineage/review lookup
  - cut_change_set(decision_backlog_entry_id) for cut-once guard
  - verify_result(change_set_id) for verify lookup
  - dot_pair_signature(cross_reference_change_set_id)
  - dot_pair_signature(cross_reference_verify_result_id)
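
For the DDL authoring phase, the hot-path list can be kept as data and rendered into statements for review. This is a sketch only: it builds text and executes nothing, which keeps it inside the current gates. The index naming scheme is hypothetical, and placing every table in the `cutter_governance` schema is an assumption based on section 1.1.

```python
SCHEMA = "cutter_governance"  # assumption: all hot-path tables live in this schema

# (table, indexed columns), copied from the hot-path list above.
HOT_PATHS = [
    ("decision_backlog_entry", ("status", "emitted_at", "entry_id")),
    ("manifest_envelope", ("source_doc_ref",)),
    ("review_decision", ("manifest_id",)),
    ("cut_change_set", ("decision_backlog_entry_id",)),
    ("verify_result", ("change_set_id",)),
    ("dot_pair_signature", ("cross_reference_change_set_id",)),
    ("dot_pair_signature", ("cross_reference_verify_result_id",)),
]


def index_name(table: str, columns: tuple) -> str:
    """Hypothetical naming scheme: ix_<table>__<col>_<col>."""
    name = f"ix_{table}__{'_'.join(columns)}"
    if len(name) > 63:  # PostgreSQL truncates identifiers longer than 63 bytes
        raise ValueError(f"index name too long: {name}")
    return name


def index_ddl(table: str, columns: tuple, schema: str = SCHEMA) -> str:
    """Render one CREATE INDEX statement as text; nothing is executed."""
    cols = ", ".join(columns)
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name(table, columns)} "
        f"ON {schema}.{table} ({cols});"
    )


if __name__ == "__main__":
    for table, columns in HOT_PATHS:
        print(index_ddl(table, columns))
```

`CONCURRENTLY` avoids blocking writers while the index builds, but PostgreSQL does not allow it inside a transaction block, so the later execution phase would need to run each statement on its own.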

4.2 Source document registry / source authority

The current schema can govern CUT/VERIFY rows, but it does not yet fully manage source-document ingestion at scale.

Needed design:

source_document_registry_needed: true
needs:
  - source_url
  - authoritative_version
  - checksum
  - retrieval_timestamp
  - source_format
  - parser_profile
  - source_span_mapping
  - provenance_trace_to_each_IU

For Constitution:

constitution_source_url: https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution
must_verify:
  - accessibility
  - exact content format
  - checksum
  - whether source is authoritative enough
  - whether version is 2013 Constitution or amended variant
  - whether Vietnamese legal structure is preserved
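
A registry row covering the scalar fields above might look like the following sketch. The class and function names are hypothetical, SHA-256 is an assumed checksum algorithm, and `source_span_mapping` and `provenance_trace_to_each_IU` are deliberately not modeled here because they belong to per-IU linkage rather than to the document row.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SourceDocumentRecord:
    """One immutable registry row per retrieved source version (hypothetical shape)."""
    source_url: str
    authoritative_version: str
    checksum: str
    retrieval_timestamp: datetime
    source_format: str
    parser_profile: str


def _sha256(raw: bytes) -> str:
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def build_record(raw: bytes, *, source_url: str, authoritative_version: str,
                 source_format: str, parser_profile: str,
                 retrieved_at: Optional[datetime] = None) -> SourceDocumentRecord:
    """Build a registry row from already-fetched bytes; no network access here."""
    return SourceDocumentRecord(
        source_url=source_url,
        authoritative_version=authoritative_version,
        checksum=_sha256(raw),
        retrieval_timestamp=retrieved_at or datetime.now(timezone.utc),
        source_format=source_format,
        parser_profile=parser_profile,
    )


def content_changed(previous: SourceDocumentRecord, raw: bytes) -> bool:
    """True when freshly fetched bytes no longer match the registered checksum."""
    return previous.checksum != _sha256(raw)
```

Until the must_verify items are resolved, `authoritative_version` would carry a placeholder such as `unverified` rather than a guessed year.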

4.3 Ingestion pipeline

Needed stages:

ingestion_pipeline:
  - fetch_source
  - identify_format
  - compute_checksum
  - normalize_encoding
  - strip_non_content_noise
  - preserve headings / numbering / hierarchy
  - produce source_span anchors
  - create deterministic document_version_id
  - output parser-ready canonical source representation

The pipeline must be config-driven, with no hardcoded file paths or labels.
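
Three of the stages above (normalize_encoding, source_span anchors, deterministic document_version_id) can be sketched without touching the network or the database. The namespace value, the NFC choice, and the `url|checksum` name format are all assumptions for illustration. fetch_source and strip_non_content_noise are not modeled.

```python
import hashlib
import unicodedata
import uuid

# Hypothetical namespace; a real deployment would fix its own value in config.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/dot-iu-cutter")


def normalize_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode, apply Unicode NFC, and unify line endings to LF."""
    text = unicodedata.normalize("NFC", raw.decode(encoding))
    return text.replace("\r\n", "\n").replace("\r", "\n")


def source_spans(text: str):
    """Yield (line_no, start, end, line) anchors for every non-blank line."""
    offset = 0
    for line_no, line in enumerate(text.split("\n"), start=1):
        if line.strip():
            yield (line_no, offset, offset + len(line), line)
        offset += len(line) + 1  # +1 for the LF consumed by split


def document_version_id(source_url: str, normalized_text: str) -> uuid.UUID:
    """Same URL and same normalized content always give the same id."""
    checksum = hashlib.sha256(normalized_text.encode("utf-8")).hexdigest()
    return uuid.uuid5(NAMESPACE, f"{source_url}|{checksum}")
```

NFC matters for Vietnamese text because the same visible letter can arrive either precomposed or as a base letter plus combining marks, and without normalization the two forms would hash to different versions.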

4.4 Canonicalization pipeline

Needed stages:

canonicalization_pipeline:
  - document grammar detection
  - hierarchy extraction: Chương / Điều / Khoản / Điểm / đoạn
  - deterministic canonical_address generation
  - stable IU id / entry id derivation
  - canonical text normalization
  - source span linkage
  - validation against grammar
  - review queue for ambiguous segments

The Constitution requires a multi-level grammar beyond the current DIEU-28/32/35 address style.
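
A line-based sketch of hierarchy extraction, address generation, and the review queue follows. The heading patterns, the `HP` document code, and the address layout (`CH…-DIEU…-K…-DIEM…-P…`) are all hypothetical and are not the deployed address grammar. The IU and entry ids recorded in section 1.4 carry a version-5 UUID nibble, which is consistent with name-based derivation, but the namespace and name inputs are not stated in this handoff, so the ones below are placeholders.

```python
import re
import unicodedata
import uuid


def _nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# Hypothetical heading patterns for Chương / Điều / Khoản / Điểm.
CHUONG = re.compile(_nfc(r"^Chương\s+([IVXLC]+)\b"))
DIEU = re.compile(_nfc(r"^Điều\s+(\d+)\b"))
KHOAN = re.compile(r"^(\d+)\.\s+\S")
DIEM = re.compile(_nfc(r"^([a-zđ])\)\s+\S"))

NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/iu")  # placeholder


def address(doc, chapter, article, clause=None, point=None, para=None) -> str:
    parts = [doc]
    if chapter:
        parts.append(f"CH{chapter}")
    parts.append(f"DIEU{article}")
    if clause:
        parts.append(f"K{clause}")
    if point:
        parts.append(f"DIEM{point.upper()}")
    if para:
        parts.append(f"P{para}")
    return "-".join(parts)


def iu_id(canonical_address: str) -> uuid.UUID:
    """Stable id: the same address always maps to the same UUID."""
    return uuid.uuid5(NAMESPACE, canonical_address)


def canonicalize(lines, doc_code):
    """Return (units, review_queue); units are (canonical_address, text) pairs."""
    chapter = article = clause = None
    para = 0
    units, review = [], []
    for raw in lines:
        line = _nfc(raw).strip()
        if not line:
            continue
        if m := CHUONG.match(line):
            chapter, article, clause = m.group(1), None, None
        elif m := DIEU.match(line):
            article, clause, para = m.group(1), None, 0
        elif article is None:
            review.append(line)  # text outside any Điều: ambiguous
        elif m := KHOAN.match(line):
            clause = m.group(1)
            units.append((address(doc_code, chapter, article, clause), line))
        elif (m := DIEM.match(line)) and clause is not None:
            units.append((address(doc_code, chapter, article, clause, m.group(1)), line))
        elif clause is None:
            para += 1  # unnumbered paragraph directly under the Điều
            units.append((address(doc_code, chapter, article, para=para), line))
        else:
            review.append(line)  # stray text after a Khoản: ambiguous
    return units, review
```

The sketch sends anything it cannot place to the review queue instead of guessing. A known limitation is that a body line which itself begins with "Điều" and a number would be misread as a heading, which is one reason validation against the grammar is listed as its own stage.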

4.5 Label / metadata registry

The user emphasized that labels must be easy to extend at scale.

Needed:

label_metadata_registry:
  - label_dictionary
  - label_assignment_append_only
  - metadata_key_registry
  - metadata_value_type
  - cardinality_policy
  - mutability_policy
  - index_policy
  - hot_key_promotion_policy
  - no_runtime_hardcoded_labels

Principles:

principles:
  - no_new_label_columns_by_default
  - JSONB_allowed_for_sparse_evolving_metadata
  - JSONB_not_hidden_authority
  - frequently_queried_keys_must_be_promoted_to_indexed_SQL_or_registry_assignment
  - SQL_remains_SSOT
  - vector_NoSQL_projection_only
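
The registry and append-only assignment ideas can be illustrated in memory. This is a sketch under stated assumptions: the class names, the two cardinality values, and the retraction-by-appending convention are hypothetical, and in the real system SQL would remain the SSOT for both the dictionary and the assignment log.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assignment:
    """One immutable log row; retraction is a new row, never an update."""
    iu_id: str
    code: str
    value: str
    retracted: bool = False


@dataclass
class LabelRegistry:
    dictionary: dict = field(default_factory=dict)  # label code -> cardinality
    log: list = field(default_factory=list)         # append-only assignments

    def define(self, code: str, cardinality: str = "many") -> None:
        if cardinality not in ("one", "many"):
            raise ValueError(f"unknown cardinality: {cardinality}")
        if code in self.dictionary:
            raise ValueError(f"label already defined: {code}")
        self.dictionary[code] = cardinality

    def assign(self, iu_id: str, code: str, value: str, retracted: bool = False) -> None:
        if code not in self.dictionary:  # labels come from the registry, not from code
            raise KeyError(f"unregistered label: {code}")
        self.log.append(Assignment(iu_id, code, value, retracted))

    def current(self, iu_id: str) -> dict:
        """Fold the log into the current label state for one IU."""
        state = {}
        for a in self.log:
            if a.iu_id != iu_id:
                continue
            values = state.setdefault(a.code, [])
            if a.retracted:
                if a.value in values:
                    values.remove(a.value)
            elif self.dictionary[a.code] == "one":
                values[:] = [a.value]  # latest assignment wins
            elif a.value not in values:
                values.append(a.value)
        return {code: vals for code, vals in state.items() if vals}
```

Adding a new label is then a `define` call (a registry row) rather than a new column or a code change, which is the property the principles above ask for.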

4.6 Tier normalization / existing corpus quality

The existing corpus has a data-quality issue:

DIEU_32_DIEU_35_blank_tier: true
normalization_required_before_using_existing_corpus_as_large_fixture: true

This is a separate read-review-write cycle.

4.7 Dry-run-at-volume

Before any full-document production cut:

dry_run_at_volume_required: true
scope:
  - use restored production schema
  - run hundreds of IUs in isolated DB
  - verify +15*N row invariant or revised invariant if manifest strategy changes
  - measure performance with EXPLAIN / timing
  - verify checkpoint/resume/idempotency
  - verify no duplicate cuts
  - verify DOT lane separation at volume

4.8 Production rollout strategy

A big-bang production cut of the Constitution is rejected.

production_strategy:
  - dry_run_first
  - staged_production_small_batch
  - exact target range per batch
  - checkpoint/resume
  - forward_compensation_no_delete
  - no document-wide delete rollback
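
The batching and checkpoint/resume rules can be sketched as a small driver. All names are hypothetical. The in-memory `checkpoint` set stands in for whatever durable record the real pipeline would consult (for example the existing ledger rows), and `cut_one` is a placeholder for the per-IU CUT/VERIFY step.

```python
def plan_batches(targets: list, batch_size: int) -> list:
    """Split an ordered target list into exact, contiguous batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [targets[i:i + batch_size] for i in range(0, len(targets), batch_size)]


def run_staged(targets: list, batch_size: int, cut_one, checkpoint: set,
               max_batches=None) -> int:
    """Run pending batches in order and return how many batches executed.

    Targets already in `checkpoint` are skipped, so a rerun resumes where the
    previous run stopped and never cuts the same target twice.
    """
    executed = 0
    for batch in plan_batches(targets, batch_size):
        pending = [t for t in batch if t not in checkpoint]
        if not pending:
            continue
        if max_batches is not None and executed >= max_batches:
            break
        for target in pending:
            cut_one(target)         # may raise; nothing already written is deleted
            checkpoint.add(target)  # recorded only after the step succeeds
        executed += 1
    return executed
```

If `cut_one` raises partway through a batch, the exception propagates and the checkpoint keeps every target that completed. Recovery is then a forward step (resume or compensate), in line with the no-delete rule above.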

5. Is this a schema problem?

Answer for next session:

core_governance_schema: correct_for_single_IU_and_append_only_CUT_VERIFY
schema_not_wrong: true
schema_not_complete_for_full_information_factory: true
missing_layers:
  - source_document_registry
  - ingestion_profiles
  - canonicalization_grammar
  - label_metadata_registry
  - scale_indexes
  - volume_harness
  - source_versioning_and_provenance

So: v0.4 proved the core ledger. v0.5+ must build the production information-unit factory around it.


6. Next immediate instruction for Agent in next session

The first step should not be Cắt hiến pháp execution. It should be either:

  1. Continue approved sequencing with index-only DDL authoring, or
  2. If user wants a broader architecture reset first, author a Constitution Hardtest Master Plan that includes all missing layers.

Given the user's latest directive, GPT recommends starting the next session with a broader master plan before continuing with index DDL:

recommended_next_phase: v0_5_constitution_hardtest_and_information_unit_factory_master_plan
nature: design_only

Why: the user wants to ensure all missing layers are considered, not just indexes.

Suggested Agent prompt for next session

Agent, open a design-only phase:
v0.5 Constitution Hardtest and Information Unit Factory Master Plan.

User goal:
The user wants the system to become production-ready so a future request `Cắt hiến pháp` can run the whole process automatically. The Constitution source is:
https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution
This is a hardtest for the general information-unit system, which must scale to hundreds of thousands and eventually millions of information units.

Do not execute CUT/VERIFY.
Do not write production rows.
Do not perform schema migration or index DDL.
Do not create label registry.
Do not deploy/restart.
Do not touch vector/NoSQL.
Design only.

Read and ground from current KB state and production metadata only if read-only and necessary. Do not read secrets.

Create under:
knowledge/dev/laws/dieu44-trien-khai/v0.5-constitution-hardtest-design/

Required documents:
1. dot-iu-cutter-v0.5-constitution-hardtest-master-plan-2026-05-17.md
2. dot-iu-cutter-v0.5-source-document-ingestion-pipeline-design-2026-05-17.md
3. dot-iu-cutter-v0.5-canonicalization-and-address-grammar-design-2026-05-17.md
4. dot-iu-cutter-v0.5-information-unit-label-metadata-registry-master-design-2026-05-17.md
5. dot-iu-cutter-v0.5-scale-index-and-volume-execution-roadmap-2026-05-17.md
6. dot-iu-cutter-v0.5-sql-nosql-projection-and-rebuild-strategy-2026-05-17.md
7. dot-iu-cutter-v0.5-constitution-hardtest-risk-and-gate-plan-2026-05-17.md
8. dot-iu-cutter-v0.5-constitution-hardtest-design-report-2026-05-17.md

Required analysis:
- What exactly happens when user says `Cắt hiến pháp`.
- Source authority and source checksum model.
- Ingestion pipeline per file type, starting with the Constitution URL.
- Canonicalization grammar for Chương / Điều / Khoản / Điểm / đoạn / IU.
- How to avoid hardcoded labels, hardcoded metadata keys, hardcoded source paths.
- Label registry, metadata key registry, assignment model, hot-key promotion.
- How SQL and NoSQL/vector interact: SQL SSOT, vector projection/search only.
- What indexes are mandatory before volume.
- Dry-run-at-volume plan.
- Staged production rollout plan.
- How to merge/store with existing DIEU-28/DIEU-32/DIEU-35 corpus.
- Rollback / forward-compensation policy for multi-IU document.
- Open decisions for GPT.
- Exact downstream sequencing.

Also include a section called `Do not run yet` listing every forbidden action.

Git:
- no code change expected
- no commit expected
- report branch, HEAD, git status --short -- iu-cutter

Hardcode/scale/label:
- no runtime label/key hardcoding
- no fixed IP/DSN/password/container/vector collection
- SQL remains SSOT
- JSONB is not hidden authority
- vector/NoSQL is rebuildable projection only

7. Forbidden until separately authorized

forbidden:
  - Cắt hiến pháp execution
  - full_document_CUT_VERIFY
  - second_production_IU
  - bulk_cut
  - production_reclassification_batch
  - deploy_or_restart
  - schema_migration
  - index_DDL_execution
  - label_registry_schema_creation
  - vector_NoSQL_integration
  - alias_writes
  - code_change_without_explicit_code_phase

8. Key KB review docs from this session

Important review docs:

first_trial_execution_review: knowledge/dev/laws/dieu44-trien-khai/reviews/dot-iu-cutter-v0.4-first-controlled-production-cut-verify-execution-gpt-review-2026-05-17.md
first_trial_closeout_review: knowledge/dev/laws/dieu44-trien-khai/reviews/dot-iu-cutter-v0.4-first-production-trial-closeout-gpt-review-2026-05-17.md
v0_5_full_document_design_review: knowledge/dev/laws/dieu44-trien-khai/reviews/dot-iu-cutter-v0.5-full-document-trial-design-gpt-review-2026-05-17.md
v0_5_pre_scale_foundation_review: knowledge/dev/laws/dieu44-trien-khai/reviews/dot-iu-cutter-v0.5-pre-scale-foundation-design-gpt-review-2026-05-17.md

9. Final status for next chat

session_status: ready_to_move_to_new_chat
large_goal: production_ready_information_unit_system
near_goal: make_Cat_Hien_phap_request_operate_end_to_end_automatically
current_best_next_phase: v0_5_constitution_hardtest_and_information_unit_factory_master_plan
production_write_status: stop_after_first_successful_single_IU_trial
bulk_status: forbidden
hien_phap_status: design_only_next
knowledge/dev/laws/dieu44-trien-khai/handoffs/dot-iu-cutter-session-handoff-to-next-chat-2026-05-17.md