dot-iu-cutter Session Handoff to Next Chat — Constitution Hardtest and Production-Scale Information Unit Pipeline
Date: 2026-05-17
Prepared by: GPT
User goal: complete the information-unit system so that a user can eventually issue the request `Cắt hiến pháp` ("cut the Constitution") and have the pipeline run end-to-end automatically, safely, and at production scale.
0. Why this handoff exists
The current session is very long and contains many gated phases: schema design, production migrations, observability, credentials, code, real PostgreSQL dry-runs, first production single-IU CUT/VERIFY, closeout, and early v0.5 scale design.
The user has now clarified the true next goal:
large_goal: complete the information-unit system
near_goal: make the cutting pipeline robust enough that `Cắt hiến pháp` can run end-to-end automatically
constitution_source_url: https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution
production_scale_expectation: hundreds_of_thousands_to_millions_of_information_units
Important: Do not treat the Constitution as a small demo. The user explicitly wants it as a hardtest for the real production process.
1. Current production state — completed work
1.1 Schema / observability / credentials
v0_1_governance_tables: LIVE
v0_2_structural_schema: COMPLETE
v0_3_read_observability: LIVE
v0_4_credentials: LIVE
base_tables: 12
primary_keys: 12
in_schema_FKs: 19
observe_views: 12
read_role: cutter_ro
writer_roles:
- cutter_exec
- cutter_verify
The governance schema is live in PostgreSQL under cutter_governance.
1.2 Cutter agent code
Repo / SSOT:
code_SSOT: /opt/incomex/dot
branch: main
accepted_HEAD: e93424b5ff7fa5e4b8406131977ce4339cd0856a
VPS_is_code_SSOT: true
scoped_git_add_only: mandatory
never_git_add_A: true
commit_after_code_change: mandatory
Important accepted commits:
689e53e: initial in-memory cutter agent
56d3732: RealPostgresAdapter authored
84c52c5: LedgerWriter schema-binding to deployed schema
6060e1a: RealPostgresAdapter transaction lifecycle fix
db4aa58: UUID-safe CUT/VERIFY signing body fix
e93424b: DOT-pair signature cross-reference + lane-overlap fix
Current accepted HEAD is e93424b5ff7fa5e4b8406131977ce4339cd0856a.
1.3 PG-backed dry-run success
RERUN#4 validated the code against a restored production schema in an isolated PostgreSQL instance.
pg_backed_dry_run_RERUN4: PASS
happy_path: MARK -> SWEEP -> REVIEW -> CUT -> VERIFY
final_status: verified_complete
final_rows: 15
negative_idempotency_delta0: PASS
DOT_lane_overlap_prevention: PASS
production_touched: false
Validated row matrix for one IU:
decision_backlog_entry: 1
decision_backlog_history: 5
decision_backlog_dependency: 0
decision_backlog_sweep_log: 1
manifest_envelope: 1
manifest_unit_block: 1
review_decision: 1
dot_pair_signature: 2
cut_change_set: 1
cut_change_set_affected_row: 1
verify_result: 1
canonical_address_alias: 0
total: 15
1.4 First controlled production CUT/VERIFY
A single production IU trial succeeded and was closed out.
first_controlled_production_CUT_VERIFY_trial: CLOSED_PASS
TARGET_IU: 04e0c674-2a71-53b7-8d30-9c1a78d6fd17
canonical_address: D38-DIEU28-S3-P1
entry_id: 26a8c4e8-c07c-5ff4-8854-ab55ef4fcf81
change_set_id: 7c963f27-0bc1-4dd9-91ac-4d1f82f82d53
verify_result_id: 633f2c51-9a87-4bb4-a7f6-75342bf72ac7
row_delta: +15
verify_verdict: pass
rollback_triggered: false
forward_compensation_triggered: false
production_scope_clean: true
The production tac_logical_unit row was not mutated; only append-only governance ledger rows were written.
1.5 Production trial closeout
Closeout and post-execution backup verification passed.
post_trial_backup_verification: PASS
restore_test: PASS
restored_backup_contains_plus_15_rows: true
TARGET_IU_unchanged: true
DOT_991_DOT_992_lane_refs: PASS
canonical_address_alias_rows: 0
production_sysid_stable: true
protected_dry_run_envs_untouched: true
secret_leak_check: PASS
2. Current v0.5 design state
2.1 Full-document trial design
v0.5 full-document trial design was authored and reviewed.
v0_5_full_document_trial_design: PASS_WITH_BLOCKERS
full_document_cut_allowed_now: false
hien_phap_cut_allowed_now: false
second_IU_allowed_now: false
bulk_cut_allowed_now: false
Findings:
already_cut_documents:
  DIEU_28: 27_rows
  DIEU_32: 23_rows
  DIEU_35: 36_rows
storage_SSOT:
- public.tac_logical_unit
- cutter_governance
hien_phap_in_system: false
hien_phap_external_source_url: https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution
estimated_leaf_IUs_clause_point: 300_to_500
estimated_governance_rows_clause_point: 5000_to_7500
single_IU_delta_invariant: 15
2.2 Pre-scale foundation design
v0.5 pre-scale index + label/metadata foundation design was reviewed and passed.
v0_5_pre_scale_foundation_design: PASS
index_DDL_execution_allowed_now: false
label_registry_creation_allowed_now: false
tier_normalization_write_allowed_now: false
dry_run_at_volume_allowed_now: false
bulk_production_allowed_now: false
Accepted sequencing:
sequence:
  1: pre_scale_index_only_DDL_authoring
  2: index_DDL_dry_run_and_production_execution_if_PASS
  3: dry_run_at_volume_for_existing_3_documents_or_synthetic_document
  4: tier_normalization_readiness_and_write_cycle_if_needed
  5: label_metadata_registry_design_cycle
  6: authoritative_Hien_phap_source_ingestion_design
  7: Hien_phap_dry_run_then_staged_production_small_batch
3. User’s new directive for next session
The user clarified:
user_directive:
- Use the Constitution file at https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution as a hardtest.
- The pipeline must be production-real, not a toy demo.
- The user wants the system eventually to work from a simple request: `Cắt hiến pháp`.
- This requires completing the entire cutting process and necessary layers: ingestion, canonicalization, labeling/metadata, scale indexes, source authority, storage/merge, and automation.
- The system must be designed for hundreds of thousands and eventually millions of information units.
Important interpretation:
constitution_is_not_the_goal_itself: true
constitution_is_hardtest_for_general_information_unit_pipeline: true
large_goal: production_ready_information_unit_factory
4. What is missing before Cắt hiến pháp can be automated
4.1 Scale indexes
The current runtime logic is correct, but without indexes each per-IU lookup can fall back to a sequential scan, so a full-document run can degrade to O(n²).
Required next phase already approved:
next_phase: v0_5_pre_scale_index_only_DDL_authoring
Hot paths needing indexes:
hot_paths:
- decision_backlog_entry(status, emitted_at, entry_id) for SWEEP cursor
- manifest_envelope(source_doc_ref) for lineage
- review_decision(manifest_id) for lineage/review lookup
- cut_change_set(decision_backlog_entry_id) for cut-once guard
- verify_result(change_set_id) for verify lookup
- dot_pair_signature(cross_reference_change_set_id)
- dot_pair_signature(cross_reference_verify_result_id)
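As a starting point for the authoring phase, the hot-path list above could be kept as data and rendered into candidate DDL text. The sketch below only builds strings and executes nothing. The table and column names come from the list above and the schema name from section 1.1; the index naming convention, the `HotPath` type, and the choice of `CREATE INDEX CONCURRENTLY IF NOT EXISTS` are assumptions for illustration, not decisions made in this session.

```python
from dataclasses import dataclass

SCHEMA = "cutter_governance"  # schema name taken from this handoff


@dataclass(frozen=True)
class HotPath:
    table: str
    columns: tuple[str, ...]
    purpose: str


# Table and column names are copied from the hot-path list in this section.
HOT_PATHS = [
    HotPath("decision_backlog_entry", ("status", "emitted_at", "entry_id"), "SWEEP cursor"),
    HotPath("manifest_envelope", ("source_doc_ref",), "lineage"),
    HotPath("review_decision", ("manifest_id",), "lineage/review lookup"),
    HotPath("cut_change_set", ("decision_backlog_entry_id",), "cut-once guard"),
    HotPath("verify_result", ("change_set_id",), "verify lookup"),
    HotPath("dot_pair_signature", ("cross_reference_change_set_id",), "DOT cross-reference"),
    HotPath("dot_pair_signature", ("cross_reference_verify_result_id",), "DOT cross-reference"),
]


def index_name(path: HotPath) -> str:
    # Hypothetical naming convention: ix_<table>__<col>_<col>...
    return f"ix_{path.table}__{'_'.join(path.columns)}"


def render_ddl(path: HotPath) -> str:
    # Assumption: a non-blocking, re-runnable index build is wanted.
    cols = ", ".join(path.columns)
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name(path)} "
        f"ON {SCHEMA}.{path.table} ({cols});"
    )


for hot_path in HOT_PATHS:
    print(f"-- {hot_path.purpose}")
    print(render_ddl(hot_path))
```

Keeping the list as data means a later dry-run can iterate over the same entries to run EXPLAIN checks, rather than maintaining a second hand-written list.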
4.2 Source document registry / source authority
The current schema can govern CUT/VERIFY rows, but it does not yet fully manage source-document ingestion at scale.
Needed design:
source_document_registry_needed: true
needs:
- source_url
- authoritative_version
- checksum
- retrieval_timestamp
- source_format
- parser_profile
- source_span_mapping
- provenance_trace_to_each_IU
For Constitution:
constitution_source_url: https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution
must_verify:
- accessibility
- exact content format
- checksum
- whether source is authoritative enough
- whether version is 2013 Constitution or amended variant
- whether Vietnamese legal structure is preserved
4.3 Ingestion pipeline
Needed stages:
ingestion_pipeline:
- fetch_source
- identify_format
- compute_checksum
- normalize_encoding
- strip_non_content_noise
- preserve headings / numbering / hierarchy
- produce source_span anchors
- create deterministic document_version_id
- output parser-ready canonical source representation
The pipeline must be config-driven, with no hardcoded file paths or labels.
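A minimal sketch of three of these stages (compute_checksum, normalize_encoding, and deterministic document_version_id) is given below. Fetching is deliberately left out, so the function takes the raw bytes as an argument. The use of SHA-256, Unicode NFC normalization, and UUIDv5 are assumptions for illustration, as are all names (`IngestedSource`, `normalize_text`, `ingest`, `DOC_NAMESPACE`); none of them has been decided in this session.

```python
import hashlib
import unicodedata
import uuid
from dataclasses import dataclass

# Hypothetical namespace; a real deployment would fix its own value in config.
DOC_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "https://example.invalid/dot-iu-cutter/document-version"
)


@dataclass(frozen=True)
class IngestedSource:
    source_url: str
    checksum_sha256: str           # checksum of the raw bytes exactly as fetched
    document_version_id: uuid.UUID
    canonical_text: str


def normalize_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode, normalize to NFC, unify line endings, strip trailing spaces."""
    text = raw.decode(encoding)
    # NFC gives one code-point form for Vietnamese diacritics.
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).strip() + "\n"


def ingest(source_url: str, raw: bytes) -> IngestedSource:
    """Derive checksum and version id from the source URL and raw content."""
    checksum = hashlib.sha256(raw).hexdigest()
    # Same URL and same bytes always give the same id; any change gives a new one.
    version_id = uuid.uuid5(DOC_NAMESPACE, f"{source_url}#{checksum}")
    return IngestedSource(source_url, checksum, version_id, normalize_text(raw))
```

Because the version id is derived from the checksum of the raw bytes, re-ingesting an unchanged source is idempotent, while an amended source text produces a distinct document version.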
4.4 Canonicalization pipeline
Needed stages:
canonicalization_pipeline:
- document grammar detection
- hierarchy extraction: Chương (chapter) / Điều (article) / Khoản (clause) / Điểm (point) / đoạn (paragraph)
- deterministic canonical_address generation
- stable IU id / entry id derivation
- canonical text normalization
- source span linkage
- validation against grammar
- review queue for ambiguous segments
The Constitution requires a multi-level grammar beyond the current DIEU-28/32/35 address style.
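To make the hierarchy-extraction and address-generation stages concrete, here is a rough sketch. Everything in it is an assumption for illustration: the heading patterns, the level codes (CH, D, K, P), the address format, and the UUIDv5 derivation. In particular, the address format is not the existing production style, whose grammar is not specified in this handoff. Lines that match no pattern go to a review list, standing in for the review queue for ambiguous segments.

```python
import re
import uuid
from dataclasses import dataclass

# Hypothetical heading patterns; a real grammar would be loaded from config.
PATTERNS = [
    ("CH", re.compile(r"^Chương\s+([IVXLC]+)\b")),  # chapter, roman numeral
    ("D", re.compile(r"^Điều\s+(\d+)\b")),          # article
    ("K", re.compile(r"^(\d+)\.\s")),               # clause: "1. ..."
    ("P", re.compile(r"^([a-zđ])\)\s")),            # point: "a) ..."
]
LEVELS = [level for level, _ in PATTERNS]
# Hypothetical namespace for stable IU ids.
IU_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/dot-iu-cutter/iu")


@dataclass(frozen=True)
class Unit:
    address: str
    iu_id: uuid.UUID
    line_no: int   # source span anchor (line number only, in this sketch)
    text: str


def parse(doc_prefix: str, text: str):
    """Return (units, review) where review holds lines that match no pattern."""
    path: dict[str, str] = {}
    units: list[Unit] = []
    review: list[tuple[int, str]] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        for depth, (level, pattern) in enumerate(PATTERNS):
            match = pattern.match(line)
            if match:
                # Entering a node invalidates everything deeper than it.
                for deeper in LEVELS[depth + 1:]:
                    path.pop(deeper, None)
                path[level] = match.group(1)
                parts = [f"{lv}{path[lv]}" for lv in LEVELS if lv in path]
                address = "-".join([doc_prefix] + parts)
                units.append(Unit(address, uuid.uuid5(IU_NAMESPACE, address), line_no, line))
                break
        else:
            review.append((line_no, line))
    return units, review
```

Deriving the id from the address alone keeps it stable across re-runs, but it also means that an address-grammar change would re-key every unit. That trade-off is one of the open design questions rather than something this sketch settles.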
4.5 Label / metadata registry
The user emphasized that the label set must be easy to extend at scale.
Needed:
label_metadata_registry:
- label_dictionary
- label_assignment_append_only
- metadata_key_registry
- metadata_value_type
- cardinality_policy
- mutability_policy
- index_policy
- hot_key_promotion_policy
- no_runtime_hardcoded_labels
Principles:
principles:
- no_new_label_columns_by_default
- JSONB_allowed_for_sparse_evolving_metadata
- JSONB_not_hidden_authority
- frequently_queried_keys_must_be_promoted_to_indexed_SQL_or_registry_assignment
- SQL_remains_SSOT
- vector_NoSQL_projection_only
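The interplay between a key registry, append-only assignment, and hot-key promotion can be sketched in memory as follows. This is only an illustration of the principles above, not a schema proposal: the class and method names, the "last assignment wins" read rule, and the query-count threshold are all assumptions. In the real design the registry and assignments would live in SQL, which remains the SSOT.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataKey:
    name: str
    value_type: type


@dataclass
class LabelRegistry:
    keys: dict = field(default_factory=dict)
    assignments: list = field(default_factory=list)   # append-only log
    query_counts: Counter = field(default_factory=Counter)

    def register(self, key: MetadataKey) -> None:
        if key.name in self.keys:
            raise ValueError(f"metadata key already registered: {key.name}")
        self.keys[key.name] = key

    def assign(self, iu_id: str, key_name: str, value) -> None:
        # Keys must be registered first, so nothing is hardcoded at runtime.
        key = self.keys.get(key_name)
        if key is None:
            raise KeyError(f"unregistered metadata key: {key_name}")
        if not isinstance(value, key.value_type):
            raise TypeError(f"{key_name} expects {key.value_type.__name__}")
        self.assignments.append((iu_id, key_name, value))  # never updated in place

    def current_value(self, iu_id: str, key_name: str):
        # Assumed read rule: the most recent assignment wins.
        for a_iu, a_key, a_value in reversed(self.assignments):
            if a_iu == iu_id and a_key == key_name:
                return a_value
        return None

    def record_query(self, key_name: str) -> None:
        self.query_counts[key_name] += 1

    def promotion_candidates(self, threshold: int) -> list:
        # Keys queried at least `threshold` times are candidates for an indexed column.
        return sorted(k for k, n in self.query_counts.items() if n >= threshold)
```

Adding a new label here is a `register` call, not a schema change, which is the property the no_new_label_columns_by_default principle is after.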
4.6 Tier normalization / existing corpus quality
The existing corpus has a data-quality issue:
DIEU_32_DIEU_35_blank_tier: true
normalization_required_before_using_existing_corpus_as_large_fixture: true
This is a separate read-review-write cycle.
4.7 Dry-run-at-volume
Before any full-document production cut:
dry_run_at_volume_required: true
scope:
- use restored production schema
- run hundreds of IUs in isolated DB
- verify +15*N row invariant or revised invariant if manifest strategy changes
- measure performance with EXPLAIN / timing
- verify checkpoint/resume/idempotency
- verify no duplicate cuts
- verify DOT lane separation at volume
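The +15*N check can be expressed per table, so that a mismatch points at the table that drifted and not merely at a wrong total. The sketch below uses the per-IU row matrix from section 1.3 and assumes it holds unchanged at volume; as noted above, that may not be true if the manifest strategy changes. The function names and the shape of the before/after count dictionaries are hypothetical.

```python
# Per-IU row matrix as validated in RERUN#4 (section 1.3 of this handoff).
ROWS_PER_IU = {
    "decision_backlog_entry": 1,
    "decision_backlog_history": 5,
    "decision_backlog_dependency": 0,
    "decision_backlog_sweep_log": 1,
    "manifest_envelope": 1,
    "manifest_unit_block": 1,
    "review_decision": 1,
    "dot_pair_signature": 2,
    "cut_change_set": 1,
    "cut_change_set_affected_row": 1,
    "verify_result": 1,
    "canonical_address_alias": 0,
}


def expected_delta(n_ius: int) -> dict:
    """Expected row growth per table after cutting n_ius units."""
    return {table: per_iu * n_ius for table, per_iu in ROWS_PER_IU.items()}


def check_delta(before: dict, after: dict, n_ius: int) -> list:
    """Return (table, expected, actual) for every table that violates the invariant."""
    mismatches = []
    for table, expected in expected_delta(n_ius).items():
        actual = after.get(table, 0) - before.get(table, 0)
        if actual != expected:
            mismatches.append((table, expected, actual))
    return mismatches
```

An empty list from `check_delta` means the run matched the invariant; running the same batch a second time should then yield a delta of zero in every table, which is the idempotency check.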
4.8 Production rollout strategy
A big-bang production cut of the Constitution is rejected.
production_strategy:
- dry_run_first
- staged_production_small_batch
- exact target range per batch
- checkpoint/resume
- forward_compensation_no_delete
- no document-wide delete rollback
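A staged rollout with exact target ranges and checkpoint/resume might look like the sketch below. The names (`Batch`, `plan_batches`, `run`) and the callback shape are hypothetical; in particular, `execute` stands in for whatever the real per-batch CUT/VERIFY driver turns out to be, and the checkpoint is just a set of completed batch indexes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Batch:
    index: int
    targets: tuple[str, ...]   # exact IU targets for this batch, fixed at planning time


def plan_batches(targets: list[str], batch_size: int) -> list[Batch]:
    """Split an ordered target list into fixed, non-overlapping batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        Batch(start // batch_size, tuple(targets[start:start + batch_size]))
        for start in range(0, len(targets), batch_size)
    ]


def run(batches: list[Batch], completed: set[int],
        execute: Callable[[Batch], bool]) -> set[int]:
    """Run pending batches in order and return the updated checkpoint."""
    done = set(completed)
    for batch in batches:
        if batch.index in done:
            continue  # resume: never re-run a batch that already passed
        if not execute(batch):
            # Stop at the first failure. Nothing is deleted; earlier batches
            # stay in place and recovery is by forward compensation.
            break
        done.add(batch.index)
    return done
```

On resume, the caller passes the previous return value back in as `completed`, so a run interrupted after batch k continues from batch k+1 with the same planned target ranges.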
5. Is this a schema problem?
Answer for next session:
core_governance_schema: correct_for_single_IU_and_append_only_CUT_VERIFY
schema_not_wrong: true
schema_not_complete_for_full_information_factory: true
missing_layers:
- source_document_registry
- ingestion_profiles
- canonicalization_grammar
- label_metadata_registry
- scale_indexes
- volume_harness
- source_versioning_and_provenance
So: v0.4 proved the core ledger. v0.5+ must build the production information-unit factory around it.
6. Next immediate instruction for Agent in next session
The first step should not be Cắt hiến pháp execution. It should be either:
- Continue the approved sequencing with index-only DDL authoring, or
- If the user wants a broader architecture reset first, author a Constitution Hardtest Master Plan that includes all missing layers.
Given the user's latest directive, GPT recommends starting the next session with a broader master plan before continuing index DDL:
recommended_next_phase: v0_5_constitution_hardtest_and_information_unit_factory_master_plan
nature: design_only
Why: the user wants to ensure all missing layers are considered, not just indexes.
Suggested Agent prompt for next session
Agent, open a design-only phase:
v0.5 Constitution Hardtest and Information Unit Factory Master Plan.
User goal:
The user wants the system to become production-ready so a future request `Cắt hiến pháp` can run the whole process automatically. The Constitution source is:
https://vps.incomexsaigoncorp.vn/knowledge/dev/laws/constitution
This is a hardtest for the general information-unit system, which must scale to hundreds of thousands and eventually millions of information units.
Do not execute CUT/VERIFY.
Do not write production rows.
Do not perform schema migration or index DDL.
Do not create the label registry.
Do not deploy/restart.
Do not touch vector/NoSQL.
Design only.
Ground the design in the current KB state and in production metadata, reading them only where access is read-only and necessary. Do not read secrets.
Create under:
knowledge/dev/laws/dieu44-trien-khai/v0.5-constitution-hardtest-design/
Required documents:
1. dot-iu-cutter-v0.5-constitution-hardtest-master-plan-2026-05-17.md
2. dot-iu-cutter-v0.5-source-document-ingestion-pipeline-design-2026-05-17.md
3. dot-iu-cutter-v0.5-canonicalization-and-address-grammar-design-2026-05-17.md
4. dot-iu-cutter-v0.5-information-unit-label-metadata-registry-master-design-2026-05-17.md
5. dot-iu-cutter-v0.5-scale-index-and-volume-execution-roadmap-2026-05-17.md
6. dot-iu-cutter-v0.5-sql-nosql-projection-and-rebuild-strategy-2026-05-17.md
7. dot-iu-cutter-v0.5-constitution-hardtest-risk-and-gate-plan-2026-05-17.md
8. dot-iu-cutter-v0.5-constitution-hardtest-design-report-2026-05-17.md
Required analysis:
- What exactly happens when the user says `Cắt hiến pháp`.
- Source authority and source checksum model.
- Ingestion pipeline per file type, starting with the Constitution URL.
- Canonicalization grammar for Chương / Điều / Khoản / Điểm / đoạn / IU.
- How to avoid hardcoded labels, hardcoded metadata keys, hardcoded source paths.
- Label registry, metadata key registry, assignment model, hot-key promotion.
- How SQL and NoSQL/vector interact: SQL SSOT, vector projection/search only.
- What indexes are mandatory before volume.
- Dry-run-at-volume plan.
- Staged production rollout plan.
- How to merge/store with existing DIEU-28/DIEU-32/DIEU-35 corpus.
- Rollback / forward-compensation policy for a multi-IU document.
- Open decisions for GPT.
- Exact downstream sequencing.
Also include a section called `Do not run yet` listing every forbidden action.
Git:
- no code change expected
- no commit expected
- report branch, HEAD, git status --short -- iu-cutter
Hardcode/scale/label:
- no runtime label/key hardcoding
- no fixed IP/DSN/password/container/vector collection
- SQL remains SSOT
- JSONB is not hidden authority
- vector/NoSQL is rebuildable projection only
7. Forbidden until separately authorized
forbidden:
- Cắt hiến pháp execution
- full_document_CUT_VERIFY
- second_production_IU
- bulk_cut
- production_reclassification_batch
- deploy_or_restart
- schema_migration
- index_DDL_execution
- label_registry_schema_creation
- vector_NoSQL_integration
- alias_writes
- code_change_without_explicit_code_phase
8. Key KB review docs from this session
Important review docs:
first_trial_execution_review: knowledge/dev/laws/dieu44-trien-khai/reviews/dot-iu-cutter-v0.4-first-controlled-production-cut-verify-execution-gpt-review-2026-05-17.md
first_trial_closeout_review: knowledge/dev/laws/dieu44-trien-khai/reviews/dot-iu-cutter-v0.4-first-production-trial-closeout-gpt-review-2026-05-17.md
v0_5_full_document_design_review: knowledge/dev/laws/dieu44-trien-khai/reviews/dot-iu-cutter-v0.5-full-document-trial-design-gpt-review-2026-05-17.md
v0_5_pre_scale_foundation_review: knowledge/dev/laws/dieu44-trien-khai/reviews/dot-iu-cutter-v0.5-pre-scale-foundation-design-gpt-review-2026-05-17.md
9. Final status for next chat
session_status: ready_to_move_to_new_chat
large_goal: production_ready_information_unit_system
near_goal: make_Cat_Hien_phap_request_operate_end_to_end_automatically
current_best_next_phase: v0_5_constitution_hardtest_and_information_unit_factory_master_plan
production_write_status: stop_after_first_successful_single_IU_trial
bulk_status: forbidden
hien_phap_status: design_only_next