KB-66EF

dot-iu-cutter v0.4 Scale Automation Non-hardcode GPT Mandate

Revision 1

Tags: dot-iu-cutter, v0.4, scale, automation, non-hardcode, sql-nosql-hybrid, information-unit-centric, ledgerwriter-schema-binding

dot-iu-cutter v0.4 — Scale / Automation / Non-hardcode GPT Mandate

Date: 2026-05-17
Authority: GPT acting on User instruction

User strategic requirement

The system must be designed for hundreds of thousands of information units in the near term and for millions in the future. The information unit (also called an information piece) is the central managed entity. The persistence model must combine a SQL database with NoSQL/vector-style stores, normalizing as much of the NoSQL data into SQL as is practical without sacrificing flexibility.

The system must maximize automation, avoid hardcoding in any form, and ensure the architecture does not break at scale.

Binding principle

information_unit_centric: true
scale_target:
  near_term: hundreds_of_thousands_of_information_units
  future: millions_of_information_units
non_hardcode: absolute_policy
max_automation: required
sql_first_normalization: required_where_practical
nosql_flexibility: allowed_only_for_payloads_that_cannot_yet_be_normalized
production_schema_is_contract: true
code_must_bind_to_schema_not_abstract_memory_shape: true

Immediate effect on current blocker

The current PG-backed dry-run remains blocked. The next phase must be the LedgerWriter schema-binding design, and that design must now cover scale and non-hardcode review in addition to column mapping.

Required next phase

Open:

phase: v0_4_LedgerWriter_schema_binding_and_scale_automation_design
nature: design_only

Required documents under:

knowledge/dev/laws/dieu44-trien-khai/v0.4-schema-binding/

Create or include at least:

required_documents:
  - dot-iu-cutter-v0.4-ledgerwriter-schema-gap-analysis-2026-05-17.md
  - dot-iu-cutter-v0.4-ledgerwriter-per-writer-mapping-design-2026-05-17.md
  - dot-iu-cutter-v0.4-state-history-and-sweep-mapping-design-2026-05-17.md
  - dot-iu-cutter-v0.4-mark-review-cut-verify-schema-binding-plan-2026-05-17.md
  - dot-iu-cutter-v0.4-pg-backed-test-revision-plan-2026-05-17.md
  - dot-iu-cutter-v0.4-scale-automation-nonhardcode-review-2026-05-17.md
  - dot-iu-cutter-v0.4-sql-nosql-hybrid-information-unit-strategy-2026-05-17.md
  - dot-iu-cutter-v0.4-schema-binding-risk-and-code-change-plan-2026-05-17.md
  - dot-iu-cutter-v0.4-ledgerwriter-schema-binding-report-2026-05-17.md

Design must answer

1. Scale readiness

For every proposed schema binding and runtime write path, evaluate:

scale_questions:
  - will_this_survive_100k_information_units
  - will_this_survive_1m_information_units
  - which_queries_need_indexes_before_scale
  - which_tables_are_append_only_and_may_grow_fastest
  - what_is_the_cardinality_of_each_table_relative_to_information_unit_count
  - what_write_amplification_does_each_phase_create
  - what_retry_or_idempotency_pattern_prevents_duplicate_rows
  - what_pagination_or_cursor_model_is_required_for_sweeps
  - what_compaction_or_archival_boundary_is_needed_later
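
As one illustration of the retry and idempotency question above, the following is a minimal sketch of an idempotent ledger insert. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs anywhere; the table `phase_event`, its columns, and the key format are hypothetical and are not taken from the production schema.

```python
import sqlite3


def record_phase_event(conn, unit_id, phase, idempotency_key):
    """Insert one ledger row; a retry with the same key writes nothing."""
    cur = conn.execute(
        "INSERT INTO phase_event (unit_id, phase, idempotency_key) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (idempotency_key) DO NOTHING",
        (unit_id, phase, idempotency_key),
    )
    # rowcount is 1 for a fresh insert and 0 when the key already existed.
    return cur.rowcount == 1


# Hypothetical table: the UNIQUE constraint is what prevents duplicate rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phase_event ("
    " id INTEGER PRIMARY KEY,"
    " unit_id TEXT NOT NULL,"
    " phase TEXT NOT NULL,"
    " idempotency_key TEXT NOT NULL UNIQUE)"
)

first = record_phase_event(conn, "iu-1", "mark", "iu-1:mark:run-1")
retry = record_phase_event(conn, "iu-1", "mark", "iu-1:mark:run-1")
row_count = conn.execute("SELECT COUNT(*) FROM phase_event").fetchone()[0]
```

The point of the sketch is that duplicate prevention lives in a database constraint rather than in application memory, so it would still hold if several workers retried the same phase concurrently.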

2. Non-hardcode policy

The design must explicitly reject:

forbidden_hardcode:
  - fixed_host_ip
  - fixed_container_id
  - hardcoded_database_password_or_DSN
  - hardcoded_table_column_list_inside_runtime_without_schema_contract_tests
  - hardcoded_document_id_or_law_id_in_runtime
  - hardcoded_count_assumptions_not_derived_from_scenario_contract
  - hardcoded_enum_values_without_council_ratified_vocabulary_or_config
  - hardcoded_single_process_assumption_for_concurrency
  - hardcoded_batch_size_without_config
  - hardcoded_vector_collection_name_without_config_or_registry
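
To make the forbidden list concrete, here is a minimal sketch of runtime settings that are injected rather than embedded in code. The key names (`IU_CUTTER_DSN` and so on) and the `RuntimeConfig` type are hypothetical; the DSN below is a placeholder fixture, not a real endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeConfig:
    dsn: str
    sweep_batch_size: int
    vector_collection: str


# Hypothetical config key names; only the names live in code, never the values.
REQUIRED_KEYS = (
    "IU_CUTTER_DSN",
    "IU_CUTTER_SWEEP_BATCH_SIZE",
    "IU_CUTTER_VECTOR_COLLECTION",
)


def load_config(env):
    """Build the config from a mapping such as os.environ; fail fast on gaps."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    batch_size = int(env["IU_CUTTER_SWEEP_BATCH_SIZE"])
    if batch_size <= 0:
        raise ValueError("sweep batch size must be positive")
    return RuntimeConfig(
        dsn=env["IU_CUTTER_DSN"],
        sweep_batch_size=batch_size,
        vector_collection=env["IU_CUTTER_VECTOR_COLLECTION"],
    )


# Test-fixture values only; a real run would pass os.environ instead.
example_env = {
    "IU_CUTTER_DSN": "postgresql://placeholder.invalid/iu",
    "IU_CUTTER_SWEEP_BATCH_SIZE": "500",
    "IU_CUTTER_VECTOR_COLLECTION": "iu_embeddings_fixture",
}
config = load_config(example_env)
```

Failing at startup when a key is absent is a deliberate choice in this sketch: a silent default would be another form of hardcoding.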

Allowed constants must be classified as one of:

allowed_constant_classes:
  - ratified_protocol_constant
  - config_key_name
  - test_fixture_only
  - schema_contract_test_expected_value
  - migration_artifact_hash
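
The `schema_contract_test_expected_value` class can be illustrated with a small schema contract test: the expected column set lives in the test, and the check reports drift against the live schema. This is a sketch only. It uses sqlite3 and `PRAGMA table_info` as a stand-in; against PostgreSQL the same idea would presumably query `information_schema.columns`. The table `review_decision` and its columns are hypothetical.

```python
import sqlite3

# schema_contract_test_expected_value: allowed because it lives in a test.
EXPECTED_COLUMNS = frozenset({"decision_id", "unit_id", "verdict"})


def table_columns(conn, table):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name comes from the test itself, never from user input.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def schema_drift(conn, table, expected):
    """Return columns the contract expects but lacks, and extras it did not expect."""
    actual = table_columns(conn, table)
    return {"missing": set(expected) - actual, "unexpected": actual - set(expected)}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE review_decision ("
    " decision_id INTEGER PRIMARY KEY,"
    " unit_id TEXT NOT NULL,"
    " verdict TEXT NOT NULL)"
)
clean = schema_drift(conn, "review_decision", EXPECTED_COLUMNS)

# Simulate a migration that the contract test has not been updated for.
conn.execute("ALTER TABLE review_decision ADD COLUMN reviewer TEXT")
drifted = schema_drift(conn, "review_decision", EXPECTED_COLUMNS)
```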

3. SQL / NoSQL hybrid strategy

The design must classify each data element into:

storage_classes:
  SQL_core_identity: normalized_and_indexed
  SQL_relationship: normalized_edge_or_fk_or_soft_ref
  SQL_governance_event: append_only_ledger
  SQL_query_projection: materialized_or_view_projection_future
  JSONB_payload: allowed_only_for_evolving_or_sparse_fields
  vector_store_payload: embedding_semantic_search_only_not_source_of_truth
  object_blob_payload: raw_artifact_only_with_SQL_pointer

The SQL DB must remain the source of truth for identity, lifecycle, governance, review, cut, verify, idempotency, and audit. NoSQL/vector stores may accelerate retrieval or hold flexible payloads, but must not become the hidden authority for canonical identity or state.
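
The source-of-truth rule can be sketched as follows: identity and lifecycle state are read only from SQL, while the vector index is a derived structure that can be dropped and rebuilt at any time. Everything here is an assumption for illustration: sqlite3 stands in for PostgreSQL, a TEXT column stands in for JSONB, a dict stands in for the vector store, and `fake_embed` is a placeholder rather than a real embedding model.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE information_unit ("
    " unit_id TEXT PRIMARY KEY,"                 # SQL_core_identity
    " canonical_address TEXT NOT NULL UNIQUE,"   # SQL_core_identity
    " lifecycle_state TEXT NOT NULL,"            # authoritative state
    " payload TEXT NOT NULL DEFAULT '{}')"       # JSONB_payload stand-in
)
conn.execute(
    "INSERT INTO information_unit VALUES (?, ?, ?, ?)",
    ("iu-1", "doc/a#1", "draft", json.dumps({"lang": "en"})),
)


def fake_embed(text):
    # Placeholder embedding so the sketch runs without a model.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def rebuild_vector_index(conn):
    """Derive the whole vector index from SQL rows, keyed by SQL identity."""
    query = "SELECT unit_id, canonical_address FROM information_unit ORDER BY unit_id"
    return {unit_id: fake_embed(address) for unit_id, address in conn.execute(query)}


def lifecycle_state(conn, unit_id):
    """State is always answered by SQL, never by the vector store."""
    row = conn.execute(
        "SELECT lifecycle_state FROM information_unit WHERE unit_id = ?",
        (unit_id,),
    ).fetchone()
    return row[0] if row else None


vector_index = rebuild_vector_index(conn)
state = lifecycle_state(conn, "iu-1")
```

Because the index holds nothing that cannot be recomputed from SQL, losing it costs retrieval speed but not canonical identity or state.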

4. Information-unit-centric design

Every write path must explain how it relates to information units:

information_unit_questions:
  - source_information_unit_or_logical_unit
  - proposed_new_unit_identity
  - canonical_address_relationship
  - manifest_unit_block_mapping
  - cut_change_set_affected_row_mapping
  - verify_result_mapping
  - alias_and_supersession_deferred_or_active

5. Automation readiness

The design must support later automation:

automation_requirements:
  - job_queue_or_signal_contract_deferred_but_explicit
  - bounded_retry
  - resumable_phase_state
  - idempotency_key_per_phase
  - concurrency_guard_per_unit_or_manifest
  - deterministic_sweep_cursor
  - no_manual_file_copy_in_runtime
  - no_manual_sql_in_runtime
  - structured_redacted_logs
  - metrics_hooks_future
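
Two of the requirements above, `deterministic_sweep_cursor` and `bounded_retry`, can be sketched together. The sweep pages through units by key order (keyset pagination) instead of by offset, and each unit gets a bounded number of attempts. sqlite3 stands in for PostgreSQL, and the table, function names, and `RuntimeError` as the transient error type are all hypothetical. Batch size and attempt limit are passed in by the caller, on the assumption that they come from config.

```python
import sqlite3


def sweep_page(conn, after_id, batch_size):
    """Return the next page of unit ids strictly after the cursor."""
    rows = conn.execute(
        "SELECT unit_id FROM information_unit "
        "WHERE unit_id > ? ORDER BY unit_id LIMIT ?",
        (after_id, batch_size),
    ).fetchall()
    return [row[0] for row in rows]


def sweep_all(conn, batch_size, max_attempts, handle):
    """Visit every unit once, in key order, retrying each a bounded number of times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    cursor, processed = "", []
    while True:
        page = sweep_page(conn, cursor, batch_size)
        if not page:
            return processed
        for unit_id in page:
            for attempt in range(1, max_attempts + 1):
                try:
                    handle(unit_id)
                    break
                except RuntimeError:
                    if attempt == max_attempts:
                        raise  # retry budget exhausted; surface the failure
            processed.append(unit_id)
        cursor = page[-1]  # last key seen; could be persisted to resume later


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE information_unit (unit_id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO information_unit VALUES (?)",
    [(f"iu-{n:03d}",) for n in range(1, 6)],
)

calls = []


def flaky_handle(unit_id):
    # Fails once for iu-003 to exercise the retry path.
    calls.append(unit_id)
    if unit_id == "iu-003" and calls.count("iu-003") == 1:
        raise RuntimeError("transient failure")


processed = sweep_all(conn, batch_size=2, max_attempts=3, handle=flaky_handle)
```

Since `handle` may run more than once for the same unit, it would need to be idempotent, which is where a per-phase idempotency key fits in.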

6. Runtime code binding

For each LedgerWriter method, the design must document:

required_mapping_fields:
  - writer_method
  - target_table
  - target_columns
  - required_not_null_columns
  - source_of_each_value
  - config_or_protocol_constant_if_any
  - insert_update_behavior
  - principal
  - transaction_phase
  - idempotency_or_unique_constraint
  - expected_row_growth_per_information_unit
  - index_dependency
  - scale_risk
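
One way to keep these mapping records reviewable is to hold them as structured data and check them mechanically. The sketch below assumes a hypothetical `WriterMapping` record whose fields mirror the list above; the example method name, table, and values are invented for illustration and do not describe the real LedgerWriter.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WriterMapping:
    writer_method: str
    target_table: str
    target_columns: tuple
    required_not_null_columns: tuple
    source_of_each_value: dict
    config_or_protocol_constant_if_any: str
    insert_update_behavior: str
    principal: str
    transaction_phase: str
    idempotency_or_unique_constraint: str
    expected_row_growth_per_information_unit: str
    index_dependency: str
    scale_risk: str


# "if any" in the field name means this one may legitimately be empty.
OPTIONAL_FIELDS = {"config_or_protocol_constant_if_any"}


def mapping_gaps(mapping):
    """List empty required fields and columns with no declared value source."""
    gaps = [
        f"empty:{f.name}"
        for f in fields(mapping)
        if f.name not in OPTIONAL_FIELDS and not getattr(mapping, f.name)
    ]
    for column in mapping.target_columns:
        if column not in mapping.source_of_each_value:
            gaps.append(f"no_source:{column}")
    for column in mapping.required_not_null_columns:
        if column not in mapping.target_columns:
            gaps.append(f"not_null_unmapped:{column}")
    return gaps


# Hypothetical example record with one deliberate gap (no source for "phase").
example = WriterMapping(
    writer_method="write_mark_event",
    target_table="phase_event",
    target_columns=("unit_id", "phase", "idempotency_key"),
    required_not_null_columns=("unit_id", "phase", "idempotency_key"),
    source_of_each_value={"unit_id": "manifest", "idempotency_key": "derived"},
    config_or_protocol_constant_if_any="",
    insert_update_behavior="insert_only",
    principal="writer_role",
    transaction_phase="mark",
    idempotency_or_unique_constraint="unique(idempotency_key)",
    expected_row_growth_per_information_unit="1 per phase",
    index_dependency="unique index on idempotency_key",
    scale_risk="append-only growth",
)
gaps = mapping_gaps(example)
```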

Prohibited shortcuts

prohibited:
  - abstract_schema_shadow_for_dry_run
  - column_mapping_shim_without_review
  - changing_production_schema_to_match_in_memory_shape_without_design
  - production_connection
  - production_secret_read
  - code_change_before_design_PASS
  - deploy
  - CUT_VERIFY
  - self_advance_to_code

Expected output

The final report must clearly state:

outputs:
  - whether_code_patch_is_required
  - whether_schema_migration_is_required
  - whether_additional_indexes_are_required_before_scale
  - whether_JSONB_fields_should_remain_JSONB_or_be_normalized
  - whether_PG_backed_dry_run_can_resume_after_code_patch
  - exact_next_code_authoring_scope_if_any

Gate

PG_backed_dry_run: blocked_until_schema_binding_design_PASS_and_code_patch_PASS_if_required
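
Read literally, the gate is a conjunction: the design must pass, and if a code patch is required, the patch must pass too. A minimal sketch of that reading follows; the function name and the status strings other than `PASS` are hypothetical.

```python
def dry_run_gate(design_status, code_patch_required, code_patch_status=None):
    """Return 'blocked' or 'unblocked' for the PG-backed dry-run."""
    if design_status != "PASS":
        return "blocked"
    # The patch result only matters when the design says a patch is required.
    if code_patch_required and code_patch_status != "PASS":
        return "blocked"
    return "unblocked"


design_pending = dry_run_gate("PENDING", code_patch_required=False)
patch_outstanding = dry_run_gate("PASS", code_patch_required=True)
all_clear = dry_run_gate("PASS", code_patch_required=True, code_patch_status="PASS")
no_patch_needed = dry_run_gate("PASS", code_patch_required=False)
```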