09 — DP7 Partitioning / Retention / Archive
09 — DP7 — Partitioning / Retention / Archive
DESIGN-ONLY. Cites Điều 45 §15 (risk), §20.1 transitional clause, Điều 36 NVSZ, Điều 35 audit. No DDL, no DML.
§1. Goal
Plan the lifetime curve for every queue substrate table so that:
- The hot tier stays small enough for partial-index reads to be sub-ms.
- History is preserved long enough for D9 audit / D30 regression / D31 watchdog (90d minimum on runs, 365d on DLQ).
- Staging cleanup at 15d (Điều 36) is preserved.
- No table grows unbounded.
- Partitioning is introduced only when justified by volume — not pre-optimised.
§2. Current state — volume snapshot
| Table | Live rows (2026-05-26 re-survey) | Daily rate observed | Days to 1M | Days to 5M |
|---|---|---|---|---|
event_outbox |
133,778 | ~16k/day (system/issue_opened dominated) | ~54 | ~324 |
event_read |
133,439 | ~16k/day | ~54 | ~324 |
event_pending |
0 | spiky | n/a | n/a |
iu_route_dead_letter |
0 | rare | very long | very long |
iu_route_worker_cursor |
1 | const | never | never |
iu_sql_event_route |
1 | rare | never | never |
dot_iu_runtime_lease |
0 | bursty | small steady-state | n/a |
iu_auto_instantiate_event_log |
33 | rare | very long | very long |
iu_core.iu_staging_record |
(per cutting cadence) | small | n/a | n/a |
iu_core.iu_staging_payload |
(3× staging_record) | small | n/a | n/a |
dot_iu_command_run |
(per DOT call) | tens/day | very long | very long |
job_queue (proposed) |
0 | initially small | depends on Phase 5+ adoption | depends |
job_dead_letter (proposed) |
0 | rare | very long | very long |
queue_heartbeat (proposed) |
0 | const ~10 rows | never | never |
Reading: event_outbox is the only table on a clear trajectory toward partitioning. Everything else is small-cardinality at the rates observed.
§3. Proposed retention contract
| Table | Hot retention | Cold archive | Hard delete | Compatibility note |
|---|---|---|---|---|
event_outbox |
90 days hot (partitioned monthly) | 365 days archive | none (audit) | partition only after ≥ 5M rows |
event_read |
Follows event_outbox partitions |
follows | follows | FK preserved per partition |
event_pending |
7 days hot | none (transient by design) | yes after worker drains + 7d | |
iu_route_dead_letter |
365 days hot | none | yes after 365d (resolved only) | resolution must be set before delete |
job_queue (proposed) |
terminal states succeeded/cancelled cleaned after 30 days → status cleaned |
none (FK preserved 90d) | yes after 90d | preserves audit window |
job_dead_letter (proposed) |
365 days hot | none | yes after 365d (resolved only) | mirrors iu_route_dead_letter |
dot_iu_command_run |
90 days hot | none | yes after 90d (Điều 35 audit baseline) | per Điều 35 to confirm |
iu_core.iu_staging_record |
per Điều 36 (15d expire → 30d archive → clean) | per Điều 36 | per Điều 36 | UNTOUCHED — preserves existing law |
iu_core.iu_staging_payload |
same | same | same | |
iu_auto_instantiate_event_log |
365 days | none | yes after 365d (idempotency preserved by event re-ratification) | |
iu_lifecycle_log |
append-only, never deleted (system law) | n/a | never | per existing law |
iu_vector_sync_point |
reflects qdrant; not transactional | n/a | never | unchanged |
queue_heartbeat |
continuous-mirror; no history | n/a | rows mutate in place | trail goes via heartbeat events |
All cleanup operations land as job_kind='retention_sweep_<table>' jobs in job_queue, gated by iu_core.retention_enabled (already live, currently false) plus a new per-table queue.retention.<table>.enabled flag.
§4. Proposed partitioning model (event_outbox)
NON-EXECUTABLE DESIGN SKETCH — DO NOT APPLY
-- Trigger: when row count crosses 5_000_000, the table is converted to partitioned.
-- Migration sketch (separate pack, separate Council ratification):
CREATE TABLE public.event_outbox_partitioned (LIKE event_outbox INCLUDING ALL)
PARTITION BY RANGE (created_at);
CREATE TABLE public.event_outbox_p_2026_05 PARTITION OF event_outbox_partitioned
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
-- ... one partition per month going forward; backfill partitions for historical data
-- Swap-in via attach-then-rename pattern.
-- event_read FK preservation: same partitioning by (event_id) → (created_at) routing.
Important: this is one of the most invasive changes in the roadmap and lives in a dedicated pack with explicit downtime/online plan — not in Phase 1 of doc 14.
Archive scheme:
NON-EXECUTABLE DESIGN SKETCH — DO NOT APPLY
CREATE SCHEMA archive;
CREATE TABLE archive.event_outbox_yyyymm (LIKE public.event_outbox INCLUDING ALL);
-- Cold-storage job:
function fn_event_outbox_archive_sweep(p_cutoff_month text)
-- INSERT INTO archive.event_outbox_yyyymm SELECT ... FROM event_outbox WHERE ...;
-- DELETE FROM event_outbox WHERE created_at < cutoff_month::date;
-- Emits system/event_outbox_archived event.
Archive schema is for forensics; not normally queried.
§5. Lifecycle / status
Per-row retention status added by sweepers:
job_queue.status='cleaned'(terminal; row no longer claimed)- After hard-delete window: row removed.
iu_route_dead_letter.resolved_atset; after 365d row removed.
§6. Indexes / performance
Pre-partitioning (current state) — no change needed.
Post-partitioning — partition-pruning queries on created_at are O(partition_size) not O(table_size). Indexes per partition:
NON-EXECUTABLE DESIGN SKETCH — DO NOT APPLY (per-partition)
CREATE INDEX ON event_outbox_p_yyyy_mm (event_domain, event_type, event_stream);
CREATE INDEX ON event_outbox_p_yyyy_mm (correlation_id) WHERE correlation_id IS NOT NULL;
CREATE INDEX ON event_outbox_p_yyyy_mm (event_subject_table, event_subject_ref) WHERE event_subject_ref IS NOT NULL;
For job_queue — partial indexes from DP2 are sufficient until succeeded row count > 1M; consider monthly partitioning only then.
§7. Security / governance
- Retention sweep jobs require
workflow_admin. - Archive schema GRANT pattern: read-only
analytics_readerrole; no write. - Hard-delete operations are audit-emitted (
system/<table>_hard_deletedevent).
§8. Rollback / disable
- Per-table flag:
queue.retention.<table>.enabled = false→ sweep returns 0 rows. - Master flag:
iu_core.retention_enabled = false(already live, currently false). - Partition rollback: detach partition + rename = O(seconds); no data loss.
§9. Healthcheck / observability
NON-EXECUTABLE DESIGN SKETCH — DO NOT APPLY
view v_queue_retention_pressure AS
SELECT 'event_outbox'::text AS tbl, count(*) AS rows,
min(created_at) AS oldest, max(created_at) AS newest
FROM event_outbox
UNION ALL
SELECT 'job_queue', count(*), min(enqueued_at), max(enqueued_at) FROM job_queue
UNION ALL
SELECT 'iu_route_dead_letter', count(*), min(first_failed_at), max(first_failed_at)
FROM iu_route_dead_letter
UNION ALL
SELECT 'job_dead_letter', count(*), min(first_failed_at), max(first_failed_at)
FROM job_dead_letter;
function fn_retention_health() RETURNS jsonb
-- returns sizes + age + 'partition_recommended':bool for event_outbox
§10. Compatibility with Điều 45 v1.0
| Clause | Compliance |
|---|---|
| §20.1 transitional substrate preserved | ✅ no schema change to live tables at Phase 1 |
| §20.4 worker silent gap is pre-existing, post-enactment violated | ✅ retention sweep itself becomes a heartbeat-emitting executor |
| §15 risk register | ✅ partitioning timed by 5M row threshold (matches survey §6 risk) |
| §14 NVSZ | ✅ archive does not introduce vector path |
| §10 batch principles | ✅ retention sweep uses batch with rollback discipline |
§11. Implementation prerequisites
- DP2 (job_queue) for sweep-as-job pattern.
- DP3 (lease) so sweeps don't double-fire.
- DP4 (heartbeat) so sweep executor is observable.
- A
event_outboxrow count alert (added tov_queue_retention_pressure) before partitioning is required.
§12. Open questions
| # | Question | Routed to |
|---|---|---|
| DP7-Q1 | Approve partition threshold at 5M rows (Council from survey §6)? | Council |
| DP7-Q2 | Approve job_queue cleaned after 30d, hard-delete after 90d? |
Council |
| DP7-Q3 | Approve DLQ 365d retention (vs 180d / never)? | Council |
| DP7-Q4 | Approve archive schema name archive. (vs cold. / attic.)? |
Council |
| DP7-Q5 | Should dot_iu_command_run 90d retention be re-confirmed in Điều 35 amendment? |
Council + D35 |
| DP7-Q6 | Should hard-delete emit a system/*_hard_deleted event? Council vocab gate. |
Council |
§13. Self-test
self_test:
cites_dieu45_section: §15,§20.1,§20.4,§14,§10
defines_status_lifecycle_compatible_with_§6_7: yes (job_queue 'cleaned' state)
defines_idempotency_key: n/a (sweep is idempotent by design)
defines_retry_dlq: defers to DP3 substrate (sweeps are jobs)
defines_lease: inherits DP3 (sweep jobs lease)
defines_observability_view: v_queue_retention_pressure
defines_dot_config_disable_flag: queue.retention.<table>.enabled, iu_core.retention_enabled
defines_executor_set_compatible_with_§11_5: yes (sweep executor = PG_worker)
no_vector_in_transient: yes
signal_not_data: yes (archive holds same rows; no new payload class)
pg_sot: yes
rollback_concept: per-table flag + master flag + detach-partition
no_pg_cron_dependency_phase_1: yes
no_pg_18_dependency: yes
no_mutation_authored: yes
DP7 design. No mutation. Authored 2026-05-26 by Claude Opus 4.7 (1M).