KB-3A48

5000x · migration 025 · IU Core retention substrate

Revision 1
iu-core · 5000x · migration-025 · retention · forward-looking · dry-run · sandbox-240

05 — Migration 025 · IU Core retention substrate

1. Why now

5000x audited IU Core append-only log volumes:

iu_three_axis_envelope_refresh_log         : 5 rows      ( 64 kB)
iu_three_axis_envelope_trigger_error_log   : 0 rows      (  8 kB)
dot_iu_command_run                         : 18 rows     ( 40 kB)
iu_route_attempt                           : 68 rows     ( 152 kB)
iu_route_dead_letter                       : 0 rows      (  8 kB)
iu_vector_sync_point                       : 64 rows     ( 80 kB)
iu_tree_change_log                         : 56 rows     ( 88 kB)
event_outbox                               : 108 039 rows (53 MB; shared infra, NOT IU Core scope)

IU Core's own logs are tiny. Retention is therefore forward-looking: it is authored now so that it is ready before any of these tables grows large enough to need cleaning. event_outbox is the only large table, but it belongs to the broader Cowork/lark-client/Directus event substrate rather than to IU Core, so it is out of scope for this macro.
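The report does not say how the volumes above were measured. As a rough sketch, assuming a PostgreSQL backend and that the tables live in a schema visible to pg_stat_user_tables, a similar audit could be reproduced from the statistics views:

```sql
-- Approximate row counts and on-disk size for the audited tables.
-- n_live_tup is a statistics estimate, not an exact COUNT(*).
SELECT s.relname                                       AS table_name,
       s.n_live_tup                                    AS approx_rows,
       pg_size_pretty(pg_total_relation_size(s.relid)) AS total_size
FROM pg_stat_user_tables AS s
WHERE s.relname IN (
    'iu_three_axis_envelope_refresh_log',
    'iu_three_axis_envelope_trigger_error_log',
    'dot_iu_command_run',
    'iu_route_attempt',
    'iu_route_dead_letter',
    'iu_vector_sync_point',
    'iu_tree_change_log',
    'event_outbox'
)
ORDER BY pg_total_relation_size(s.relid) DESC;
```

Re-running a query like this periodically is one way to notice when a table approaches a size worth cleaning.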

2. Surface added (1 table + 1 view + 1 function + 1 gate)

| class | name | role |
|---|---|---|
| table | iu_core_retention_policy | per-target-table N-days policy + actor_scope + reason. Seeded with 3 rows (refresh_log / trigger_error_log / command_run). |
| view | v_iu_core_retention_candidates | one row per policy; surfaces cutoff + a sample row (so an operator can preview what would be cleaned). |
| function | fn_iu_core_retention_cleanup(actor, dry_run) | gate-checked, default dry_run=true; emits one row per policy with rows_eligible + rows_deleted. |
| config | iu_core.retention_enabled | retention gate, default false; required true before any non-dry-run DELETE. |

DOT bump: 140 → 144 / 144 PASS (table 22 → 23, view 22 → 23, function 51 → 52, config 9 → 10; trigger / event_type / route unchanged).

3. Default policy seed

| target_table | keep_days | actor_scope | reason |
|---|---|---|---|
| iu_three_axis_envelope_refresh_log | 30 | iu_lifecycle_trigger, iu_5000x_pilot | redundant with v_iu_three_axis_envelope_refresh_status snapshot after 30 d |
| iu_three_axis_envelope_trigger_error_log | 90 | NULL (all actors) | errors are rare and important; keep longer |
| dot_iu_command_run | 90 | sandbox / pilot actors only (runtime_500x_op_proof, iu_core_2400x_full_reindex, iu_core_3000x_runtime_330_smoke, iu_5000x_pilot) | governance: keep 90 d of one-off macro runs; trim only macro/sandbox actors, never operator activity |

Registering a new target table takes a single step: INSERT one row into iu_core_retention_policy and the function picks it up; no other code change is needed.
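A minimal sketch of such a registration follows. The column names are taken from the seed table in section 3; the choice of iu_route_attempt, the 60-day window, and the reason text are purely illustrative, and the real table may carry further columns or constraints not shown in this report.

```sql
-- Hypothetical fourth policy row; values are illustrative, not a recommendation.
INSERT INTO iu_core_retention_policy (target_table, keep_days, actor_scope, reason)
VALUES (
    'iu_route_attempt',  -- audited in section 1, not part of the default seed
    60,                  -- assumed retention window in days
    NULL,                -- NULL = all actors, as in the trigger_error_log seed row
    'illustrative: trim route attempts after 60 d'
);

-- Preview the cutoff and a sample row for every policy before any cleanup.
SELECT * FROM v_iu_core_retention_candidates;
```

Checking the view right after the INSERT confirms that the new policy is visible before any cleanup call touches the table.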

4. Safety invariants

| invariant | proof |
|---|---|
| Function refuses non-dry-run when gate is false | sandbox/240.3 (live schema) raises an exception when called with p_dry_run=false and iu_core.retention_enabled='false' |
| Dry-run is permitted with gate false | sandbox/240.2: 3 policies, deleted_total = 0 |
| Forward-looking: today every policy has 0 eligible rows | sandbox/240.5: every rows_eligible = 0 against current production volumes |
| Seeded ancient row IS cleaned when gate is open | sandbox/240.6: synthetic INSERT of a 31-day-old iu_three_axis_envelope_refresh_log row + dry_run=false with gate true returns rows_eligible=1, rows_deleted=1, which proves the cleanup branch works |
| DOT visibility | sandbox/240.7: function + view + table + config all present=1 |

The probe is wrapped in a single BEGIN…ROLLBACK, so the gate flip and the synthetic seeded row are both rolled back at exit.
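The shape of such a probe might look roughly like the sketch below. It is not the actual sandbox/240 script: the actor name passed to the function is an assumption, and the synthetic INSERT is left as a comment because the log table's column list is not given in this report.

```sql
BEGIN;

-- Dry-run with the gate still false: allowed, and nothing is deleted.
SELECT * FROM fn_iu_core_retention_cleanup('iu_5000x_pilot', true);

-- Open the gate inside the transaction only.
UPDATE dot_config SET value = 'true' WHERE key = 'iu_core.retention_enabled';

-- Seed one synthetic 31-day-old row into iu_three_axis_envelope_refresh_log here.
-- (Column list omitted: the table schema is not shown in this report.)

-- Real cleanup: with the seeded row present, the refresh_log policy
-- should report rows_eligible = 1 and rows_deleted = 1.
SELECT * FROM fn_iu_core_retention_cleanup('iu_5000x_pilot', false);

-- Undo the gate flip, the seeded row, and the delete.
ROLLBACK;
```

Because every statement runs in one transaction, the gate is never visible as open to other sessions.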

5. Rollback

sql/iu-core/rollback/025_iu_core_retention_substrate.rollback.sql drops in safe order: function → view → table → config row. Pre-condition: gate is false (default at migration time).
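The stated drop order can be sketched as follows. This is an illustration of the sequence only, not the contents of the rollback file; it assumes PostgreSQL 10 or later and that the function has a single overload, so it can be dropped without an argument list.

```sql
-- Pre-condition from the report: the gate must still be 'false'.
SELECT value FROM dot_config WHERE key = 'iu_core.retention_enabled';

BEGIN;
DROP FUNCTION IF EXISTS fn_iu_core_retention_cleanup;    -- function first
DROP VIEW     IF EXISTS v_iu_core_retention_candidates;  -- then the view that reads the policy table
DROP TABLE    IF EXISTS iu_core_retention_policy;        -- then the table itself
DELETE FROM dot_config WHERE key = 'iu_core.retention_enabled';  -- finally the config row
COMMIT;
```

Dropping the function and view before the table avoids leaving objects behind that reference a missing relation.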

6. Five-layer impact

| layer | impact |
|---|---|
| PG | +1 table / +1 view / +1 function / +1 config row; D9 144/144 |
| Directus | none; retention table is not registered as a Directus collection |
| Nuxt | none |
| AgentData | +1 KB report (this doc) |
| Qdrant | none |

7. Future enablement runbook (NOT executed in this macro)

  1. Take backup: pg_dump -Fc directus > directus-pre-retention-…dump.
  2. UPDATE dot_config SET value='true' WHERE key='iu_core.retention_enabled'.
  3. Run SELECT * FROM fn_iu_core_retention_cleanup('iu_retention_cron', false); manually first, then inspect rows_eligible / rows_deleted per policy.
  4. If output looks correct, schedule via cron (host or systemd timer) at the cadence the operator chooses (daily / weekly / monthly).
  5. The healthcheck (doc 04) already surfaces failed cleanup runs via the operator_runtime surface.
  6. Disable: UPDATE dot_config SET value='false' … — the cron call will then refuse non-dry-run automatically.
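
The SQL side of this runbook (steps 2, 3, and 6) could be run in a session like the one sketched below. The dry-run preview before the real call is an addition that is not part of the runbook, and the backup from step 1 is assumed to have been taken already.

```sql
-- Step 2: open the gate.
UPDATE dot_config SET value = 'true' WHERE key = 'iu_core.retention_enabled';

-- Optional preview (not a runbook step): see what would be deleted.
SELECT * FROM fn_iu_core_retention_cleanup('iu_retention_cron', true);

-- Step 3: first real run, executed manually; check rows_eligible / rows_deleted.
SELECT * FROM fn_iu_core_retention_cleanup('iu_retention_cron', false);

-- Step 6 (when needed): close the gate again; non-dry-run calls then refuse.
UPDATE dot_config SET value = 'false' WHERE key = 'iu_core.retention_enabled';
```

Comparing the preview output with the real run gives a simple check that rows_deleted matches what the dry run reported as eligible.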
knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-core-5000x-nuxt-pilot-monitoring-rollout-open-goal/05-retention-substrate-migration-025.md