D36 No-Vector Staging Zone — 03 SQL + NoSQL Staging Architecture</title> <parameter name="tags">["d36", "no-vector-staging-zone", "architecture", "iu_staging_record", "iu_staging_payload", "design", "2026-05-25"]
03 — SQL + NoSQL Staging Architecture
Purpose. Design the general substrate that serves all temporary data needs — SQL temp, NoSQL temp, MARK manifests, draft payloads, review packages, agent intermediate outputs, event working state, import previews. Parent:
01-d36-no-vector-staging-zone-addendum.md,02-…-matrix.md. Status: DESIGN ONLY. No DDL, no migration, no production mutation.
The substrate is intentionally small — a two-table sidecar pattern (a
proven precedent: see 15000x mig 031 sidecar template registry / 18000x mig 032
sidecar versioning) — parameterised by payload_type. One shape covers all
eight use cases.
1. Substrate overview
┌──────────────────────────────────────────┐
│ iu_staging_record (metadata, SQL row) │
│ PK: staging_record_id (uuid) │
│ - source / owner / lifecycle / index │
│ - payload_type discriminator │
│ - integrity hash + size + counts │
└──────────────────────────────────────────┘
│ 1
│
│ 1..N (some types have multiple parts; most have 1)
▼
┌──────────────────────────────────────────┐
│ iu_staging_payload (payload layer) │
│ PK: staging_payload_id (uuid) │
│ FK: staging_record_id │
│ - payload_kind (json|text|blob_ref) │
│ - payload_json (jsonb, nullable) │
│ - payload_text (text, nullable) │
│ - blob_ref (text, nullable) │
│ - byte_len + content_hash │
└──────────────────────────────────────────┘
Why two tables instead of one wide table?
- The metadata row is small and queryable for governance, lifecycle, indexes, audit, and retention scans. Keeping it narrow keeps lifecycle scans fast.
- The payload row is wide and rarely scanned; it is the place where bulk JSON / text / blob refs live. A separate table lets pg_toast handle big payloads without bloating the metadata table.
- A staging item may have multiple payload parts (e.g. a review package = manifest + commentary + check-list). The 1:N pattern handles this without forcing the metadata row to model parts.
- The "narrow sidecar" pattern matches the 15000x/18000x precedent: "Two narrow sidecars beat one wide sidecar" — applied here to the metadata vs payload split.
2. iu_staging_record — metadata table
| Column | Type | Required | Notes |
|---|---|---|---|
staging_record_id |
uuid PK |
yes | gen_random_uuid(); full uuid, no truncation |
staging_kind |
text (enum) |
yes | One of the 8 use cases — see §4 below |
payload_type |
text (enum) |
yes | Discriminator for payload table; see §5 |
purpose |
text |
yes | Short human description, like collection_registry.purpose |
lifecycle_status |
text (enum) |
yes | pending → approved → consumed / rejected / expired / cleaned |
owner_actor |
text |
yes | Agent or human actor who created the record |
source_kind |
text |
yes | agent / user / system / import |
source_ref |
text |
no | URL / doc id / event id the record references |
idempotency_key |
text UNIQUE |
yes | md5(source_ref‖staging_kind‖suffix); for safe replay (25000x lesson) |
content_hash |
text |
yes | sha256 over canonical payload bytes |
byte_len |
bigint |
yes | Total payload bytes |
part_count |
int |
yes | Number of rows in iu_staging_payload for this record |
metadata |
jsonb |
no | Free-form tags / hints (NOT the payload) |
referenced_iu_ids |
uuid[] |
no | If the record points at IU-Core entities (Điều 37 cross-ref) |
created_at |
timestamptz |
yes | DEFAULT now() |
approved_at |
timestamptz |
no | Set on approved transition |
approved_by |
text |
no | Reviewer name / role |
approval_doc_id |
text |
no | KB doc id of the review package |
consumed_at |
timestamptz |
no | Set on consumed transition |
consumed_by_run_id |
uuid |
no | FK to dot_iu_command_run.run_id of the consumer |
expires_at |
timestamptz |
yes | Computed at create time from policy |
cleaned_at |
timestamptz |
no | Set on cleaned transition |
vector_excluded |
boolean |
yes | DEFAULT true; CHECK = true (Rule N1 hard) |
CHECK constraints (testable):
CHECK (vector_excluded = true) -- Rule N1 hard
CHECK (lifecycle_status IN
('pending','approved','consumed','rejected','expired','cleaned'))
CHECK ( (lifecycle_status = 'approved')
= (approved_at IS NOT NULL AND approved_by IS NOT NULL) )
CHECK ( (lifecycle_status = 'consumed')
= (consumed_at IS NOT NULL AND consumed_by_run_id IS NOT NULL) )
CHECK ( expires_at > created_at )
Indexes: lifecycle_status, expires_at, staging_kind, payload_type,
idempotency_key (UNIQUE), owner_actor.
3. iu_staging_payload — payload table
| Column | Type | Required | Notes |
|---|---|---|---|
staging_payload_id |
uuid PK |
yes | gen_random_uuid() |
staging_record_id |
uuid FK |
yes | → iu_staging_record(staging_record_id); ON DELETE CASCADE |
part_index |
int |
yes | 0-based; UNIQUE per staging_record_id |
part_name |
text |
yes | Human label — manifest / commentary / checklist / preview_rows / … |
payload_kind |
text (enum) |
yes | json / text / blob_ref |
payload_json |
jsonb |
conditional | NOT NULL iff payload_kind = 'json' |
payload_text |
text |
conditional | NOT NULL iff payload_kind = 'text' |
blob_ref |
text |
conditional | NOT NULL iff payload_kind = 'blob_ref'; points at object storage (e.g. s3://… or directus_file_id) |
byte_len |
bigint |
yes | Computed at insert |
content_hash |
text |
yes | sha256 over canonical bytes of this part |
created_at |
timestamptz |
yes | DEFAULT now() |
CHECK constraints:
CHECK (payload_kind IN ('json','text','blob_ref'))
CHECK (
(payload_kind = 'json' AND payload_json IS NOT NULL AND payload_text IS NULL AND blob_ref IS NULL) OR
(payload_kind = 'text' AND payload_text IS NOT NULL AND payload_json IS NULL AND blob_ref IS NULL) OR
(payload_kind = 'blob_ref' AND blob_ref IS NOT NULL AND payload_json IS NULL AND payload_text IS NULL)
)
UNIQUE (staging_record_id, part_index)
Indexes: staging_record_id, payload_kind.
Vectorization note. The payload table is the only place that holds
embeddable bytes (payload_json / payload_text). By having it sit under
iu_staging_payload, the vector connector's species/path exclusion
(04-vector-exclusion-contract.md) is sufficient to keep it out of Qdrant.
4. staging_kind enum (8 supported use cases)
staging_kind:
- mark_manifest # Stage-1 MARK output (cut_manifest)
- review_package # Reviewer's bundle for a manifest
- cut_preview # Dry-run row-set a future CUT would write
- sql_snapshot # SELECT result frozen for review
- nosql_payload # JSON payload (e.g. Lark export, import preview)
- draft_iu_composition # In-flight IU-Core composition before commit
- agent_intermediate # Agent dry-run / research / planning artifact
- event_working_state # In-flight enrichment / partial composition
A new staging_kind value is added only by a follow-up addendum + matching
DOT enum bump + pinning-test bump (small-blast-radius change; not a new law).
5. payload_type enum
Mapped from staging_kind but kept separate to allow re-use across kinds:
payload_type:
- manifest_json # cut_manifest schema M1..M16+R1..R7+C1..C8
- mark_report # textual MARK output (Markdown / log)
- sql_result_snapshot # rows + columns + sql_text + ran_at
- nosql_payload # opaque JSON document
- source_excerpt # excerpt of an external doc/URL
- import_preview # proposed import row-set
- event_working_state # event enrichment buffer
- composition_draft # IU-Core pieces + planned membership
- review_bundle # mixed multi-part: manifest + commentary + checks
review_bundle is the multi-part case (part_count > 1). The others typically
have part_count = 1.
6. Specific use-case shapes
6.1 MARK manifest (covers MARK→REVIEW→CUT integration)
iu_staging_record
staging_kind = 'mark_manifest'
payload_type = 'manifest_json'
purpose = 'MARK manifest for <article#> — IU-Core cut'
source_ref = '<source_doc_url_or_id>'
idempotency_key = md5(source_ref ‖ 'mark_manifest' ‖ <suffix>)
expires_at = created_at + INTERVAL '14 days'
vector_excluded = true
iu_staging_payload (part_index = 0)
part_name = 'manifest'
payload_kind = 'json'
payload_json = { … cut_manifest M1..M16+R1..R7+C1..C8 … }
content_hash = sha256(canonical(payload_json))
The content_hash becomes the manifest_digest that the CUT contract
(dot_iu_cut_from_manifest) requires for the approval gate. No more hash
drift between disk-file and DB-row.
6.2 SQL temporary snapshot
staging_kind = 'sql_snapshot'
payload_type = 'sql_result_snapshot'
payload_json = {
"sql_text": "...",
"ran_at": "2026-05-25T08:11:15Z",
"row_count": 42,
"columns": ["col_a","col_b"],
"rows": [[…],[…],…]
}
6.3 NoSQL temporary payload (Lark / import preview)
staging_kind = 'nosql_payload' | 'import_preview'
payload_type = 'nosql_payload' | 'import_preview'
payload_json = { … upstream JSON document … }
6.4 Review package (multi-part)
iu_staging_record staging_kind = 'review_package' part_count = 3
iu_staging_payload part_index=0 part_name='manifest' payload_kind='json'
iu_staging_payload part_index=1 part_name='commentary' payload_kind='text'
iu_staging_payload part_index=2 part_name='checklist' payload_kind='json'
6.5 Agent intermediate output / draft composition / event working state
All follow the same pattern: one metadata row + one (or several) payload rows,
typed by staging_kind + payload_type.
7. Views
Two read-only views ship with the substrate (matching the 15000x convention):
| View | Purpose |
|---|---|
v_iu_staging_record |
Convenience join of record + part counts + days_to_expiry + age |
v_iu_staging_payload_observability |
Per-record payload summary (size, hashes, parts) |
These views are surfaced in Nuxt /knowledge/registries/staging (optional UI).
8. Functions (DDL contract; not implemented in this macro)
| Function | Caller | Role |
|---|---|---|
fn_iu_staging_create(staging_kind, payload_type, purpose, owner_actor, source_kind, source_ref, idempotency_key, parts jsonb[], expires_at) |
DOT only | Atomic create of record + parts; sets vector_excluded = true |
fn_iu_staging_approve(staging_record_id, approved_by, approval_doc_id) |
DOT only | Gate-checked transition pending → approved |
fn_iu_staging_consume(staging_record_id, consumed_by_run_id) |
DOT only | Gate-checked transition approved → consumed |
fn_iu_staging_reject(staging_record_id, reason) |
DOT only | pending → rejected |
fn_iu_staging_cleanup(scope jsonb) |
DOT only | (expired ∨ consumed ∨ rejected) → cleaned; respects retention policy |
fn_iu_staging_emit_event(staging_record_id, event_type) |
Trigger or DOT | Writes to event_outbox |
fn_iu_staging_healthcheck() |
Healthcheck runner | Returns ok/false; includes vector-exclusion invariant + lifecycle counts |
fn_iu_staging_unregister(staging_record_id) |
Rollback DOT | REFUSED-guarded; deletes record + parts atomically |
All functions are SECURITY DEFINER, STRICT, owned by workflow_admin,
REVOKE FROM PUBLIC, GRANT EXECUTE TO directus. Same pattern as IU-Core.
9. Configuration gates
| Gate | Default | Effect |
|---|---|---|
iu_core.staging_writes_enabled |
false |
Master gate for any write into iu_staging_* |
iu_core.staging_cleanup_enabled |
false |
Master gate for fn_iu_staging_cleanup |
iu_core.no_vector_staging_excluded |
true |
Defence-in-depth gate read by the vector connector |
Standard "in-TX gate toggle reversibility" pattern (proven 12000x): open → run → close inside one transaction.
10. Storage decisions deferred to review
| Decision | Options | Default proposal |
|---|---|---|
| Schema | iu_core vs new iu_staging |
iu_core (reuse role boundary; tables prefixed iu_staging_) |
| Payload size limit | 1 MiB / 10 MiB / unlimited | 10 MiB per part (blob_ref for larger) |
| Retention default | 7d / 14d / 30d per kind | per-kind table; see 07-implementation-backlog.md |
| Approval requirement | Required for mark_manifest, review_package, cut_preview; optional otherwise |
Same |
| Event domain | staging vs iu |
staging (own domain for clean filters) |
Open questions about these are in 08-review-questions-before-implementation.md.