IU CUT Operational Pipeline — 03 COPY_TO_CUT_ZONE (DOT-driven)
03 — COPY_TO_CUT_ZONE: DOT/system-driven copy
Requirement
"Agent must not copy source files manually with tokens. COPY must be DOT/system-driven." "target is No-Vector Staging Zone; vector_excluded=true; queue carries only signal/ref, not body."
Mechanism
Source bytes flow file→container→postgres server-side and never appear in Agent prompt tokens.
KB doc (knowledge_documents.id=1090)
↓ (1) server-side bytea export via lo_export
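DO $$
DECLARE
  v_oid oid;
BEGIN
  -- Wrap the document text in a temporary large object (0 lets the server pick the OID).
  SELECT lo_from_bytea(0, convert_to(content, 'UTF8'))
    INTO v_oid
    FROM knowledge_documents
   WHERE id = 1090;
  -- Write the bytes to the server filesystem, then drop the temporary object.
  PERFORM lo_export(v_oid, '/tmp/cut_zone/dieu38_1090.md');
  PERFORM lo_unlink(v_oid);
END $$;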
/tmp/cut_zone/dieu38_1090.md (postgres:postgres, 7736 B, md5 matches stored)
↓ (2) fn_cut_copy_to_staging(req_id, '/tmp/cut_zone/...')
calls pg_read_file(p_source_path::text)
iu_core.iu_staging_record (staging_kind='agent_intermediate',
payload_type='source_text',
purpose='cut_pipeline_source_copy',
vector_excluded=TRUE,
lifecycle_status='pending',
source_kind='user',
idempotency_key='cut-copy-<uuid>')
+
iu_core.iu_staging_payload (part_name='source_copy',
payload_kind='text',
payload_text=<7736 B raw source>,
byte_len, content_hash)
↓
cut_request.status: requested → copied
cut_request.copy_staging_record_id ← new uuid
The Agent never executes a tool with the source body as an argument: the lo_export write happens entirely inside postgres, then fn_cut_copy_to_staging uses pg_read_file to load it.
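The record/payload pair written in step (2) can be modeled outside the database as a rough sketch. This is a Python illustration, not the PL/pgSQL of fn_cut_copy_to_staging. The field values come from the diagram above; the function name build_staging_rows, the id/record_id keys, the choice of uuid for the idempotency key, and the use of MD5 for content_hash are assumptions made for illustration.

```python
import hashlib
import uuid
from datetime import datetime, timedelta, timezone


def build_staging_rows(source_bytes: bytes, now: datetime) -> tuple[dict, dict]:
    """Model the iu_staging_record / iu_staging_payload pair (hypothetical helper)."""
    record_id = str(uuid.uuid4())
    record = {
        "id": record_id,  # assumed key name
        "staging_kind": "agent_intermediate",
        "payload_type": "source_text",
        "purpose": "cut_pipeline_source_copy",
        "vector_excluded": True,
        "lifecycle_status": "pending",
        "source_kind": "user",
        # The document shows 'cut-copy-<uuid>'; which uuid is used is an assumption.
        "idempotency_key": f"cut-copy-{uuid.uuid4()}",
        "expires_at": now + timedelta(days=20),
    }
    payload = {
        "record_id": record_id,  # assumed key name
        "part_name": "source_copy",
        "payload_kind": "text",
        "payload_text": source_bytes.decode("utf-8"),
        "byte_len": len(source_bytes),
        # MD5 is assumed here because the live run reports a 32-hex-digit hash.
        "content_hash": hashlib.md5(source_bytes).hexdigest(),
    }
    return record, payload


if __name__ == "__main__":
    rec, pay = build_staging_rows(b"abc", datetime.now(timezone.utc))
    print(rec["lifecycle_status"], pay["byte_len"], pay["content_hash"])
```

The point of the sketch is only the shape: the record carries metadata and the vector_excluded flag, while the body lives in a separate payload row alongside its length and digest.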
Guards inside fn_cut_copy_to_staging
- p_source_path must be non-empty.
- p_source_path must not contain .. (path-traversal refusal, verified D31-h).
- cut_request.status must equal 'requested' (D31-d-like contract).
- pg_read_file result must be non-empty.
- Defensive secret scan: refuses if content matches BEGIN PRIVATE KEY | aws_secret_access_key | password= (case-insensitive).
- vector_excluded=TRUE enforced by iu_staging_record_vector_excluded_chk (existing CHECK).
- expires_at = now() + interval '20 days' (within iu_staging_record_expiry_ceiling_chk = 30d).
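The input guards above can be sketched as a small Python model. It mirrors only the checks listed in this section and is not the function's actual PL/pgSQL body; the names CopyRefused and check_copy_guards are hypothetical, and the exact regular expression the real function uses is an assumption.

```python
import re

# Patterns taken from the guard list; matching is case-insensitive.
SECRET_PATTERN = re.compile(
    r"BEGIN PRIVATE KEY|aws_secret_access_key|password=", re.IGNORECASE
)


class CopyRefused(Exception):
    """Raised when a guard rejects the copy (hypothetical name)."""


def check_copy_guards(source_path: str, request_status: str, content: str) -> None:
    """Raise CopyRefused on the first failing guard; return None if all pass."""
    if not source_path:
        raise CopyRefused("p_source_path must be non-empty")
    if ".." in source_path:
        raise CopyRefused("p_source_path must not contain '..'")
    if request_status != "requested":
        raise CopyRefused("cut_request.status must equal 'requested'")
    if not content:
        raise CopyRefused("file content must be non-empty")
    if SECRET_PATTERN.search(content):
        raise CopyRefused("content matches the defensive secret scan")


if __name__ == "__main__":
    check_copy_guards("/tmp/cut_zone/dieu38_1090.md", "requested", "plain text")
    try:
        check_copy_guards("/tmp/cut_zone/../etc/passwd", "requested", "x")
    except CopyRefused as exc:
        print("refused:", exc)
```

Ordering the cheap path and status checks before the content scan means a bad request is refused without reading or scanning the file body.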
Signal job (queue carries refs only)
When queue.job_substrate.enabled=true, fn_cut_copy_to_staging enqueues:
{
  "job_kind": "cut.copy_to_staging",
  "payload_json": {
    "cut_request_id": "<uuid>",
    "source_ref": "knowledge/dev/laws/dieu38-normative-document-law.md",
    "source_kind": "kb_document",
    "copy_staging_record_id": "<uuid>",
    "manifest_staging_record_id": null,
    "cut_run_id": null,
    "job_step": "cut.copy_to_staging",
    "queued_by": "system_dot_copy"
  },
  "idempotency_key": "cut.copy_to_staging:<cut_request_id>:copied"
}
No body, no vector, no secret tokens. job_queue_payload_safe_check would refuse otherwise (verified D31-a/b/c).
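As an illustration of the refs-only rule, the following Python sketch builds a job of the shape shown above and applies a simple allow-list check. The real rules of job_queue_payload_safe_check are not given in this document, so is_ref_only is a hypothetical stand-in, and both build_signal_job and the MAX_VALUE_LEN ceiling are assumptions.

```python
# Keys allowed in payload_json, copied from the example job in this section.
ALLOWED_KEYS = {
    "cut_request_id", "source_ref", "source_kind", "copy_staging_record_id",
    "manifest_staging_record_id", "cut_run_id", "job_step", "queued_by",
}
MAX_VALUE_LEN = 512  # assumed ceiling: refs are short, document bodies are not


def build_signal_job(cut_request_id: str, source_ref: str,
                     copy_staging_record_id: str) -> dict:
    """Build a signal job that carries references only (hypothetical helper)."""
    return {
        "job_kind": "cut.copy_to_staging",
        "payload_json": {
            "cut_request_id": cut_request_id,
            "source_ref": source_ref,
            "source_kind": "kb_document",
            "copy_staging_record_id": copy_staging_record_id,
            "manifest_staging_record_id": None,
            "cut_run_id": None,
            "job_step": "cut.copy_to_staging",
            "queued_by": "system_dot_copy",
        },
        "idempotency_key": f"cut.copy_to_staging:{cut_request_id}:copied",
    }


def is_ref_only(payload: dict) -> bool:
    """Accept only known keys whose values are null or short strings."""
    if set(payload) - ALLOWED_KEYS:
        return False
    return all(
        value is None or (isinstance(value, str) and len(value) <= MAX_VALUE_LEN)
        for value in payload.values()
    )


if __name__ == "__main__":
    job = build_signal_job("req-1", "knowledge/dev/laws/example.md", "rec-1")
    print(is_ref_only(job["payload_json"]))
    print(is_ref_only(dict(job["payload_json"], body="full source text")))
```

Because the idempotency key is derived from the request id and the target status, re-enqueuing the same transition produces the same key.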
Live run (Điều 38 v3.0)
SELECT public.fn_cut_copy_to_staging(
  p_cut_request_id := '777b1297-18af-4f07-a362-0ad18b043f21'::uuid,
  p_source_path    := '/tmp/cut_zone/dieu38_1090.md',
  p_actor          := 'system_dot_copy'
);
→ {
  "cut_request_id": "777b1297-...",
  "status": "copied",
  "staging_file_id": "fbaecf00-5d88-4804-9ecd-44be811fdf88",
  "copy_staging_record_id": "fbaecf00-5d88-4804-9ecd-44be811fdf88",
  "source_path": "/tmp/cut_zone/dieu38_1090.md",
  "source_hash": "fdacc492e62c40f1364392943a310769",
  "source_bytes": 7736,
  "vector_excluded": true,
  "signal": { "enqueued": true, "result": { "job_id": "25d00ebb-..." } }
}
source_hash matches the value computed against knowledge_documents.content for id=1090 — full integrity round-trip.
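The round-trip comparison amounts to hashing the UTF-8 bytes of the stored content and checking both digest and length against the reported values. A minimal Python sketch follows, assuming the hash is MD5 (the reported value has 32 hex digits and the export step mentions md5); source_fingerprint and round_trip_ok are hypothetical names.

```python
import hashlib


def source_fingerprint(content: str) -> tuple[str, int]:
    """Return (md5 hex digest, byte length) of the UTF-8 encoding of content."""
    data = content.encode("utf-8")
    return hashlib.md5(data).hexdigest(), len(data)


def round_trip_ok(kb_content: str, reported_hash: str, reported_bytes: int) -> bool:
    """True when the staged copy's reported hash and size match the KB content."""
    digest, size = source_fingerprint(kb_content)
    return digest == reported_hash and size == reported_bytes


if __name__ == "__main__":
    digest, size = source_fingerprint("abc")
    print(digest, size)
    print(round_trip_ok("abc", digest, size))
```

Comparing byte length as well as the digest matters for non-ASCII sources such as Vietnamese legal text, where character count and byte count differ.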
Why not pass p_source_text directly?
fn_iu_op_mark_file does accept p_source_text and uses it only to compute the digest; it does NOT persist the source bytes. That makes the alias unsuitable for the COPY step on its own because:
- The Agent caller would have to provide the source text as an argument, putting source bytes through prompt tokens — violates the DOT/system-driven requirement.
- After the alias returns, the source text is no longer available; MARK and later steps cannot re-read it from staging.
The new path stores the source in a source_copy payload row that the MARK step later reads server-side. Source bytes touch the network exactly once (KB→tempfile, via lo_export) and live entirely inside the postgres container thereafter.