KB-21A0

IU CUT Operational Pipeline — 03 COPY_TO_CUT_ZONE (DOT-driven)

Revision 1, 2026-05-26
Tags: iu-cut-pipeline, copy-to-cut-zone, dot-driven, no-vector-staging-zone, pg-read-file, lo-export

03 — COPY_TO_CUT_ZONE: DOT/system-driven copy

Requirement

"Agent must not copy source files manually with tokens. COPY must be DOT/system-driven."
"target is No-Vector Staging Zone; vector_excluded=true; queue carries only signal/ref, not body."

Mechanism

Source bytes move entirely server-side: from the KB row to a temp file inside the postgres container, then into the staging tables. They never appear in Agent prompt tokens.

KB doc (knowledge_documents.id=1090)
  ↓ (1) server-side bytea export via lo_export
       DO $$ DECLARE v_oid oid;
       BEGIN
         SELECT lo_from_bytea(0, convert_to(content,'UTF8'))
           INTO v_oid FROM knowledge_documents WHERE id=1090;
         PERFORM lo_export(v_oid, '/tmp/cut_zone/dieu38_1090.md');
         PERFORM lo_unlink(v_oid);
       END $$;
/tmp/cut_zone/dieu38_1090.md  (postgres:postgres, 7736 B, md5 matches stored)
  ↓ (2) fn_cut_copy_to_staging(req_id, '/tmp/cut_zone/...')
       calls pg_read_file(p_source_path::text)
iu_core.iu_staging_record (staging_kind='agent_intermediate',
                           payload_type='source_text',
                           purpose='cut_pipeline_source_copy',
                           vector_excluded=TRUE,
                           lifecycle_status='pending',
                           source_kind='user',
                           idempotency_key='cut-copy-<uuid>')
  +
iu_core.iu_staging_payload (part_name='source_copy',
                            payload_kind='text',
                            payload_text=<7736 B raw source>,
                            byte_len, content_hash)
  ↓
cut_request.status: requested → copied
cut_request.copy_staging_record_id ← new uuid

The Agent never executes a tool with the source body as an argument: the lo_export write happens entirely inside postgres, then fn_cut_copy_to_staging uses pg_read_file to load it.
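A server-side sanity check of step (1) could look like the sketch below. It assumes that the "md5 matches stored" note refers to an MD5 over the UTF-8 bytes of knowledge_documents.content, and that the calling role has sufficient privileges for pg_stat_file and pg_read_binary_file. Neither assumption is confirmed by this page.

```sql
-- Sketch only: compares the exported temp file with the KB row, entirely server-side.
-- Assumption: the stored hash is an md5 over the UTF-8 encoded content.
SELECT
    (pg_stat_file('/tmp/cut_zone/dieu38_1090.md')).size       AS file_bytes,
    octet_length(convert_to(kd.content, 'UTF8'))              AS kb_bytes,
    md5(pg_read_binary_file('/tmp/cut_zone/dieu38_1090.md'))
        = md5(convert_to(kd.content, 'UTF8'))                 AS md5_matches
FROM knowledge_documents AS kd
WHERE kd.id = 1090;
```

Only two byte counts and a boolean are returned to the caller, so the check itself keeps the source body out of Agent tokens.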

Guards inside fn_cut_copy_to_staging

  1. p_source_path must be non-empty.
  2. p_source_path must not contain .. (path-traversal refusal — verified D31-h).
  3. cut_request.status must equal 'requested' (D31-d-like contract).
  4. pg_read_file result must be non-empty.
  5. Defensive secret scan: refuses if content matches BEGIN PRIVATE KEY | aws_secret_access_key | password= (case-insensitive).
  6. vector_excluded=TRUE enforced by iu_staging_record_vector_excluded_chk (existing CHECK).
  7. expires_at = now() + interval '20 days' (within iu_staging_record_expiry_ceiling_chk = 30d).
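Guards 1 to 5 above might be expressed roughly as in the following sketch. It is not the body of fn_cut_copy_to_staging: the function name sketch_cut_copy_guards, its parameters, and the error messages are hypothetical, and the status and file content are passed in as plain arguments so the example does not depend on any table.

```sql
-- Hypothetical helper in the session-local pg_temp schema; illustration only.
CREATE OR REPLACE FUNCTION pg_temp.sketch_cut_copy_guards(
    p_source_path text,
    p_status      text,   -- stand-in for cut_request.status
    p_content     text    -- stand-in for the pg_read_file result
) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    -- Guard 1: path must be non-empty.
    IF p_source_path IS NULL OR btrim(p_source_path) = '' THEN
        RAISE EXCEPTION 'source path must be non-empty';
    END IF;

    -- Guard 2: refuse any path containing "..".
    IF position('..' in p_source_path) > 0 THEN
        RAISE EXCEPTION 'path traversal refused: %', p_source_path;
    END IF;

    -- Guard 3: the request must still be in the 'requested' state.
    IF p_status IS DISTINCT FROM 'requested' THEN
        RAISE EXCEPTION 'status must be requested, got %', p_status;
    END IF;

    -- Guard 4: the file must not be empty.
    IF p_content IS NULL OR p_content = '' THEN
        RAISE EXCEPTION 'source file is empty';
    END IF;

    -- Guard 5: case-insensitive scan for the three secret markers.
    IF p_content ~* '(BEGIN PRIVATE KEY|aws_secret_access_key|password=)' THEN
        RAISE EXCEPTION 'content matches a secret pattern; refusing to stage';
    END IF;
END;
$$;

-- Passes all five guards and returns void.
SELECT pg_temp.sketch_cut_copy_guards('/tmp/cut_zone/dieu38_1090.md', 'requested', 'some text');
```

Calling the helper with a path such as '/tmp/cut_zone/../etc/passwd' raises the path-traversal exception before the other checks run. Guards 6 and 7 are left out because they rely on the existing CHECK constraints rather than on procedural code.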

Signal job (queue carries refs only)

When queue.job_substrate.enabled=true, fn_cut_copy_to_staging enqueues:

{
  "job_kind":         "cut.copy_to_staging",
  "payload_json": {
    "cut_request_id":             "<uuid>",
    "source_ref":                 "knowledge/dev/laws/dieu38-normative-document-law.md",
    "source_kind":                "kb_document",
    "copy_staging_record_id":     "<uuid>",
    "manifest_staging_record_id": null,
    "cut_run_id":                 null,
    "job_step":                   "cut.copy_to_staging",
    "queued_by":                  "system_dot_copy"
  },
  "idempotency_key": "cut.copy_to_staging:<cut_request_id>:copied"
}

No body, no vector, no secret tokens. job_queue_payload_safe_check would refuse otherwise (verified D31-a/b/c).
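To illustrate what "refs only" means for the payload shown above, the sketch below inspects the same JSON with standard jsonb operators. It is not the logic of job_queue_payload_safe_check, which this page does not describe. The list of forbidden key names is an assumption chosen for the example.

```sql
-- Sketch only: a hand-rolled look at the payload, not the real safety check.
WITH job(payload_json) AS (
    VALUES ('{
        "cut_request_id": "<uuid>",
        "source_ref": "knowledge/dev/laws/dieu38-normative-document-law.md",
        "source_kind": "kb_document",
        "copy_staging_record_id": "<uuid>",
        "manifest_staging_record_id": null,
        "cut_run_id": null,
        "job_step": "cut.copy_to_staging",
        "queued_by": "system_dot_copy"
    }'::jsonb)
)
SELECT
    -- Assumed key names that would indicate a body or vector was smuggled in.
    NOT (payload_json ?| ARRAY['body', 'content', 'payload_text', 'embedding'])
        AS no_body_keys,
    -- Longest top-level value; refs stay short, a copied body would not.
    (SELECT max(length(e.value)) FROM jsonb_each_text(payload_json) AS e)
        AS longest_value_chars
FROM job;
```

For this payload no_body_keys is true, and the longest value is the source_ref path rather than any document text.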

Live run (Điều 38 v3.0)

SELECT public.fn_cut_copy_to_staging(
  p_cut_request_id := '777b1297-18af-4f07-a362-0ad18b043f21'::uuid,
  p_source_path    := '/tmp/cut_zone/dieu38_1090.md',
  p_actor          := 'system_dot_copy'
);
→ {
  "cut_request_id":         "777b1297-...",
  "status":                 "copied",
  "staging_file_id":        "fbaecf00-5d88-4804-9ecd-44be811fdf88",
  "copy_staging_record_id": "fbaecf00-5d88-4804-9ecd-44be811fdf88",
  "source_path":            "/tmp/cut_zone/dieu38_1090.md",
  "source_hash":            "fdacc492e62c40f1364392943a310769",
  "source_bytes":           7736,
  "vector_excluded":        true,
  "signal": { "enqueued": true, "result": { "job_id": "25d00ebb-..." } }
}

source_hash matches the value computed against knowledge_documents.content for id=1090 — full integrity round-trip.
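The comparison described above can be reproduced without returning the body to the caller. This sketch assumes source_hash is an MD5 over the UTF-8 bytes of the content. The 32 hex characters are consistent with MD5, but the page does not name the algorithm for this field.

```sql
-- Sketch only: recompute hash and size from the KB row and compare with the
-- values returned by the live run. Assumes md5 over UTF-8 bytes.
SELECT
    md5(convert_to(content, 'UTF8'))
        = 'fdacc492e62c40f1364392943a310769'        AS hash_matches,
    octet_length(convert_to(content, 'UTF8')) = 7736 AS size_matches
FROM knowledge_documents
WHERE id = 1090;
```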

Why not pass p_source_text directly?

fn_iu_op_mark_file does accept p_source_text, but it uses the text only to compute the digest and does NOT persist the source bytes. On its own, that makes the alias unsuitable for the COPY step, for two reasons:

  1. The Agent caller would have to provide the source text as an argument, which puts source bytes through prompt tokens and violates the DOT/system-driven requirement.
  2. After the alias returns, the source text is no longer available, so MARK and later steps cannot re-read it from staging.

The new path stores the source in a source_copy payload row that the MARK step later reads server-side. Source bytes are written outside the database exactly once (KB→tempfile, via lo_export) and never leave the postgres container.

knowledge/dev/laws/dieu44-trien-khai/v0.6-iu-cut-operational-pipeline-copy-mark-verify-cut/03-copy-to-cut-zone.md