KB-7D8A

Minimal Technical Design v0.1 — WF Procedure Index

15 min read Revision 1
workflow-manageprocedure-indexdesignminimalinventory-viewreadiness-viewingredient-mapperformancerag-boundaryv0.12026-06-23

Minimal Technical Design v0.1 — WF Procedure Index

Path: knowledge/dev/laws-new/workflow-manage/design/minimal-technical-design-v0.1.md Status: DESIGN ONLY · v0.1 · Non-authorizing · NO DDL proposed for execution · 0 PG objects created Date: 2026-06-23 Companion: ref-grammar-v0.1.md (join key), reports/wf-procedure-index-readonly-survey-2026-06-23.md (evidence). Principle locked: PG reads PG. PG is the truth. Index is the eye. Declarations are hints/cache. Readiness is computed from PG/SQL facts.


1. Q6 — Physical design options & recommendation

A — 1 seed table B — 3 tables + 2 views (RECOMMENDED) C — B + materialized inventory
Shape wf_procedure only; ingredients & links live in manifest_jsonb; views unnest JSONB wf_procedure, wf_procedure_ingredient, wf_procedure_link + wf_inventory_current (view) + wf_procedure_readiness (view) as B but wf_inventory_current is a MATERIALIZED view/cache
Impl speed fast to write, slow to query moderate slowest
Complexity hidden in JSONB unnest explicit columns + refresh machinery
Drift risk high — ref join key buried in JSONB, can't index/constrain cleanly low — ingredient_ref is a real, indexable column low
Query perf poor — every readiness call unnests JSONB good — plain joins, EXISTS probes good, + staleness
Agent-readability poor (must understand JSONB layout) best (plain SELECT) good
"PG reads PG"? OK Yes — inventory is a pure view over catalogs weaker (cache duplicates catalog)
New monster? no no (3 tables = the stated ceiling) risk of a refresh engine

Recommendation: Option B, and even B can start reduced:

  • B-minus for the very first prototype: ship wf_procedure + wf_procedure_ingredient + the two views; defer wf_procedure_link to a links_jsonb column on wf_procedure until M2M links are actually needed. This honors "if fewer than 3 tables works, choose fewer."
  • Promote to full B (real wf_procedure_link) only when reuse/sub-procedure links are queried enough to need indexing.
  • Do NOT start at C. Inventory stays a pure view until measured slowness justifies a matview. This matches the substrate's own idiom: many small governed registry tables + a forest of v_* computed views (49 *readiness* views already exist).

Rationale: Option B keeps ingredient_ref (the grammar join key) as a first-class, indexable column; keeps Inventory as a non-duplicating "pane of glass" over dot_tools/catalogs/registries; and keeps Readiness as a plain joinable view — exactly the problem statement's procedure + ingredient + link + view skeleton.


2. Q2 — Inventory View contract (wf_inventory_current)

A VIEW (not a table) = a UNION ALL of per-kind sub-selects over EXISTING sources. It never copies the tens-of-millions-of-objects universe; each branch reads its own catalog/registry and emits a normalized row. Output columns:

object_kind        text     -- ref-grammar kind (dot, collection, view, field, trigger, function, approval, event, procedure)
object_ref         text     -- canonical normalized ref (PK-ish; grammar §1)
object_code        text     -- raw code/name as stored in source
object_name        text     -- human label if available
source_schema      text     -- e.g. public
source_object      text     -- source table/catalog (dot_tools, information_schema.tables, …)
source_pk          text     -- source row identity (code / schema.table / oid)
exists_bool        boolean  -- always true for a materialized source row (a row = it exists)
authority_level    text     -- authoritative_catalog | authoritative_registry | registry_cache | declared | unknown
metadata_jsonb     jsonb    -- kind-specific extras (risk_level, handler_ref, overloaded, logical/physical, facet…)
freshness_status   text     -- fresh | stale_but_acceptable | refresh_required | unknown_source
last_seen_at       timestamptz -- source timestamp where available (date_updated/last_executed/created_at); now() for live catalog reads
notes              text

v0.1 UNION branches (SAFE kinds):

  1. dotdot_tools (authoritative_registry; last_seen_at=date_updated).
  2. collectioninformation_schema.tablesdirectus_collections (authoritative_catalog/registry; fresh).
  3. viewinformation_schema.views + pg_class matviews (authoritative_catalog; fresh).
  4. fieldinformation_schema.columns (authoritative_catalog; fresh). (large; see perf note — exposed but not eagerly scanned)
  5. triggerinformation_schema.triggers (authoritative_catalog; fresh).
  6. functionpg_proc/information_schema.routines (authoritative_catalog; fresh).
  7. approvalapr_action_types (authoritative_registry; metadata carries risk_level,handler_ref).
  8. eventevent_type_registry (authoritative_registry).
  9. procedurewf_procedure (self) ∪ workflows.process_code (authoritative for those rows).

Deliberately NOT branches in v0.1 (carry UNKNOWN_SOURCE): io, checker, report, template. label is an OPTIONAL 10th branch (taxonomy) with the facet caveat.

Anti-monster guard: wf_inventory_current is for browsing / treeview, not the readiness hot path. The hot path uses correlated EXISTS probes keyed by a single procedure's refs (see §6). The view must never be SELECT *-scanned per agent question.

Object kinds safe for v0.1: dot, collection, view, field, trigger, function, approval, event, procedure (+label PARTIAL). Kinds that remain UNKNOWN_SOURCE: io, checker, report, template.


3. Q4 — Ingredient Map: thin & non-authoritative

wf_procedure_ingredient — one row = one ingredient a procedure declares it needs. It is a lazy human/Agent cache, never truth.

procedure_code                text  NOT NULL   -- FK→wf_procedure (declared)
step_code                     text             -- optional step grouping (declared)
ingredient_kind               text  NOT NULL   -- ref-grammar kind (declared)
ingredient_ref                text  NOT NULL   -- normalized ref, grammar §1 (declared)
usage_role                    text             -- reads | writes | calls | requires | produces (declared)
required_level                text             -- required | optional | nice_to_have | UNKNOWN | NEEDS_TRIAGE (declared)
source_probe_kind             text             -- which inventory branch should verify (declared hint)
missing_route_procedure_code  text             -- where to go if missing (declared)
ref_status                    text             -- NORMALIZED | UNNORMALIZED (declared/validated)
note                          text
  • Declared fields (above) = what the procedure says. Computed fields are NOT stored hereexists_bool, freshness_status, computed_status are produced only by the Readiness View at query time. This separation is the whole point.
  • Stale/unknown handling: an un-parseable ref → ref_status='UNNORMALIZED' → Readiness yields INVALID_REF. A kind with no v0.1 source → Readiness yields UNKNOWN_SOURCE. required_level may be UNKNOWN/NEEDS_TRIAGE — both legal, neither blocks recording the procedure.
  • Avoid mapping all 20,000: ingredients are declared only on demand — for a procedure that is being called, that needs real readiness, or that is one of the 30–50 painful seeds. No bulk backfill. Procedures with zero ingredient rows are valid; they compute as UNMAPPED, not READY.
  • Seed only the painful first 30–50 (see seed-procedure-candidates-v0.1.md).

4. Q5 — Readiness computation (wf_procedure_readiness)

A VIEW joining wf_procedure_ingredient (declared) to live PG existence (computed via the §6 resolver), rolled up per procedure. Two grains:

Per-ingredient computed_status (precedence top-down):

  1. ref_status='UNNORMALIZED'INVALID_REF
  2. kind has no v0.1 source → UNKNOWN_SOURCE
  3. source schema read-denied → READ_BLOCKED
  4. ref resolves & found → EXISTS (downgrade to STALE if the inventory branch reports freshness_status='refresh_required')
  5. ref resolves & not found → MISSING

Per-procedure computed_readiness (rollup):

  • zero ingredient rows → UNMAPPED
  • any required ingredient INVALID_REF or READ_BLOCKEDNOT_READY (+ data-quality flag)
  • any required ingredient MISSING or UNKNOWN_SOURCENOT_READY
  • all required ingredients EXISTS (STALE allowed but flagged) → READY
  • READINESS_DRIFT is raised (orthogonal flag) when declared_maturity claims ready (checklist_ready/dot_sequence_ready/one_button_ready) but computed_readinessREADY.

Full status set emitted (matches macro Q5): EXISTS, MISSING, UNKNOWN_SOURCE, UNMAPPED, INVALID_REF, READ_BLOCKED, STALE, READY, NOT_READY, READINESS_DRIFT.

Truth rule: only PG/SQL-derived existence sets READY. manifest_jsonb, declared_maturity, notes, and any future RAG output can route and suggest but can never set READY. A procedure whose note says "done" but whose required dot:/collection: ingredients are MISSING computes NOT_READY + READINESS_DRIFT.

The 49 existing v_*readiness* views are precedent that this rollup pattern is normal and performant on this DB.


5. wf_procedure seed table (the one declared table)

procedure_code     text PRIMARY KEY      -- PROC_* (self) ; legacy WF-* indexed via inventory, not stored here
title              text
short_note         text
assembly_layer     text                  -- from PG (meta_catalog.composition_level) if known, else 'UNMAPPED'
domain_group       text                  -- controlled tag; 'UNMAPPED' if unknown
intent             text
declared_maturity  text                  -- known|text_ready|machine_readable|checklist_ready|dot_sequence_ready|one_button_ready (HINT)
automation_mode    text                  -- manual|checklist|dot_sequence|one_button|agent_assisted
safety_class       text                  -- e.g. owner_gated | ai_delegatable | read_only
status             text                  -- draft|active|retired
manifest_jsonb     jsonb                 -- light structured note (problem-statement §11 shape)
source_ref         text                  -- provenance (KB path / WF-* / DOT chain)
updated_at         timestamptz

assembly_layer/domain_group default to UNMAPPED — the survey proved procedures carry no layer metadata today. They are never a blocker.


6. Q7 — Performance & refresh model

Default rule (locked): Use a simple view first. Materialize only if measured slow. Event-driven only if the source already emits a change signal. No new scheduler/bus in v0.1.

  1. Hot path is narrow by procedure_code. Agent asks readiness for ONE procedure → Readiness View filters to that procedure's ingredient rows (tens at most) → resolves each ref via a correlated EXISTS subquery on the right catalog/registry. Cost = O(#ingredients of that procedure), not O(catalog). Never build the full wf_inventory_current to answer one question.
  2. Catalog-derived kinds are inherently real-time. dot/collection/table/view/field/trigger/function resolve straight from information_schema/pg_class/dot_tools point lookups → freshness_status='fresh', no cache needed, no force_refresh semantics required (a pure view is always current).
  3. Registry-cache kinds may lag. trigger_registry, collection_registry can trail DDL; when a probe uses them, mark authority_level='registry_cache' and derive freshness_status from their date_updated. Prefer the catalog over the cache when both exist.
  4. Materialize later, narrowly. If the browsing/treeview wf_inventory_current UNION is slow, convert only the heavy branches (e.g. field over information_schema.columns) to a materialized cache with last_seen_at + refresh_mode; keep cheap branches as live views.
  5. Reuse existing change signals, don't build one. If event-driven refresh is ever wanted, ride registry_changelog (~87k), directus_revisions, _recon_dot_fs_inventory, or event_outbox to mark a branch dirty. v0.1 does not need this.
  6. force_refresh is only meaningful once a matview exists; for v0.1 pure views it is a no-op. Before a sensitive action, the Agent re-runs the (always-fresh) view.
  7. Freshness vocabulary: fresh | stale_but_acceptable | refresh_required | unknown_source. Stale-but-acceptable returns with last_seen_at; sensitive ops demand fresh/force_refresh; unknown source never triggers a full catalog sweep.

Must never happen: scanning all 346 tables / 685 views / 618 functions / entity_labels (852k) for every Agent question. The grammar + per-ref EXISTS probes make every question point-lookup-shaped.


7. Q8 — RAG / vector / fuzzy decision

Live fact: installed extensions = btree_gist, pgcrypto, plpgsql, postgres_fdw. No pg_trgm. No vector.

Recommendation: DEFER all vector/RAG in v0.1.

  • Exact questions (exists? missing? ready?) → SQL / Readiness View only. Vector/RAG may never answer exists/missing/ready.
  • Discovery/fuzzy over 30–50 procedures → native ILIKE on wf_procedure.title/short_note/intent/domain_group. No extension, no pipeline. Sufficient at this scale.
  • pg_trgm (trigram ranking) would need CREATE EXTENSION pg_trgm — an owner-gated, out-of-read-only-scope step. Propose it as a separate small step only if procedure count reaches the hundreds and ILIKE ranking is too blunt. It stays PG-native (keeps "PG reads PG").
  • Vector / Qdrant is external (iu_qdrant_collection_registry exists in PG as a pointer). Reserve strictly for large scale and ONLY as two-step: vector suggests candidate procedure_code → SQL/Readiness confirms exists/missing/route. Never authoritative. DEFER.

8. What v0.1 deliberately does NOT do

Per problem-statement non-goals: no workflow engine, no governance/KG/birth integration as a prerequisite, no all-objects registry, no auto-fix, no new scheduler/bus, no manifest-as-truth, no requirement that all procedures/ingredients be mapped first, no DDL executed by this macro. The Procedure Index is the eye, not the hand.