Minimal Technical Design v0.1 — WF Procedure Index
Minimal Technical Design v0.1 — WF Procedure Index
Path:
knowledge/dev/laws-new/workflow-manage/design/minimal-technical-design-v0.1.mdStatus: DESIGN ONLY · v0.1 · Non-authorizing · NO DDL proposed for execution · 0 PG objects created Date: 2026-06-23 Companion:ref-grammar-v0.1.md(join key),reports/wf-procedure-index-readonly-survey-2026-06-23.md(evidence). Principle locked: PG reads PG. PG is the truth. Index is the eye. Declarations are hints/cache. Readiness is computed from PG/SQL facts.
1. Q6 — Physical design options & recommendation
| A — 1 seed table | B — 3 tables + 2 views (RECOMMENDED) | C — B + materialized inventory | |
|---|---|---|---|
| Shape | wf_procedure only; ingredients & links live in manifest_jsonb; views unnest JSONB |
wf_procedure, wf_procedure_ingredient, wf_procedure_link + wf_inventory_current (view) + wf_procedure_readiness (view) |
as B but wf_inventory_current is a MATERIALIZED view/cache |
| Impl speed | fast to write, slow to query | moderate | slowest |
| Complexity | hidden in JSONB unnest | explicit columns | + refresh machinery |
| Drift risk | high — ref join key buried in JSONB, can't index/constrain cleanly | low — ingredient_ref is a real, indexable column |
low |
| Query perf | poor — every readiness call unnests JSONB | good — plain joins, EXISTS probes | good, + staleness |
| Agent-readability | poor (must understand JSONB layout) | best (plain SELECT) | good |
| "PG reads PG"? | OK | Yes — inventory is a pure view over catalogs | weaker (cache duplicates catalog) |
| New monster? | no | no (3 tables = the stated ceiling) | risk of a refresh engine |
Recommendation: Option B, and even B can start reduced:
- B-minus for the very first prototype: ship
wf_procedure+wf_procedure_ingredient+ the two views; deferwf_procedure_linkto alinks_jsonbcolumn onwf_procedureuntil M2M links are actually needed. This honors "if fewer than 3 tables works, choose fewer." - Promote to full B (real
wf_procedure_link) only when reuse/sub-procedure links are queried enough to need indexing. - Do NOT start at C. Inventory stays a pure view until measured slowness justifies a matview. This matches the substrate's own idiom: many small governed registry tables + a forest of
v_*computed views (49*readiness*views already exist).
Rationale: Option B keeps ingredient_ref (the grammar join key) as a first-class, indexable column; keeps Inventory as a non-duplicating "pane of glass" over dot_tools/catalogs/registries; and keeps Readiness as a plain joinable view — exactly the problem statement's procedure + ingredient + link + view skeleton.
2. Q2 — Inventory View contract (wf_inventory_current)
A VIEW (not a table) = a UNION ALL of per-kind sub-selects over EXISTING sources. It never copies the tens-of-millions-of-objects universe; each branch reads its own catalog/registry and emits a normalized row. Output columns:
object_kind text -- ref-grammar kind (dot, collection, view, field, trigger, function, approval, event, procedure)
object_ref text -- canonical normalized ref (PK-ish; grammar §1)
object_code text -- raw code/name as stored in source
object_name text -- human label if available
source_schema text -- e.g. public
source_object text -- source table/catalog (dot_tools, information_schema.tables, …)
source_pk text -- source row identity (code / schema.table / oid)
exists_bool boolean -- always true for a materialized source row (a row = it exists)
authority_level text -- authoritative_catalog | authoritative_registry | registry_cache | declared | unknown
metadata_jsonb jsonb -- kind-specific extras (risk_level, handler_ref, overloaded, logical/physical, facet…)
freshness_status text -- fresh | stale_but_acceptable | refresh_required | unknown_source
last_seen_at timestamptz -- source timestamp where available (date_updated/last_executed/created_at); now() for live catalog reads
notes text
v0.1 UNION branches (SAFE kinds):
dot←dot_tools(authoritative_registry; last_seen_at=date_updated).collection←information_schema.tables∪directus_collections(authoritative_catalog/registry; fresh).view←information_schema.views+pg_classmatviews (authoritative_catalog; fresh).field←information_schema.columns(authoritative_catalog; fresh). (large; see perf note — exposed but not eagerly scanned)trigger←information_schema.triggers(authoritative_catalog; fresh).function←pg_proc/information_schema.routines(authoritative_catalog; fresh).approval←apr_action_types(authoritative_registry; metadata carriesrisk_level,handler_ref).event←event_type_registry(authoritative_registry).procedure←wf_procedure(self) ∪workflows.process_code(authoritative for those rows).
Deliberately NOT branches in v0.1 (carry UNKNOWN_SOURCE): io, checker, report, template. label is an OPTIONAL 10th branch (taxonomy) with the facet caveat.
Anti-monster guard:
wf_inventory_currentis for browsing / treeview, not the readiness hot path. The hot path uses correlatedEXISTSprobes keyed by a single procedure's refs (see §6). The view must never beSELECT *-scanned per agent question.
Object kinds safe for v0.1: dot, collection, view, field, trigger, function, approval, event, procedure (+label PARTIAL). Kinds that remain UNKNOWN_SOURCE: io, checker, report, template.
3. Q4 — Ingredient Map: thin & non-authoritative
wf_procedure_ingredient — one row = one ingredient a procedure declares it needs. It is a lazy human/Agent cache, never truth.
procedure_code text NOT NULL -- FK→wf_procedure (declared)
step_code text -- optional step grouping (declared)
ingredient_kind text NOT NULL -- ref-grammar kind (declared)
ingredient_ref text NOT NULL -- normalized ref, grammar §1 (declared)
usage_role text -- reads | writes | calls | requires | produces (declared)
required_level text -- required | optional | nice_to_have | UNKNOWN | NEEDS_TRIAGE (declared)
source_probe_kind text -- which inventory branch should verify (declared hint)
missing_route_procedure_code text -- where to go if missing (declared)
ref_status text -- NORMALIZED | UNNORMALIZED (declared/validated)
note text
- Declared fields (above) = what the procedure says. Computed fields are NOT stored here —
exists_bool,freshness_status,computed_statusare produced only by the Readiness View at query time. This separation is the whole point. - Stale/unknown handling: an un-parseable ref →
ref_status='UNNORMALIZED'→ Readiness yieldsINVALID_REF. A kind with no v0.1 source → Readiness yieldsUNKNOWN_SOURCE.required_levelmay beUNKNOWN/NEEDS_TRIAGE— both legal, neither blocks recording the procedure. - Avoid mapping all 20,000: ingredients are declared only on demand — for a procedure that is being called, that needs real readiness, or that is one of the 30–50 painful seeds. No bulk backfill. Procedures with zero ingredient rows are valid; they compute as
UNMAPPED, notREADY. - Seed only the painful first 30–50 (see
seed-procedure-candidates-v0.1.md).
4. Q5 — Readiness computation (wf_procedure_readiness)
A VIEW joining wf_procedure_ingredient (declared) to live PG existence (computed via the §6 resolver), rolled up per procedure. Two grains:
Per-ingredient computed_status (precedence top-down):
ref_status='UNNORMALIZED'→INVALID_REF- kind has no v0.1 source →
UNKNOWN_SOURCE - source schema read-denied →
READ_BLOCKED - ref resolves & found →
EXISTS(downgrade toSTALEif the inventory branch reportsfreshness_status='refresh_required') - ref resolves & not found →
MISSING
Per-procedure computed_readiness (rollup):
- zero ingredient rows →
UNMAPPED - any required ingredient
INVALID_REForREAD_BLOCKED→NOT_READY(+ data-quality flag) - any required ingredient
MISSINGorUNKNOWN_SOURCE→NOT_READY - all required ingredients
EXISTS(STALE allowed but flagged) →READY READINESS_DRIFTis raised (orthogonal flag) whendeclared_maturityclaims ready (checklist_ready/dot_sequence_ready/one_button_ready) butcomputed_readiness≠READY.
Full status set emitted (matches macro Q5): EXISTS, MISSING, UNKNOWN_SOURCE, UNMAPPED, INVALID_REF, READ_BLOCKED, STALE, READY, NOT_READY, READINESS_DRIFT.
Truth rule: only PG/SQL-derived existence sets READY. manifest_jsonb, declared_maturity, notes, and any future RAG output can route and suggest but can never set READY. A procedure whose note says "done" but whose required dot:/collection: ingredients are MISSING computes NOT_READY + READINESS_DRIFT.
The 49 existing v_*readiness* views are precedent that this rollup pattern is normal and performant on this DB.
5. wf_procedure seed table (the one declared table)
procedure_code text PRIMARY KEY -- PROC_* (self) ; legacy WF-* indexed via inventory, not stored here
title text
short_note text
assembly_layer text -- from PG (meta_catalog.composition_level) if known, else 'UNMAPPED'
domain_group text -- controlled tag; 'UNMAPPED' if unknown
intent text
declared_maturity text -- known|text_ready|machine_readable|checklist_ready|dot_sequence_ready|one_button_ready (HINT)
automation_mode text -- manual|checklist|dot_sequence|one_button|agent_assisted
safety_class text -- e.g. owner_gated | ai_delegatable | read_only
status text -- draft|active|retired
manifest_jsonb jsonb -- light structured note (problem-statement §11 shape)
source_ref text -- provenance (KB path / WF-* / DOT chain)
updated_at timestamptz
assembly_layer/domain_group default to UNMAPPED — the survey proved procedures carry no layer metadata today. They are never a blocker.
6. Q7 — Performance & refresh model
Default rule (locked): Use a simple view first. Materialize only if measured slow. Event-driven only if the source already emits a change signal. No new scheduler/bus in v0.1.
- Hot path is narrow by
procedure_code. Agent asks readiness for ONE procedure → Readiness View filters to that procedure's ingredient rows (tens at most) → resolves each ref via a correlatedEXISTSsubquery on the right catalog/registry. Cost = O(#ingredients of that procedure), not O(catalog). Never build the fullwf_inventory_currentto answer one question. - Catalog-derived kinds are inherently real-time.
dot/collection/table/view/field/trigger/functionresolve straight frominformation_schema/pg_class/dot_toolspoint lookups →freshness_status='fresh', no cache needed, noforce_refreshsemantics required (a pure view is always current). - Registry-cache kinds may lag.
trigger_registry,collection_registrycan trail DDL; when a probe uses them, markauthority_level='registry_cache'and derivefreshness_statusfrom theirdate_updated. Prefer the catalog over the cache when both exist. - Materialize later, narrowly. If the browsing/treeview
wf_inventory_currentUNION is slow, convert only the heavy branches (e.g.fieldoverinformation_schema.columns) to a materialized cache withlast_seen_at+refresh_mode; keep cheap branches as live views. - Reuse existing change signals, don't build one. If event-driven refresh is ever wanted, ride
registry_changelog(~87k),directus_revisions,_recon_dot_fs_inventory, orevent_outboxto mark a branch dirty. v0.1 does not need this. force_refreshis only meaningful once a matview exists; for v0.1 pure views it is a no-op. Before a sensitive action, the Agent re-runs the (always-fresh) view.- Freshness vocabulary:
fresh | stale_but_acceptable | refresh_required | unknown_source. Stale-but-acceptable returns withlast_seen_at; sensitive ops demandfresh/force_refresh; unknown source never triggers a full catalog sweep.
Must never happen: scanning all 346 tables / 685 views / 618 functions / entity_labels (852k) for every Agent question. The grammar + per-ref EXISTS probes make every question point-lookup-shaped.
7. Q8 — RAG / vector / fuzzy decision
Live fact: installed extensions = btree_gist, pgcrypto, plpgsql, postgres_fdw. No pg_trgm. No vector.
Recommendation: DEFER all vector/RAG in v0.1.
- Exact questions (exists? missing? ready?) → SQL / Readiness View only. Vector/RAG may never answer exists/missing/ready.
- Discovery/fuzzy over 30–50 procedures → native
ILIKEonwf_procedure.title/short_note/intent/domain_group. No extension, no pipeline. Sufficient at this scale. pg_trgm(trigram ranking) would needCREATE EXTENSION pg_trgm— an owner-gated, out-of-read-only-scope step. Propose it as a separate small step only if procedure count reaches the hundreds and ILIKE ranking is too blunt. It stays PG-native (keeps "PG reads PG").- Vector / Qdrant is external (
iu_qdrant_collection_registryexists in PG as a pointer). Reserve strictly for large scale and ONLY as two-step: vector suggests candidateprocedure_code→ SQL/Readiness confirms exists/missing/route. Never authoritative. DEFER.
8. What v0.1 deliberately does NOT do
Per problem-statement non-goals: no workflow engine, no governance/KG/birth integration as a prerequisite, no all-objects registry, no auto-fix, no new scheduler/bus, no manifest-as-truth, no requirement that all procedures/ingredients be mapped first, no DDL executed by this macro. The Procedure Index is the eye, not the hand.