06 — Process Discovery Engine v1 Design (Workstream E)

Generic engine for discovering, scoring, and governing existing automated workflows from live evidence. dot-kg is the pilot, not a hardcoded special case: the design and the live views already cover 17 candidates across 14 DOT families + job pipelines + workflows.

6.0 Principle

Owner-accepted lifecycle: Scan → Evidence graph → Orphan check → Score/verify → Label → Birth admission → Governance handoff → RP publish → Continuous drift. Engine proposes, never canonizes (golden rule, Đ39). Live-evidence-wins. No candidate is "verified" without runtime+correlation.

6.1 Stage 1 — Discovery scan

Sweep the sources that hold process evidence, generically:

Components: dot_tools (Type-1), job_queue.job_kind (Type-2 steps), workflows (Type-3).
Triggers: dot_tools.trigger_type + cron_schedule; PG triggers (pg_trigger); job_queue lease.
Runtime: dot_iu_command_run, job_queue rows (picked/finished/attempts), dot_tools.last_executed.
Events: event_type_registry.
Content: information_unit (SOP/spec IUs), knowledge_documents.
Governance: governance_registry, governance_object_ownership, approval_requests.
Health/logs: kg_quality_log, last_error, docker logs (out-of-band).

Clustering keys (generic, not per-process): dots by split_part(code,'_',2); jobs by split_part(job_kind,'.',1); workflows by process_code.
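
The clustering keys can be mirrored outside the database to sanity-check groupings. This is a minimal sketch, assuming PostgreSQL split_part semantics for positive indexes (1-based, empty string when the field is absent). The `candidate_code` helper, the lowercasing of the DOT sub-domain, and sample values such as `DOT_KG_EXTRACT` and `cut.split` are illustrative assumptions, not taken from the live views.

```python
def split_part(text: str, delimiter: str, n: int) -> str:
    """Mimic PostgreSQL split_part for positive n: 1-based, '' if the field is absent."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def candidate_code(source_table: str, key: str) -> str:
    """Derive a candidate code from generic columns only (no per-process branch).

    `key` is dot_tools.code, job_queue.job_kind, or workflows.process_code.
    The 'dot:' / 'job:' / 'wf:' prefixes follow the candidate names quoted in
    this document; lowercasing the DOT sub-domain is an assumption.
    """
    if source_table == "dot_tools":
        return "dot:" + split_part(key, "_", 2).lower()
    if source_table == "job_queue":
        return "job:" + split_part(key, ".", 1)
    if source_table == "workflows":
        return "wf:" + key
    raise ValueError(f"unknown source table: {source_table}")
```

Because the function branches only on the source table and the naming split, a new DOT family or job pipeline is clustered without any code change.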

6.2 Stage 2 — Evidence graph

One node per component, with these fields: component_ref, name, kind, candidate_code, sub_domain, trigger_type, paired_with, declared_engine, role_hint (producer/verifier/start/step/end/root/child), has_runtime_evidence, in_process_inventory, source_table. Live: v_process_discovery_evidence_graph (113 rows).

Edges today: pairing (paired_with) and parent (parent_workflow_id). Future edges: correlation (shared run id) and sequence (job_kind order / cron order).

6.3 Stage 3 — Orphan / uncovered scan

Three orphan classes, generic:

ORPHAN_PRODUCER_PAIR_COVERED: component absent from inventory while its pair is present (the dot-kg producer case, ×18).
ORPHAN_UNCOUNTED_DOT / ORPHAN_UNCOUNTED_COMPONENT: component not in inventory at all.

Live: v_process_discovery_orphan_components (84 rows). The stage also reports candidates with missing components (a producer with no verifier, or vice versa) and component orphans (a job step never reached). This stage exists because the existing v_axis_process_inventory silently drops on-demand DOTs; the engine's first job is to expose that blind spot.
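
The orphan classes above can be expressed as a small decision function. This is a sketch under stated assumptions: the inventory is modelled as a plain set of component refs, and the split between ORPHAN_UNCOUNTED_DOT and ORPHAN_UNCOUNTED_COMPONENT is assumed to follow the source table (dot_tools versus everything else). The component refs in the usage lines are hypothetical.

```python
from typing import Optional, Set


def orphan_class(
    component_ref: str,
    paired_with: Optional[str],
    source_table: str,
    inventory: Set[str],
) -> Optional[str]:
    """Return the orphan class for one component, or None if it is counted."""
    if component_ref in inventory:
        return None  # already in the process inventory, not an orphan
    if paired_with is not None and paired_with in inventory:
        # The pair is counted but this half is not (the producer case).
        return "ORPHAN_PRODUCER_PAIR_COVERED"
    # Assumption: the DOT variant applies to rows sourced from dot_tools.
    if source_table == "dot_tools":
        return "ORPHAN_UNCOUNTED_DOT"
    return "ORPHAN_UNCOUNTED_COMPONENT"


# Hypothetical refs: only the verifier is present in the inventory.
inventory = {"DOT_KG_VERIFY"}
producer_class = orphan_class("DOT_KG_EXTRACT", "DOT_KG_VERIFY", "dot_tools", inventory)
```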

6.4 Stage 4 — Quality scoring / verification

Per candidate, five independent evidence axes (weights):

axis	signal	weight
start	has producer/start/root component	20
end	has verifier/end component	20
cross-component correlation	≥2 components share a run id (e.g. `job_queue.run_id`)	25
runtime evidence	any component has executed	25
pairing/structure	any paired component	10

`confidence_score` = sum of the weights of the satisfied axes (range 0–100). Live: `v_process_discovery_quality_score`. Correlation and runtime are weighted highest because they are exactly what distinguishes a real running process from a merely declared one, which is the gap the whole DOT layer currently sits in (0 executions). False-grouping risk is mitigated by clustering on stable naming keys and by reporting `members_orphaned`, so over- or under-grouping is visible.
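
As a worked example of the weighting, a candidate with a start, an end, and a paired component but no runtime or correlation evidence scores 20 + 20 + 10 = 50. The sketch below reproduces that arithmetic; the weights come from the table above, while the axis keys (`start`, `end`, `correlation`, `runtime`, `pairing`) are shorthand names chosen for this example and are not column names from the live view.

```python
from typing import Mapping

# Weights copied from the scoring table; they sum to 100.
WEIGHTS = {
    "start": 20,
    "end": 20,
    "correlation": 25,
    "runtime": 25,
    "pairing": 10,
}


def confidence_score(signals: Mapping[str, bool]) -> int:
    """Sum the weights of every axis whose signal is present."""
    return sum(weight for axis, weight in WEIGHTS.items() if signals.get(axis, False))


structural_only = confidence_score({"start": True, "end": True, "pairing": True})
fully_evidenced = confidence_score({axis: True for axis in WEIGHTS})
```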

6.5 Stage 5 — Label / classification

Type: TYPE_1_DOT_CONTAINED · TYPE_2_AUTOMATED_MULTI · TYPE_3_HUMAN_IN_LOOP (from source_table + structure).
Readiness band: not_a_process (≤1 member) · weak_candidate · strong_candidate_structural (start+end, no runtime) · verified_candidate (start+end+runtime+correlation). Live in quality_score. No candidate may skip to verified without runtime+correlation — enforced by the band logic, not by reviewer opinion.
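
The band logic can be sketched as a pure function of the evidence flags. One case is not spelled out above: a candidate with start, end, and runtime but no correlation. This sketch assumes it stays in strong_candidate_structural, since nothing may reach verified without both runtime and correlation; the live view may treat it differently.

```python
def readiness_band(
    member_count: int,
    has_start: bool,
    has_end: bool,
    has_runtime: bool,
    has_correlation: bool,
) -> str:
    """Map evidence flags to a readiness band (assumption noted inline)."""
    if member_count <= 1:
        return "not_a_process"
    if has_start and has_end:
        if has_runtime and has_correlation:
            return "verified_candidate"
        # Assumption: runtime without correlation is still only structural.
        return "strong_candidate_structural"
    return "weak_candidate"
```

Keeping the rule in one function mirrors the point that promotion to verified is decided by evidence, not by reviewer opinion.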

6.6 Stage 6 — Birth admission (proposal only)

Only verified_candidate → BIRTH_READY_PENDING_OWNER; everything else → a specific BLOCKED_* status with the missing evidence named. Live: v_process_discovery_birth_ready_queue. Birth itself (inserting the process definition, components, and relations, reusing the MOW substrate per the AX-PROCESS pilot) is owner-gated and out of engine scope: the engine produces the queue, a human admits. No blind birth: a birth cannot be retracted once made, so the gate stays strict even though the engine's output is only a proposal.
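
The gate can be illustrated as a function that either admits a candidate to the queue or names what is missing. The concrete BLOCKED_* names are not listed in this document, so the `BLOCKED_MISSING_<AXIS>` scheme below is a placeholder invented for the example; the real status values are whatever v_process_discovery_birth_ready_queue emits. The sketch also assumes the candidate already has at least two members.

```python
from typing import Dict, List, Tuple

# Evidence required before a candidate may be proposed for birth.
EVIDENCE_ORDER = ("start", "end", "runtime", "correlation")


def birth_gate(evidence: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return (status, missing_evidence); the engine only proposes, never births."""
    missing = [axis for axis in EVIDENCE_ORDER if not evidence.get(axis, False)]
    if not missing:
        return ("BIRTH_READY_PENDING_OWNER", [])
    # Hypothetical naming scheme: label the status with the first missing axis.
    return ("BLOCKED_MISSING_" + missing[0].upper(), missing)
```

Returning the full list of missing evidence alongside the status keeps the blocker actionable for the owner who reviews the queue.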

6.7 Stage 7 — Governance handoff

For each birth-ready candidate, emit the governance packet fields: owner (governance_registry / governance_object_ownership), lifecycle policy (Đ34 draft→active→deprecated→retired), health policy (which verifier/HEALTH DOT, what SLA), escalation, change policy (workflow_change_requests / approval_requests with action='review' — never 'add', which auto-approves + births). Today every candidate is OWNER_MISSING (ownership table empty system-wide).

6.8 Stage 8 — Registries-Pivot publish

Separate candidate vs official surfaces. Candidates render from the discovery views under AX-PROCESS as report-only (PIV-3xx style), with warning flags (members_orphaned, birth_gate_status, no_runtime_evidence) visible. Official processes only after birth. Nuxt renders the contract; PG computes all counts/scores (no Nuxt math). See doc 09.

6.9 Stage 9 — Continuous drift update

Live: v_process_discovery_drift_signals computes per-candidate component_hash, trigger_hash, route_hash. Drift detection = compare current hashes to a stored baseline:

new/removed component → component_hash changes;
trigger reconfigured → trigger_hash changes;
role/sequence reorder → route_hash changes.

On change, raise a system_issue / change_request (proposal). v1 computes the hashes; v2 adds a process_discovery_baseline table (the one new table the engine needs) to persist last-seen hashes and diff automatically.
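
The baseline comparison can be sketched as follows. How the live view actually builds its hashes is not described here, so the construction below is an assumption: SHA-256 over joined strings, with component_hash and trigger_hash made order-insensitive by sorting and route_hash kept order-sensitive. The component tuples in the usage lines are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Component = Tuple[str, str, str]  # (component_ref, trigger_type, role_hint)
HASH_NAMES = ("component_hash", "trigger_hash", "route_hash")


def stable_hash(items: Sequence[str]) -> str:
    """Hash a sequence of strings deterministically."""
    return hashlib.sha256("|".join(items).encode("utf-8")).hexdigest()


def snapshot(components: Sequence[Component]) -> Dict[str, str]:
    """Compute the three per-candidate hashes (construction is an assumption)."""
    return {
        # Membership only: sorted, so ordering does not matter.
        "component_hash": stable_hash(sorted(ref for ref, _, _ in components)),
        # Trigger configuration per component, also order-insensitive.
        "trigger_hash": stable_hash(sorted(f"{ref}={trig}" for ref, trig, _ in components)),
        # Role and sequence: order is preserved on purpose.
        "route_hash": stable_hash([f"{ref}:{role}" for ref, _, role in components]),
    }


def drift_signals(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """List the hashes that differ from the stored baseline."""
    return [name for name in HASH_NAMES if baseline[name] != current[name]]


base = snapshot([("a", "cron", "producer"), ("b", "on_demand", "verifier")])
reordered = snapshot([("b", "on_demand", "verifier"), ("a", "cron", "producer")])
retriggered = snapshot([("a", "manual", "producer"), ("b", "on_demand", "verifier")])
```

With this construction a pure reorder trips only route_hash and a trigger change trips only trigger_hash, which matches the three drift cases listed above.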

6.10 Required tables / views / functions

Live now (read-only): the 6 v_process_discovery_* views (doc 07).
Needed for full lifecycle (future, owner-gated):
1. process_discovery_baseline (persist hashes for drift) — only genuinely new table.
2. Reuse AX-PROCESS substrate for births (no island table): axis_registry, fn_process_node_substrate, v_axis_process_pivots.
3. A correlation column — see §6.11.
4. event_type_registry rows for KG (declare the producer events).

6.11 Correlation_id / process_run_id (the critical addition)

Today: dot_iu_command_run.run_id is per-command; job_queue.run_id correctly groups a pipeline run (why job:cut scores verified).
Gap: DOT-contained processes have no cross-DOT correlation, so you cannot stitch EXTRACT→VALIDATE into one process run.
Proposal (design only, see next macro): add a nullable process_run_id (uuid) to dot_iu_command_run (and a correlation_id convention across emitted events), populated when a process orchestrates multiple DOTs. This is the single highest-leverage change to move TYPE_1 families from strong_candidate_structural to verified_candidate.
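
To show what the proposed column would enable, the sketch below applies the correlation rule from the scoring stage (at least two components sharing a run id) to rows carrying a process_run_id. The row shape `(component_ref, process_run_id)`, the component refs, and the run ids are hypothetical; process_run_id itself is only a proposal at this point.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

RunRow = Tuple[str, Optional[str]]  # (component_ref, process_run_id)


def correlated_run_ids(rows: Iterable[RunRow]) -> List[str]:
    """Return run ids shared by at least two distinct components."""
    components_by_run: Dict[str, Set[str]] = defaultdict(set)
    for component_ref, process_run_id in rows:
        if process_run_id is not None:  # nullable column: uncorrelated runs are skipped
            components_by_run[process_run_id].add(component_ref)
    return sorted(run for run, comps in components_by_run.items() if len(comps) >= 2)


rows = [
    ("DOT_KG_EXTRACT", "run-1"),
    ("DOT_KG_VALIDATE", "run-1"),  # same run id: EXTRACT and VALIDATE are stitched
    ("DOT_KG_EXTRACT", "run-2"),   # only one component in this run
    ("DOT_KG_EXTRACT", None),      # legacy row without a process run
]
stitched = correlated_run_ids(rows)
```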

6.12 Hermes / Agent double-check fit

The KG family already embeds double-check structurally: every producer (Cấp B) is paired with a verifier (Cấp A); "Cấp A IDLE = producer correct." Map this onto the engine: the verifier DOT is the in-process Hermes check; the discovery engine is the meta-Agent double-check that audits whether both halves are present, correlated, and running. Adversarial verification at the engine level = the orphan scan (does every covered verifier have its producer counted?) + the correlation check (do paired runs actually link?).

6.13 What can be automated vs must stay review

Automate: stages 1–5 + 8–9 (scan, evidence graph, orphan scan, scoring, labelling, publish, drift) — all read-only, all live or apply-ready.
Review/owner-gated: stage 6 birth admission, stage 7 governance handoff (ownership, lifecycle, change policy), and any write to event_type_registry / correlation columns. The engine never crosses from proposing to enacting.

6.14 No hardcoding

Every rule keys off generic columns (trigger_type, paired_dot, run_id, job_kind, process_code) and naming-convention splits, not literal kg/dot-kg strings. Proven live: the same views score dot:nrm, dot:doc, dot:kb, job:cut, wf:WF-001, etc., with no KG-specific branch.