KB-770E

investigation

3 min read Revision 1

Playbook: Investigation

Use this playbook when investigating bugs, errors, or technical questions.

Before Starting

Load Context Packs

  • governance — Verify-don't-guess principle
  • Relevant domain pack (agent-data, infrastructure, web-frontend, directus)
  • current-sprint — Current context

Define the Question

  • What exactly is broken or unclear?
  • When did it start? (Check recent commits/deploys)
  • What is the expected behavior vs actual behavior?
  • Who reported it and what evidence exists?

Investigation Process

Step 1: Gather Evidence (DO NOT GUESS)

  • Read actual error messages/logs
  • Read the actual source code involved
  • Check git log for recent changes: git log --oneline -20
  • Check environment: local vs cloud, which endpoint
# Common diagnostic commands
curl -s http://localhost:8000/health | python3 -m json.tool
curl -s http://localhost:8000/info | python3 -m json.tool
gcloud run services logs read agent-data-test --region=asia-southeast1 --limit=20

Step 2: Reproduce

  • Can you reproduce the issue locally?
  • What are the exact steps to trigger it?
  • Document the reproduction steps

Step 3: Narrow Down

  • Is it a data issue? (Check Firestore/Qdrant)
  • Is it a code issue? (Read the function, trace the flow)
  • Is it a config issue? (Check env vars, secrets)
  • Is it a deployment issue? (Compare local vs cloud behavior)

Step 4: Root Cause

  • Identify the exact line/config causing the issue
  • Understand WHY it happens (not just WHERE)
  • Check if this is a known pattern or new issue

Step 5: Fix Assessment

  • Is the fix within scope of current ticket?
  • Does the fix follow governance rules? (No new code, hybrid, etc.)
  • What's the blast radius of the fix?
  • Can we verify the fix without side effects?

Documenting Findings

Investigation Report Format

Write to: docs/operations/research/WEB-XX-investigation.md

# [WEB-XX] Investigation: <Title>

## Question
What we set out to understand.

## Evidence
What we found (logs, code snippets, test results).

## Root Cause
The actual reason for the behavior.

## Recommendation
What to do about it.

## Verification
How we confirmed our findings.

Upload to Agent Data

# Via MCP tool
upload_document(path="docs/operations/research/web-XX-investigation.md", content="...")

Rules

  • NEVER guess the cause — verify with actual code/data
  • NEVER fix something you don't understand
  • ALWAYS document findings even if issue was minor
  • If issue is out of scope → REPORT with findings, don't fix
  • Keep investigation focused — don't go down unrelated rabbit holes