KB-770E
investigation
Revision 1
Playbook: Investigation
Use this playbook when investigating bugs, errors, or technical questions.
Before Starting
Load Context Packs
- governance: Verify-don't-guess principle
- Relevant domain pack (agent-data, infrastructure, web-frontend, directus)
- current-sprint: Current context
Define the Question
- What exactly is broken or unclear?
- When did it start? (Check recent commits/deploys)
- What is the expected behavior vs actual behavior?
- Who reported it and what evidence exists?
Investigation Process
Step 1: Gather Evidence (DO NOT GUESS)
- Read actual error messages/logs
- Read the actual source code involved
- Check git log for recent changes:
  git log --oneline -20
- Check environment: local vs cloud, which endpoint
# Common diagnostic commands
curl -s http://localhost:8000/health | python3 -m json.tool
curl -s http://localhost:8000/info | python3 -m json.tool
gcloud run services logs read agent-data-test --region=asia-southeast1 --limit=20
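The two curl checks above can also be scripted so that the raw responses are captured as evidence. The following is a minimal sketch, assuming the service exposes /health and /info as JSON at the base URL used above; the function names (`fetch_json`, `collect_diagnostics`, `format_diagnostics`) are hypothetical helpers, not part of any existing tooling.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and parse the body as JSON (same idea as curl -s | json.tool)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect_diagnostics(base_url: str,
                        fetch: Callable[[str], dict] = fetch_json) -> dict:
    """Query /health and /info; record failures as evidence instead of raising."""
    results = {}
    for path in ("/health", "/info"):
        try:
            results[path] = fetch(base_url.rstrip("/") + path)
        except Exception as exc:
            # Keep the actual error text: it is evidence, not something to guess at.
            results[path] = {"error": f"{type(exc).__name__}: {exc}"}
    return results


def format_diagnostics(results: dict) -> str:
    """Pretty-print the collected responses for pasting into a report."""
    return json.dumps(results, indent=2, sort_keys=True)


# Example (requires the local service to be running):
#   print(format_diagnostics(collect_diagnostics("http://localhost:8000")))
```

Passing `fetch` as a parameter keeps the sketch testable without a running service; the default performs a real HTTP request.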
Step 2: Reproduce
- Can you reproduce the issue locally?
- What are the exact steps to trigger it?
- Document the reproduction steps
Step 3: Narrow Down
- Is it a data issue? (Check Firestore/Qdrant)
- Is it a code issue? (Read the function, trace the flow)
- Is it a config issue? (Check env vars, secrets)
- Is it a deployment issue? (Compare local vs cloud behavior)
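For the local-vs-cloud comparison, it can help to diff the two responses mechanically rather than by eye. The following is a rough sketch, assuming both environments return flat JSON objects (for example from an /info endpoint); `diff_responses` and the sample field names (`version`, `qdrant`, `env`) are hypothetical and used only for illustration.

```python
from typing import Any

# Placeholder shown when a key exists in only one environment.
MISSING = "<missing>"


def diff_responses(local: dict[str, Any],
                   cloud: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (local_value, cloud_value)} for every key whose values differ."""
    diffs = {}
    for key in sorted(set(local) | set(cloud)):
        local_value = local.get(key, MISSING)
        cloud_value = cloud.get(key, MISSING)
        if local_value != cloud_value:
            diffs[key] = (local_value, cloud_value)
    return diffs


# Illustrative data only; real responses will have different fields.
local_info = {"version": "1.2.0", "qdrant": "ok", "env": "local"}
cloud_info = {"version": "1.1.9", "qdrant": "ok"}
for key, (lv, cv) in diff_responses(local_info, cloud_info).items():
    print(f"{key}: local={lv!r} cloud={cv!r}")
```

An empty result suggests the difference lies elsewhere (data or code path), which narrows the search without guessing.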
Step 4: Root Cause
- Identify the exact line/config causing the issue
- Understand WHY it happens (not just WHERE)
- Check if this is a known pattern or new issue
Step 5: Fix Assessment
- Is the fix within scope of current ticket?
- Does the fix follow governance rules? (No new code, hybrid, etc.)
- What's the blast radius of the fix?
- Can we verify the fix without side effects?
Documenting Findings
Investigation Report Format
Write to: docs/operations/research/WEB-XX-investigation.md
# [WEB-XX] Investigation: <Title>
## Question
What we set out to understand.
## Evidence
What we found (logs, code snippets, test results).
## Root Cause
The actual reason for the behavior.
## Recommendation
What to do about it.
## Verification
How we confirmed our findings.
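The report format above could be generated from structured fields so that no section is accidentally dropped. This is a minimal sketch, assuming the five sections and the path pattern shown above; the `InvestigationReport` class is a hypothetical helper, and the ticket ID is left as the placeholder WEB-XX.

```python
from dataclasses import dataclass


@dataclass
class InvestigationReport:
    ticket: str          # e.g. the placeholder "WEB-XX"
    title: str
    question: str
    evidence: str
    root_cause: str
    recommendation: str
    verification: str

    def path(self) -> str:
        """Target file, following the path pattern given in this playbook."""
        return f"docs/operations/research/{self.ticket}-investigation.md"

    def to_markdown(self) -> str:
        """Render the report with the five sections in the documented order."""
        sections = [
            ("Question", self.question),
            ("Evidence", self.evidence),
            ("Root Cause", self.root_cause),
            ("Recommendation", self.recommendation),
            ("Verification", self.verification),
        ]
        lines = [f"# [{self.ticket}] Investigation: {self.title}"]
        for heading, body in sections:
            # Mark empty sections explicitly so gaps are visible in review.
            lines += [f"## {heading}", body.strip() or "TODO"]
        return "\n".join(lines) + "\n"
```

The rendered string is what would be passed as the document content when saving or uploading the report.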
Upload to Agent Data
# Via MCP tool
upload_document(path="docs/operations/research/WEB-XX-investigation.md", content="...")
Rules
- NEVER guess the cause — verify with actual code/data
- NEVER fix something you don't understand
- ALWAYS document findings even if issue was minor
- If issue is out of scope → REPORT with findings, don't fix
- Keep investigation focused — don't go down unrelated rabbit holes