Template
Incident postmortem template.
A blameless framework for documenting incidents, finding root causes, and preventing recurrence with actionable follow‑ups.
Template sections
What to capture
- Incident summary with impact, duration, and severity.
- Timeline of key events: detection, mitigation, recovery.
- Root cause analysis (5 Whys / fishbone) and contributing factors.
- Customer and stakeholder communications.
- Action items with owners, dates, and verification steps.
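As a rough sketch, the items above could be held in a structured record so that empty sections are easy to spot before the review. The class, field, and method names (`Postmortem`, `TimelineEvent`, `missing_sections`) are hypothetical and not part of the template, and the sample incident is made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TimelineEvent:
    at: datetime
    kind: str  # e.g. "detection", "mitigation", "recovery"
    note: str = ""


@dataclass
class Postmortem:
    # Field names are illustrative assumptions that mirror the list above.
    summary: str = ""
    impact: str = ""
    severity: str = ""
    timeline: list = field(default_factory=list)
    root_cause: str = ""
    contributing_factors: list = field(default_factory=list)
    communications: list = field(default_factory=list)
    action_items: list = field(default_factory=list)

    def duration(self) -> timedelta:
        """Span between the earliest and latest timeline events."""
        if not self.timeline:
            return timedelta(0)
        times = [event.at for event in self.timeline]
        return max(times) - min(times)

    def missing_sections(self) -> list:
        """Names of sections that are still empty."""
        required = ["summary", "impact", "severity", "timeline",
                    "root_cause", "communications", "action_items"]
        return [name for name in required if not getattr(self, name)]


if __name__ == "__main__":
    pm = Postmortem(
        summary="Checkout API returned errors",  # made-up example data
        impact="Some customers could not pay",
        severity="SEV-2",
        timeline=[
            TimelineEvent(datetime(2024, 1, 10, 9, 0), "detection"),
            TimelineEvent(datetime(2024, 1, 10, 9, 45), "recovery"),
        ],
    )
    print(pm.duration())          # 0:45:00
    print(pm.missing_sections())  # ['root_cause', 'communications', 'action_items']
```

A check like `missing_sections()` could run before the review meeting so that gaps are filled in while details are still fresh.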
Good practices
Run effective postmortems
- Keep tone blameless; focus on systems and guardrails.
- Automate data collection: logs, dashboards, and alerts.
- Track actions to completion and verify with tests or playbooks.
- Share learnings broadly to prevent similar incidents.
- Hold the review meeting within 48 hours of resolution.
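The 48-hour review window can be checked mechanically. The following is a minimal sketch, assuming the resolution time is recorded as a `datetime`; the function names are hypothetical.

```python
from datetime import datetime, timedelta

REVIEW_WINDOW = timedelta(hours=48)  # taken from the guideline above


def review_deadline(resolved_at: datetime) -> datetime:
    """Latest time the review meeting should take place."""
    return resolved_at + REVIEW_WINDOW


def review_overdue(resolved_at: datetime, now: datetime) -> bool:
    """True once the 48-hour window has passed."""
    return now > review_deadline(resolved_at)


if __name__ == "__main__":
    resolved = datetime(2024, 1, 10, 12, 0)  # made-up example timestamp
    print(review_deadline(resolved))  # 2024-01-12 12:00:00
    print(review_overdue(resolved, datetime(2024, 1, 13, 9, 0)))  # True
```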
Incident severity levels
Classification framework
- SEV-1 (Critical): Full service outage, data loss, security breach.
- SEV-2 (High): Major feature degradation, significant user impact.
- SEV-3 (Medium): Partial feature issues, workaround available.
- SEV-4 (Low): Minor bugs, cosmetic issues, limited impact.
- SEV-1 and SEV-2 incidents always require postmortems.
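One way to encode the classification above is an ordered enum plus a helper for the postmortem rule. This is a sketch only: the names `Severity` and `postmortem_mandatory` are assumptions. The template states only that SEV-1 and SEV-2 always require a postmortem, so the helper returns `False` for lower severities to mean "not mandatory", not "never done".

```python
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means higher severity, as in the list above."""
    SEV1 = 1  # Critical: full service outage, data loss, security breach
    SEV2 = 2  # High: major feature degradation, significant user impact
    SEV3 = 3  # Medium: partial feature issues, workaround available
    SEV4 = 4  # Low: minor bugs, cosmetic issues, limited impact


def postmortem_mandatory(severity: Severity) -> bool:
    """SEV-1 and SEV-2 always require a postmortem.

    Lower severities are left to team discretion in this sketch.
    """
    return severity <= Severity.SEV2


if __name__ == "__main__":
    for level in Severity:
        print(level.name, postmortem_mandatory(level))
```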
Root cause analysis
Finding the real cause
- Use the 5 Whys technique to drill down to system-level failures.
- Identify contributing factors beyond the immediate trigger.
- Look for process gaps, missing monitoring, or unclear ownership.
- Avoid blaming individuals; focus on improving systems.
- Document what went well, not just what failed.
Action items
Drive improvements
- Each action item must have an owner and due date.
- Prioritize actions based on impact and likelihood of recurrence.
- Include both immediate fixes and long-term improvements.
- Track action items in the team's backlog or incident tracker.
- Verify completion with tests, runbooks, or monitoring updates.
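The rules above can be sketched as a small record type that rejects items without an owner and ranks the rest. The 1 to 5 scales, the `impact * likelihood` score, and all names here are assumptions made for illustration; the template does not prescribe a scoring formula.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    owner: str
    due: date                # required, so every item carries a due date
    impact: int              # 1 (low) to 5 (high); scale is an assumption
    likelihood: int          # 1 (unlikely to recur) to 5 (likely to recur)
    verification: str = ""   # test, runbook, or monitoring update
    done: bool = False

    def __post_init__(self):
        # Enforce "each action item must have an owner".
        if not self.owner.strip():
            raise ValueError(f"action item {self.title!r} has no owner")

    def priority(self) -> int:
        """Simple score: impact times likelihood of recurrence."""
        return self.impact * self.likelihood

    def verified(self) -> bool:
        """Counts as complete only if done and a verification step is recorded."""
        return self.done and bool(self.verification)


def prioritize(items):
    """Highest score first; earlier due date breaks ties."""
    return sorted(items, key=lambda item: (-item.priority(), item.due))


if __name__ == "__main__":
    # Made-up example items.
    items = [
        ActionItem("Add alert on error rate", "alice", date(2024, 2, 1), 5, 2),
        ActionItem("Document rollback runbook", "bob", date(2024, 1, 20), 3, 4),
    ]
    for item in prioritize(items):
        print(item.priority(), item.title)
```

Multiplying the two factors is only one possible ranking; a team could just as well sort by severity of the originating incident or by due date alone.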