The investigation is complete. The root cause is understood. Now what?
A post-mortem review is where incident response transforms into organizational learning. Done well, it prevents recurrence and improves capability. Done poorly—or skipped entirely—it guarantees you'll face the same problems again.
This guide provides a practical framework for conducting AI incident post-mortems that drive genuine improvement.
Executive Summary
- Post-mortems are for learning, not blame: Create psychological safety for honest discussion
- Structure enables consistency: Use a standard format so post-mortems are comparable and complete
- Action items must be tracked: Insights without implementation are worthless
- Timing matters: Too soon and emotions interfere; too late and memory fades
- Include the right people: Those who responded, those who will prevent, and those who can authorize changes
- Share learnings: Organizational learning requires dissemination beyond the immediate team
- Follow through: Post-mortems are only valuable if recommendations are implemented
Why This Matters Now
Most organizations skip or shortcut post-mortems. The incident is over, everyone's tired, and there's pressure to move on. This is a mistake.
Without effective post-mortems:
- The same incidents keep happening
- Teams don't learn from each other's experiences
- Root causes go unaddressed
- Confidence in AI systems erodes
Organizations that invest in post-mortems build resilience. They don't just respond to incidents—they prevent them.
When to Conduct a Post-Mortem
Always conduct post-mortems for:
- Critical severity incidents
- High severity incidents
- Any incident requiring regulatory notification
- Incidents with external impact
- Incidents revealing systemic issues
- Novel incident types
Consider post-mortems for:
- Medium severity incidents
- Near-misses that could have been serious
- Incidents with valuable learning potential
- Incidents where team members request a review
May skip formal post-mortems for:
- Low severity, routine incidents with known causes
- Incidents already covered by recent similar post-mortems
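The criteria above can be encoded so that the decision is applied the same way every time. The following is a minimal sketch, not a definitive policy: the `Incident` fields, the `Severity` enum, and the three return labels are hypothetical names chosen for illustration. It also leaves out the "already covered by a recent similar post-mortem" exception, which probably needs human judgment.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Incident:
    # All field names are illustrative assumptions, not a standard schema.
    severity: Severity
    regulatory_notification: bool = False
    external_impact: bool = False
    systemic_issue: bool = False
    novel_type: bool = False
    near_miss: bool = False
    high_learning_potential: bool = False
    team_requested: bool = False


def postmortem_decision(incident: Incident) -> str:
    """Return 'required', 'consider', or 'optional' per the lists above."""
    if (
        incident.severity in (Severity.CRITICAL, Severity.HIGH)
        or incident.regulatory_notification
        or incident.external_impact
        or incident.systemic_issue
        or incident.novel_type
    ):
        return "required"
    if (
        incident.severity is Severity.MEDIUM
        or incident.near_miss
        or incident.high_learning_potential
        or incident.team_requested
    ):
        return "consider"
    return "optional"
```

A low severity incident that triggered a regulatory notification still comes back as `required`, because any single "always" criterion is enough.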
Post-Mortem Process
Step 1: Schedule and Prepare
Timing: 3-7 days after incident closure
- Soon enough that memory is fresh
- Late enough that emotions have settled
Attendees:
- Incident response team members
- System owners/operators
- Relevant technical experts
- Management stakeholder (to authorize changes)
- Facilitator (ideally someone not involved in the incident)
Preparation:
- Distribute investigation report in advance
- Gather timeline and key facts
- Send pre-read to all participants
- Reserve 90-120 minutes
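The 3-7 day timing rule is easy to automate when scheduling. The sketch below assumes the window is counted in calendar days from the closure date; the function names are hypothetical, and a team that counts business days would need to adjust it.

```python
from datetime import date, timedelta


def postmortem_window(
    closed_on: date, earliest_days: int = 3, latest_days: int = 7
) -> tuple[date, date]:
    """Earliest and latest recommended session dates (calendar days, assumed)."""
    return (
        closed_on + timedelta(days=earliest_days),
        closed_on + timedelta(days=latest_days),
    )


def is_within_window(session_on: date, closed_on: date) -> bool:
    """True if the proposed session date falls inside the 3-7 day window."""
    earliest, latest = postmortem_window(closed_on)
    return earliest <= session_on <= latest
```

For an incident closed on 3 March 2025, `postmortem_window(date(2025, 3, 3))` gives 6 March to 10 March.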
Step 2: Facilitate the Session
Ground Rules (Facilitator sets these at the start)
- Blameless: We're here to improve systems, not assign blame
- Assume good intentions: Everyone did what made sense to them at the time
- Focus on facts: What happened, not what we imagine happened
- Seek to understand: Ask questions before judging
- All perspectives welcome: Everyone's view adds value
- Constructive only: Criticize problems, not people
Agenda:
| Time | Topic | Purpose |
|---|---|---|
| 10 min | Context setting | Review incident summary, set ground rules |
| 20 min | Timeline review | Walk through what happened |
| 20 min | What went well | Identify what worked in response |
| 30 min | What went wrong | Identify failures and contributing factors |
| 20 min | Improvement actions | Develop specific, actionable improvements |
| 10 min | Wrap-up | Confirm action items, assign owners |
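The agenda above adds up to 110 minutes, which fits inside the 90-120 minute reservation. As a small illustration, the sketch below turns the table into clock times for a calendar invite. The `AGENDA` list copies the table; the function names and the example start time are assumptions.

```python
from datetime import datetime, timedelta

# Topic and duration in minutes, copied from the agenda table.
AGENDA = [
    ("Context setting", 10),
    ("Timeline review", 20),
    ("What went well", 20),
    ("What went wrong", 30),
    ("Improvement actions", 20),
    ("Wrap-up", 10),
]


def total_minutes(agenda=AGENDA) -> int:
    """Total session length in minutes."""
    return sum(minutes for _, minutes in agenda)


def build_schedule(start: datetime, agenda=AGENDA) -> list[tuple[str, str]]:
    """Return (start time as HH:MM, topic) for each agenda item."""
    schedule = []
    current = start
    for topic, minutes in agenda:
        schedule.append((current.strftime("%H:%M"), topic))
        current += timedelta(minutes=minutes)
    return schedule
```

With a 14:00 start, the wrap-up begins at 15:40 and the session ends at 15:50.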
Step 3: Document Findings
Post-Mortem Report Template
AI INCIDENT POST-MORTEM
Incident ID: [ID]
Date of Incident: [Date]
Date of Post-Mortem: [Date]
Facilitator: [Name]
Attendees: [Names and roles]
INCIDENT SUMMARY
[2-3 paragraph summary of what happened]
TIMELINE
[Key events with timestamps]
- [Time]: [Event]
- [Time]: [Event]
IMPACT
- Users affected: [Number]
- Duration: [Time]
- Business impact: [Description]
- Other impacts: [Description]
ROOT CAUSES
1. [Primary root cause]
- Contributing factor: [Detail]
- Contributing factor: [Detail]
2. [Secondary root cause if applicable]
- Contributing factor: [Detail]
WHAT WENT WELL
- [Item 1]
- [Item 2]
- [Item 3]
WHAT WENT WRONG
- [Item 1]: [Description and impact]
- [Item 2]: [Description and impact]
- [Item 3]: [Description and impact]
WHERE WE GOT LUCKY
[Things that could have made this worse but didn't]
- [Item 1]
LESSONS LEARNED
1. [Lesson 1]
2. [Lesson 2]
3. [Lesson 3]
ACTION ITEMS
| # | Action | Owner | Due Date | Status |
|---|--------|-------|----------|--------|
| 1 | [Specific action] | [Name] | [Date] | Open |
| 2 | [Specific action] | [Name] | [Date] | Open |
| 3 | [Specific action] | [Name] | [Date] | Open |
METRICS TO TRACK
- [Metric that would indicate improvement]
- [Metric that would indicate recurrence]
FOLLOW-UP
- Next review date: [Date]
- Distribution: [Who receives this document]
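If reports are stored in a tool rather than a document, the template can be mirrored as a simple data structure with a completeness check. This is one possible sketch: the class, its field names, and the `missing_sections` helper are hypothetical, and the header and follow-up fields are omitted to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class PostMortemReport:
    # Field order follows the section order of the template above.
    incident_id: str
    incident_summary: str = ""
    timeline: list[str] = field(default_factory=list)
    impact: str = ""
    root_causes: list[str] = field(default_factory=list)
    what_went_well: list[str] = field(default_factory=list)
    what_went_wrong: list[str] = field(default_factory=list)
    where_we_got_lucky: list[str] = field(default_factory=list)
    lessons_learned: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    metrics_to_track: list[str] = field(default_factory=list)


def missing_sections(report: PostMortemReport) -> list[str]:
    """Names of sections that are still empty, in template order."""
    return [name for name, value in vars(report).items() if not value]
```

A draft can then be blocked from distribution until `missing_sections(report)` returns an empty list.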
Step 4: Track Action Items
Good action items are:
- Specific: Clear what needs to be done
- Assigned: One owner (not "the team")
- Time-bound: Due date specified
- Measurable: Can verify completion
Examples of bad action items:
- "Be more careful"
- "Improve monitoring"
- "Train people better"
Examples of good action items:
- "Implement alerting for model accuracy below 85% by [date] - Owner: [Name]"
- "Add input validation for [specific field] by [date] - Owner: [Name]"
- "Update runbook section 3.4 to include [procedure] by [date] - Owner: [Name]"
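The four qualities of a good action item can be checked mechanically before an item is accepted. The sketch below is a rough lint, not a definitive test: the `ActionItem` fields, the list of vague owners, and the four-word threshold for "specific" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Owner values treated as "nobody in particular" (an assumed list).
VAGUE_OWNERS = {"", "team", "the team", "everyone", "tbd"}


@dataclass
class ActionItem:
    description: str
    owner: str
    due: Optional[date]
    done_when: str  # how completion will be verified


def action_item_problems(item: ActionItem) -> list[str]:
    """Return the reasons an action item falls short; empty list if none."""
    problems = []
    # Crude heuristic: very short descriptions are rarely specific.
    if len(item.description.split()) < 4:
        problems.append("not specific: description is too short to act on")
    if item.owner.strip().lower() in VAGUE_OWNERS:
        problems.append("not assigned: needs one named owner")
    if item.due is None:
        problems.append("not time-bound: no due date")
    if not item.done_when.strip():
        problems.append("not measurable: no completion check")
    return problems
```

Under this heuristic, "Improve monitoring" owned by "the team" with no date and no completion check fails all four tests. A word count cannot judge real specificity, so a facilitator should still review each item.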
Tracking:
- Review action items weekly until complete
- Report on completion to management
- Don't close post-mortem until all critical actions are done
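The tracking rules above reduce to two checks: which open items are past due, and whether every critical item is done. A minimal sketch follows, assuming a hypothetical `TrackedAction` record with a `critical` flag that someone sets during the session.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TrackedAction:
    title: str
    owner: str
    due: date
    critical: bool = False
    done: bool = False


def overdue(actions: list[TrackedAction], today: date) -> list[TrackedAction]:
    """Open items whose due date has passed; candidates for escalation."""
    return [a for a in actions if not a.done and a.due < today]


def can_close_postmortem(actions: list[TrackedAction]) -> bool:
    """The post-mortem may close only when every critical action is done."""
    return all(a.done for a in actions if a.critical)
```

Running `overdue` in the weekly review gives the list to report to management. Non-critical items can stay open after closure, which matches the rule that only critical actions gate it.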
Step 5: Share Learnings
Internal sharing:
- Post to incident learning repository
- Brief relevant teams
- Include in regular safety/quality meetings
- Update training materials
Consider sharing:
- Across departments facing similar AI risks
- At organizational learning forums
- Anonymized sharing in industry groups (if appropriate)
Post-Mortem Templates
Template 1: Standard Post-Mortem (Medium/High Severity)
[Use the full template from Step 3: Document Findings]
Template 2: Brief Post-Mortem (Low Severity / Near-Miss)
BRIEF POST-MORTEM
Incident: [One-line description]
Date: [Date]
Severity: [Level]
Reviewed by: [Names]
WHAT HAPPENED
[2-3 sentences]
ROOT CAUSE
[One sentence]
KEY LEARNING
[One sentence]
ACTION ITEM
| Action | Owner | Due |
|--------|-------|-----|
| [Action] | [Name] | [Date] |
No detailed post-mortem required because: [Reason]
Template 3: Major Incident Post-Mortem (Critical Severity)
[Use standard template with additions:]
ADDITIONAL SECTIONS FOR MAJOR INCIDENTS
EXTERNAL COMMUNICATION REVIEW
- What we communicated: [Summary]
- What worked: [Items]
- What could improve: [Items]
REGULATORY INTERACTION REVIEW
- Notifications made: [List]
- Regulator response: [Summary]
- Lessons for future notifications: [Items]
RECOVERY ASSESSMENT
- Recovery time: [Duration]
- Recovery completeness: [Assessment]
- Recovery gaps: [Items]
COST ANALYSIS
- Direct costs: [Amount]
- Indirect costs: [Amount]
- Opportunity costs: [Amount]
- Potential avoided costs (if improvements are made): [Amount]
EXECUTIVE SUMMARY
[One-page summary suitable for board/executive distribution]
Discussion Questions for Post-Mortems
On Detection
- How was the incident detected?
- Could we have detected it earlier?
- What monitoring was in place? What was missing?
- Did alerts fire? Were they actionable?
On Response
- Did we have the right people engaged?
- Was escalation appropriate?
- Were procedures followed? Were they adequate?
- What slowed us down?
On Containment
- How quickly was containment achieved?
- Was the containment approach appropriate?
- What tools or access did we lack?
- Could we contain faster next time?
On Communication
- Did the right people know what was happening?
- Was communication timely and clear?
- Were stakeholders appropriately informed?
- What communication gaps existed?
On Root Cause
- Have we found the true root cause or just symptoms?
- Why did our controls fail to prevent this?
- What systemic issues contributed?
- Have we seen similar issues before?
On Prevention
- What would have prevented this incident?
- What changes would reduce similar risks?
- Are we treating symptoms or causes?
- How do we know our fixes will work?
Common Failure Modes
1. Blame Culture
People don't speak honestly because they fear consequences. Solution: Explicit blameless principles, leadership modeling, separating post-mortems from performance evaluation.
2. Surface Analysis
Stopping at the obvious cause without digging deeper. Solution: Use structured root cause techniques and ask "why" repeatedly until you reach a cause you can act on.
3. Action Item Graveyard
Items identified but never implemented. Solution: Track completion, escalate delays, tie to regular work planning.
4. Wrong Attendees
Missing key perspectives or including too many people. Solution: Thoughtful attendee selection, keep groups focused.
5. Rushed Sessions
Not allowing enough time for thorough discussion. Solution: Protect the full 90-120 minutes and don't cut the session short.
6. No Follow-Through
Post-mortem happens but learnings aren't disseminated. Solution: Required sharing, learning repositories, training updates.
7. Skipping Post-Mortems
Pressure to move on. Solution: Make post-mortems mandatory for qualifying incidents, schedule them automatically.
Implementation Checklist
Building the Capability
- Define post-mortem criteria (when required)
- Create templates and procedures
- Train facilitators
- Establish action tracking mechanism
- Create learning sharing channels
- Gain leadership commitment to blameless culture
For Each Post-Mortem
- Schedule within 3-7 days of incident closure
- Distribute pre-read materials
- Facilitate session with ground rules
- Document findings completely
- Assign action items with owners and dates
- Track action completion
- Share learnings
- Close when all critical actions complete
Metrics to Track
| Metric | Target | Purpose |
|---|---|---|
| Post-mortem completion rate | 100% for qualifying incidents | Ensure reviews happen |
| Time to post-mortem | 3-7 days after incident closure | Ensure timely review |
| Action item completion rate | >90% on time | Ensure follow-through |
| Recurrence rate | <10% within 12 months | Measure effectiveness |
| Lessons shared | 100% to relevant audiences | Ensure learning spreads |
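Three of these metrics are simple ratios that can be computed from counts and compared against the targets in the table. The sketch below is illustrative only: the `Counts` fields are hypothetical, and it assumes the recurrence rate is measured against all qualifying incidents, which the table does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Counts:
    qualifying_incidents: int
    postmortems_completed: int
    actions_due: int
    actions_done_on_time: int
    incidents_recurred_12m: int


def pct(part: int, whole: int) -> Optional[float]:
    """Percentage rounded to one decimal, or None when there is no data."""
    return None if whole == 0 else round(100 * part / whole, 1)


def scorecard(c: Counts) -> dict[str, tuple[Optional[float], bool]]:
    """Map each metric to (value in percent, whether the target is met)."""
    completion = pct(c.postmortems_completed, c.qualifying_incidents)
    on_time = pct(c.actions_done_on_time, c.actions_due)
    recurrence = pct(c.incidents_recurred_12m, c.qualifying_incidents)
    return {
        # Targets from the table: 100%, >90%, <10%.
        "postmortem_completion": (completion, completion is not None and completion >= 100),
        "action_on_time": (on_time, on_time is not None and on_time > 90),
        "recurrence": (recurrence, recurrence is not None and recurrence < 10),
    }
```

A metric with no data is reported as `None` and counted as not meeting its target, so an empty period cannot look like a success.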
Taking Action
Post-mortems are where incidents become learning. The organizations that improve their AI systems fastest are those that treat every incident as an opportunity to get better.
Don't skip post-mortems. Don't rush them. And most importantly—don't let action items die in a document. Follow through until improvements are real.
Practical Next Steps
To put these insights into practice for AI incident post-mortems, consider the following action items:
- Establish a cross-functional governance committee with clear decision-making authority and regular review cadences.
- Document your current governance processes and identify gaps against regulatory requirements in your operating markets.
- Create standardized templates for governance reviews, approval workflows, and compliance documentation.
- Schedule quarterly governance assessments to ensure your framework evolves alongside regulatory and organizational changes.
- Build internal governance capabilities through targeted training programs for stakeholders across different business functions.
Effective governance structures require deliberate investment in organizational alignment, executive accountability, and transparent reporting mechanisms. Without these foundational elements, governance frameworks remain theoretical documents rather than living operational systems.
The distinction between mature and immature governance programs often comes down to enforcement consistency and stakeholder engagement breadth. Organizations that treat governance as an ongoing discipline rather than a checkbox exercise develop significantly more resilient operational capabilities.
Common Questions
Focus on learning, not blame. Document what happened, why it happened, what worked in response, what didn't, and specific improvements. Assign owners and timelines for action items.
Establish ground rules that focus on systems and processes, not individuals. Assume people made reasonable decisions with available information. Look for systemic improvements.
Include incident timeline, root causes, contributing factors, impact assessment, response effectiveness evaluation, and specific action items with owners and deadlines.

