It's 2 AM. Your AI system just made a decision that affected thousands of customers. Something went wrong. The board is asking what happened. Regulators want answers. And your team is scrambling to figure out what to do.
This is not the moment to create your incident response plan.
AI systems create new categories of incidents—model failures, data leakage, biased decisions, adversarial attacks—that traditional IT incident response doesn't fully address. You need an AI-specific incident response plan ready before incidents occur.
This guide provides a template and framework for building that plan.
Executive Summary
- AI incidents differ from traditional IT incidents in detection, investigation, and remediation approaches
- Categories include: Model failure, data breach, bias incidents, security attacks, governance violations, and third-party AI failures
- Response must be fast but also careful—rushing can cause additional harm
- Roles and responsibilities must be clear before an incident occurs
- Documentation is essential for investigation, regulatory response, and improvement
- Notification requirements vary by jurisdiction and incident type—know your obligations
- Post-incident review prevents recurrence and improves capability
- Regular testing ensures the plan works when needed
Why This Matters Now
AI incidents are inevitable. The question isn't whether your AI systems will experience problems—it's whether you'll be prepared when they do.
Several factors make AI incident response urgent:
AI failures can scale instantly. A traditional software bug might affect one user at a time. An AI model problem can affect every decision the system makes—potentially thousands before anyone notices.
Detection is harder. Traditional systems fail obviously (error messages, downtime). AI systems can fail subtly—making increasingly bad decisions while appearing to function normally.
Regulatory expectations are rising. Regulators expect organisations to have AI incident response capabilities. In some jurisdictions, AI-related breaches have specific notification requirements.
Reputational stakes are high. AI incidents—especially those involving bias or privacy—attract media attention and public concern in ways traditional technical failures don't.
What Constitutes an AI Incident?
An AI incident is any event involving AI systems that:
- Causes or threatens to cause harm (to individuals, the organisation, or third parties)
- Violates laws, regulations, or organisational policies
- Compromises data security or privacy
- Results in significant business impact
- Creates reputational risk
- Represents unexpected or unexplained AI behaviour
AI Incident Categories
| Category | Examples | Key Considerations |
|---|---|---|
| Model Failure | Degraded accuracy, incorrect predictions, hallucinations | May be gradual; detection challenging |
| Data Breach | Personal data exposed via AI, training data leakage | Regulatory notification may be required |
| Bias Incident | Discriminatory decisions, unfair outcomes | Legal and reputational implications |
| Security Attack | Prompt injection, adversarial manipulation, model extraction | May involve sophisticated actors |
| Governance Violation | Unapproved AI use, policy breach, shadow AI | Internal investigation needed |
| Third-Party AI Failure | Vendor AI system failure affecting your operations | Contractual and operational implications |
| Output Harm | AI-generated content causes harm (misinformation, harmful advice) | May have legal liability implications |
AI Incident Response Plan Template
Section 1: Purpose and Scope
Purpose
This plan establishes procedures for responding to incidents involving artificial intelligence systems used by [Organisation Name].
Scope
This plan applies to:
- All AI systems owned or operated by the organisation
- AI systems provided by third parties that process organisational data
- AI systems used by employees in the course of their work
Objectives
- Minimise harm from AI incidents
- Ensure rapid, coordinated response
- Meet regulatory and contractual notification obligations
- Preserve evidence for investigation
- Learn from incidents to prevent recurrence
Section 2: Roles and Responsibilities
AI Incident Response Team (AI-IRT)
| Role | Responsibilities | Primary Contact |
|---|---|---|
| Incident Commander | Overall coordination, decision authority, external communication | [Name, contact] |
| Technical Lead | Technical investigation, containment, remediation | [Name, contact] |
| Data Protection Officer | Privacy implications, regulatory notification | [Name, contact] |
| Legal Counsel | Legal implications, liability, regulatory response | [Name, contact] |
| Communications Lead | Internal/external messaging, media response | [Name, contact] |
| Business Owner | Business impact assessment, customer implications | [Name, contact] |
| AI/ML Specialist | Model-specific investigation, technical expertise | [Name, contact] |
Escalation contacts:
- Executive Sponsor: [Name, contact]
- Board notification: [Process]
- Regulatory notification: [Process]
Section 3: Incident Classification
Severity Levels
| Severity | Definition | Response Time | Escalation |
|---|---|---|---|
| Critical | Active harm occurring; significant data breach; regulatory notification required; major business impact | Immediate (<1 hour) | Executive immediately; Board within 4 hours |
| High | Potential for significant harm; contained breach; likely regulatory interest | <4 hours | Executive within 4 hours |
| Medium | Limited harm; internal policy violation; moderate business impact | <24 hours | Management within 24 hours |
| Low | Minimal impact; improvement opportunity; near-miss | <72 hours | Normal reporting |
Classification Criteria
Consider the following when assessing severity (a scoring sketch follows the list):
- Number of people affected
- Type of data involved
- Harm potential (financial, physical, reputational)
- Regulatory implications
- Business continuity impact
- Media/reputational risk
- Reversibility of harm
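These criteria can be encoded as a simple triage helper so that on-call responders classify consistently. Below is a minimal sketch; the field names and thresholds are illustrative assumptions that each organisation would calibrate for itself.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

@dataclass
class IncidentAssessment:
    # Answers to the classification criteria above (illustrative subset)
    people_affected: int
    personal_data_involved: bool
    regulatory_notification_likely: bool
    harm_reversible: bool
    active_harm_occurring: bool

def classify(a: IncidentAssessment) -> Severity:
    """Map assessment answers to a severity level.

    Thresholds are placeholders, not recommendations.
    """
    if a.active_harm_occurring or (a.personal_data_involved and a.people_affected > 1000):
        return Severity.CRITICAL
    if a.regulatory_notification_likely or not a.harm_reversible:
        return Severity.HIGH
    if a.people_affected > 0:
        return Severity.MEDIUM
    return Severity.LOW
```

However severity is computed, the Incident Commander retains authority to override the mechanical classification.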
Section 4: Response Procedures
Phase 1: Detection and Initial Assessment (0-2 hours)
| Step | Action | Owner | Documentation |
|---|---|---|---|
| 1.1 | Receive incident report or alert | On-call/reporting party | Incident log |
| 1.2 | Perform initial triage | On-call responder | Triage form |
| 1.3 | Classify severity | On-call + Incident Commander | Classification record |
| 1.4 | Activate AI-IRT if Medium+ | Incident Commander | Activation log |
| 1.5 | Document initial facts | Technical Lead | Incident record |
| 1.6 | Preserve evidence | Technical Lead | Evidence log |
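Step 1.6 is the easiest to neglect under pressure and the one that matters most later. One way to make evidence preservation routine is to fingerprint each collected file so tampering or accidental modification is detectable afterwards. A minimal sketch, assuming evidence is collected as files; the paths and log location are illustrative:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def preserve_evidence(evidence_files: list[Path], evidence_log: Path) -> None:
    """Append a SHA-256 fingerprint and timestamp for each evidence file."""
    with evidence_log.open("a") as log:
        for path in evidence_files:
            entry = {
                "file": str(path),
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                "recorded_at": datetime.now(timezone.utc).isoformat(),
            }
            log.write(json.dumps(entry) + "\n")
```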
Phase 2: Containment (2-4 hours for Critical/High)
| Step | Action | Owner | Documentation |
|---|---|---|---|
| 2.1 | Assess containment options | Technical Lead + AI Specialist | Options assessment |
| 2.2 | Decide containment approach | Incident Commander | Decision log |
| 2.3 | Implement containment | Technical Lead | Implementation record |
| 2.4 | Verify containment effective | Technical Lead | Verification record |
| 2.5 | Assess collateral impact | Business Owner | Impact assessment |
| 2.6 | Communicate containment status | Communications Lead | Communication log |
Containment Options:
- Disable affected AI system
- Route to manual processing
- Apply input/output filters
- Reduce AI authority (human approval)
- Isolate affected data
- Revoke access
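Several of these options reduce to a routing decision placed in front of the model. A minimal sketch of such a containment switch, assuming hypothetical model_predict and human_review hooks:

```python
from enum import Enum
from typing import Callable, Optional

class ContainmentMode(Enum):
    NORMAL = "normal"            # AI decides autonomously
    HUMAN_APPROVAL = "approval"  # AI proposes, a human confirms
    MANUAL_ONLY = "manual"       # AI bypassed; humans decide
    DISABLED = "disabled"        # system offline

# Flipped by the Incident Commander's containment decision (step 2.2)
MODE = ContainmentMode.NORMAL

def decide(
    inputs: dict,
    model_predict: Callable[[dict], str],
    human_review: Callable[[dict, Optional[str]], str],
) -> str:
    """Route a single decision according to the current containment mode."""
    if MODE is ContainmentMode.DISABLED:
        raise RuntimeError("AI system disabled during incident response")
    if MODE is ContainmentMode.MANUAL_ONLY:
        return human_review(inputs, None)
    prediction = model_predict(inputs)
    if MODE is ContainmentMode.HUMAN_APPROVAL:
        return human_review(inputs, prediction)  # human sees the AI proposal
    return prediction
```

The value of building the switch in advance is that containment becomes a configuration change rather than an emergency code deployment.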
Phase 3: Investigation (4-48 hours)
| Step | Action | Owner | Documentation |
|---|---|---|---|
| 3.1 | Define investigation scope | Incident Commander | Scope document |
| 3.2 | Collect and preserve evidence | Technical Lead | Evidence chain of custody |
| 3.3 | Conduct technical analysis | AI Specialist | Technical analysis report |
| 3.4 | Determine root cause | Technical Lead | Root cause analysis |
| 3.5 | Assess full impact | Business Owner + DPO | Impact assessment |
| 3.6 | Document findings | Technical Lead | Investigation report |
Phase 4: Notification (as required)
| Step | Action | Owner | Documentation |
|---|---|---|---|
| 4.1 | Assess notification obligations | DPO + Legal | Notification assessment |
| 4.2 | Prepare notification content | Communications + Legal | Notification drafts |
| 4.3 | Obtain approvals | Incident Commander | Approval record |
| 4.4 | Execute notifications | Communications Lead | Notification log |
| 4.5 | Document compliance | DPO | Compliance record |
Phase 5: Remediation (varies)
| Step | Action | Owner | Documentation |
|---|---|---|---|
| 5.1 | Develop remediation plan | Technical Lead | Remediation plan |
| 5.2 | Implement fixes | Technical Lead | Implementation record |
| 5.3 | Test remediation | AI Specialist | Test results |
| 5.4 | Monitor for recurrence | Technical Lead | Monitoring log |
| 5.5 | Return to normal operations | Incident Commander | Closure decision |
Phase 6: Post-Incident (1-2 weeks after closure)
| Step | Action | Owner | Documentation |
|---|---|---|---|
| 6.1 | Conduct post-mortem | Incident Commander | Post-mortem report |
| 6.2 | Identify improvements | All | Improvement list |
| 6.3 | Update procedures | Relevant owners | Updated documentation |
| 6.4 | Share lessons learned | Communications Lead | Lessons learned summary |
| 6.5 | Close incident | Incident Commander | Closure record |
Section 5: Communication Templates
Internal Escalation Template
SUBJECT: [SEVERITY] AI Incident - [Brief Description]
SUMMARY:
An AI incident has been identified requiring [immediate/urgent/routine] attention.
DETAILS:
- System affected: [Name]
- Discovered: [Date/Time]
- Severity: [Level]
- Current status: [Investigating/Contained/Remediating]
IMPACT:
- Users/customers affected: [Number/scope]
- Data involved: [Type]
- Business impact: [Description]
CURRENT ACTIONS:
[List of actions being taken]
REQUIRED DECISIONS:
[List any decisions needed from escalation recipients]
NEXT UPDATE: [Time]
Contact: [Incident Commander name and contact]
External Notification Template (General)
SUBJECT: Important Notice Regarding [Service/System]
Dear [Recipient],
We are writing to inform you of an incident affecting [description].
What happened:
[Brief, factual description without speculation]
What information was involved:
[If applicable]
What we are doing:
[Actions taken and in progress]
What you can do:
[Any actions recommended for recipients]
For more information:
[Contact details, FAQ link]
We take this matter seriously and are committed to [resolving the issue/protecting your information].
Sincerely,
[Name, Title]
Section 6: Specific Incident Playbooks
Playbook A: AI Model Failure/Degradation
- Confirm model is producing incorrect/degraded outputs
- Document examples of failures
- Assess scope (what decisions/outputs affected)
- Implement containment (fallback model, human review, disable)
- Investigate cause (data drift, model drift, training issue)
- Determine affected decisions that may need review
- Plan remediation (retrain, rollback, replace)
- Test before restoring
- Monitor closely after restoration
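Confirming degradation in the first steps is easier with a quantitative drift measure agreed in advance. One common choice is the Population Stability Index (PSI) over model outputs; a minimal sketch using NumPy, where the thresholds in the docstring are rule-of-thumb conventions rather than standards:

```python
import numpy as np

def population_stability_index(
    baseline: np.ndarray, current: np.ndarray, bins: int = 10
) -> float:
    """Compare current model outputs against a known-good baseline.

    Rule of thumb: PSI < 0.1 stable; 0.1-0.25 investigate; > 0.25 likely drift.
    """
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(current, bins=edges)  # values outside the baseline range are ignored
    eps = 1e-6  # avoids division by zero in empty bins
    expected = expected / expected.sum() + eps
    actual = actual / actual.sum() + eps
    return float(np.sum((actual - expected) * np.log(actual / expected)))
```

Whatever metric is used, the key is to record the baseline while the system is known to be healthy, not after trouble starts.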
Playbook B: AI Data Breach
- Confirm data exposure and scope
- Preserve evidence (logs, access records)
- Contain breach (revoke access, isolate system)
- Assess regulatory notification requirements
- Identify affected individuals
- Prepare and execute notifications
- Investigate root cause
- Implement preventive measures
- Document compliance
Playbook C: AI Bias Incident
- Document specific evidence of bias
- Assess scope (how many decisions, over what period)
- Determine impact on affected individuals
- Contain (add human review, adjust thresholds, disable)
- Investigate root cause (training data, model design, implementation)
- Assess remediation for affected individuals
- Develop bias mitigation measures
- Test for bias before restoring
- Implement ongoing bias monitoring
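Documenting the evidence in the first step usually begins with a simple outcome-rate comparison across groups. A minimal sketch for binary decisions (1 meaning a favourable outcome); the "four-fifths rule" threshold noted in the docstring is a common convention, and the legally relevant test varies by jurisdiction:

```python
import numpy as np

def disparate_impact_ratio(outcomes: np.ndarray, groups: np.ndarray) -> float:
    """Ratio of the lowest to highest favourable-outcome rate across groups.

    The "four-fifths rule" convention flags ratios below 0.8 for review.
    """
    rates = [outcomes[groups == g].mean() for g in np.unique(groups)]
    return float(min(rates) / max(rates))
```

A low ratio is a signal for investigation, not proof of unlawful discrimination; root-cause analysis still has to follow.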
Section 7: Testing and Maintenance
Plan Testing
- Tabletop exercise: Quarterly
- Simulation exercise: Annually
- Contact verification: Monthly
Plan Updates
- Review after each incident
- Annual comprehensive review
- Update when AI systems change
- Update when regulations change
Common Failure Modes
1. No Plan at All
Creating a plan during an incident guarantees poor response. Build and test the plan before you need it.
2. Generic IT Plan Only
Traditional IT incident response doesn't address AI-specific issues (model behaviour, bias, explainability). AI-specific procedures are essential.
3. Unclear Ownership
"Someone will handle it" means no one handles it effectively. Clear roles and contacts must be designated.
4. Untested Plan
A plan that's never been exercised will fail when needed. Regular testing reveals gaps.
5. Missing AI Expertise
Incident response teams without AI/ML expertise will struggle with AI-specific investigation and remediation.
6. Poor Documentation
Inadequate documentation during response creates problems for investigation, compliance, and improvement.
7. Rushing Remediation
Pressure to restore service quickly can lead to incomplete fixes and recurrence.
Implementation Checklist
Plan Development
- Identify AI systems in scope
- Define incident categories relevant to your AI use
- Establish severity classification criteria
- Define roles and assign individuals
- Create escalation paths
- Develop phase-by-phase procedures
- Create communication templates
- Develop incident-specific playbooks
- Document notification requirements by jurisdiction
- Establish evidence preservation procedures
Preparation
- Train AI-IRT members
- Distribute plan to responders
- Establish communication channels
- Create incident logging system
- Verify escalation contacts
- Prepare response toolkits
Testing
- Conduct tabletop exercise
- Review and incorporate learnings
- Schedule regular testing
Maintenance
- Assign plan owner
- Schedule regular reviews
- Establish update triggers
Metrics to Track
Response Effectiveness
| Metric | Measurement | Target |
|---|---|---|
| Time to detection | Discovery time - incident start | Minimise |
| Time to containment | Containment time - detection time | <4 hours for Critical |
| Time to resolution | Resolution time - detection time | Minimise |
| Notification compliance | Within required timeframe | 100% |
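These timing metrics fall out directly from timestamps already captured in the incident log. A minimal sketch, assuming the four timestamps are recorded per incident:

```python
from datetime import datetime, timedelta

def response_timings(
    incident_start: datetime, detected: datetime,
    contained: datetime, resolved: datetime,
) -> dict[str, timedelta]:
    """Compute the timing metrics above from one incident's timestamps."""
    return {
        "time_to_detection": detected - incident_start,
        "time_to_containment": contained - detected,
        "time_to_resolution": resolved - detected,
    }
```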
Response Quality
| Metric | Measurement | Target |
|---|---|---|
| Recurrence rate | Same incident type within 6 months | <10% |
| Post-mortem completion | % of Medium+ incidents reviewed | 100% |
| Improvement implementation | % of identified improvements made | >80% |
Frequently Asked Questions
How often should we test our AI incident response plan?
Tabletop exercises quarterly, full simulations annually. Also test after significant changes to AI systems or the plan itself.
What's the difference between an AI incident and a regular IT incident?
AI incidents may involve unique elements: model behaviour anomalies, bias manifestation, training data issues, explainability challenges. Standard IT procedures may not address these adequately.
Should we have separate teams for AI incidents and regular IT incidents?
Integrate, don't separate. AI incident response should extend existing IT incident response capabilities, with AI-specific expertise available when needed.
How do we detect AI incidents that don't cause obvious failures?
Implement AI monitoring—track model performance, output distributions, decision patterns. Many AI problems manifest gradually, not as sudden failures.
What if our AI vendor's system causes an incident?
Your incident response still applies. You're responsible to your customers and regulators regardless of where the AI is hosted. Your plan should cover vendor AI with appropriate escalation to the vendor.
Taking Action
The time to build your AI incident response capability is before you need it. An incident at 2 AM is not the moment to figure out who's responsible, what to do, or who to notify.
Build the plan. Assign the roles. Test the procedures. When the incident comes—and it will—you'll be ready.
Ready to strengthen your AI incident response capability?
Pertama Partners helps organisations build comprehensive AI incident response plans tailored to their systems and regulatory environment. Our AI Readiness Audit includes incident response capability assessment.
Disclaimer
This guide provides general information about AI incident response planning. It does not constitute legal advice. Notification requirements and compliance obligations vary by jurisdiction and should be verified with legal counsel. Organisations should adapt this framework to their specific context and regulatory environment.
References
- NIST. (2024). AI Risk Management Framework.
- ENISA. (2024). AI Cybersecurity Challenges.
- Singapore IMDA. (2024). Model AI Governance Framework.
- ISO/IEC 27001. Information Security Management.
- PDPC Singapore. (2024). Guide to Managing Data Breaches 2.0.