
15 AI Project Failures: What Went Wrong

July 19, 2025 · 18 min read · Pertama Partners
For: CTO/CIO, Operations

Real case studies of AI project failures across industries—from Amazon's hiring algorithm to IBM Watson Health. Learn what went wrong and how to avoid similar mistakes.


Key Takeaways

  1. Historical bias in training data reliably produces discriminatory AI unless you explicitly audit and correct for it.
  2. Domain complexity and safety-critical contexts demand deep subject-matter expertise, conservative claims, and rigorous validation.
  3. AI systems trained on normal conditions are fragile under shocks; build volatility detection, confidence thresholds, and circuit breakers.
  4. Human oversight, explainability, and appeal mechanisms are non-negotiable for high-stakes decisions in finance, healthcare, and safety.
  5. Ethical and legal reviews must be embedded upfront—privacy, consent, and user welfare cannot be bolted on after launch.
  6. Deployment discipline (gradual rollout, monitoring, kill switches) is as important as model accuracy for preventing catastrophic losses.
  7. Systematically documenting and reviewing AI incidents is one of the highest-leverage practices to reduce repeat failures.

Executive Summary: Learning from failure is faster than learning from success. This analysis examines 15 high-profile AI project failures across industries, extracting actionable lessons for organizations launching AI initiatives. Each case study reveals specific failure modes, financial impact, and preventable mistakes.

Case Study 1: Amazon Recruiting AI (2018)

Company: Amazon
Project: Automated resume screening to identify top candidates
Investment: Estimated $10M+ over 4 years
Outcome: Scrapped due to gender bias

What Happened: Amazon built an AI system to screen resumes and rank candidates. The model was trained on 10 years of hiring data—which reflected Amazon's historically male-dominated technical workforce. The AI learned to penalize resumes containing the word "women's" (as in "women's chess club") and downgrade graduates of all-women's colleges.

Root Cause: Historical bias in training data. The AI optimized for patterns in past hiring, not quality of candidates.

What Should Have Been Done:

  • Audit training data for demographic bias before model development
  • Test model outputs across protected categories (gender, race, age); a minimal check is sketched after this list
  • Include diverse stakeholders in model validation
  • Implement fairness constraints in model design
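
One concrete form of that output testing is a four-fifths-rule check on selection rates across groups. The sketch below is a minimal, hypothetical illustration in Python; the column names, toy data, and 0.8 threshold are assumptions for the example, not Amazon's actual pipeline.

```python
# Hypothetical four-fifths-rule check on screening outcomes.
# Column names ("gender", "advanced") are illustrative assumptions.
import pandas as pd

def selection_rates(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Share of candidates the model advances, per demographic group."""
    return df.groupby(group_col)[outcome_col].mean()

def disparate_impact_ratio(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Ratio of the lowest to the highest group selection rate (four-fifths rule)."""
    rates = selection_rates(df, group_col, outcome_col)
    return rates.min() / rates.max()

if __name__ == "__main__":
    # Toy data standing in for model screening decisions on labeled resumes.
    screened = pd.DataFrame({
        "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
        "advanced": [0, 0, 1, 0, 1, 1, 0, 1],
    })
    ratio = disparate_impact_ratio(screened, "gender", "advanced")
    print(f"Disparate impact ratio: {ratio:.2f}")
    if ratio < 0.8:  # common regulatory rule of thumb
        print("Potential adverse impact: investigate before deployment.")
```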

Lessons: AI amplifies biases in historical data. You can't machine-learn your way out of discriminatory practices.


Case Study 2: IBM Watson for Oncology (2017–2019)

Company: IBM Watson Health
Project: AI-powered cancer treatment recommendations
Investment: Estimated $4 billion (Watson Health division)
Outcome: Discontinued; multiple hospitals abandoned the system

What Happened: Watson for Oncology was marketed as an AI that could analyze patient data and recommend personalized cancer treatments. In practice:

  • Recommendations were based on hypothetical cases created by Memorial Sloan Kettering doctors, not real patient outcomes
  • The system frequently recommended unsafe or incorrect treatments
  • Jupiter Hospital (India) found unsafe recommendations in 30% of cases
  • Integration with hospital systems proved far more complex than anticipated

Root Cause:

  1. Marketing overpromised capabilities
  2. Training data was synthetic, not real-world outcomes
  3. Medical domain complexity underestimated
  4. Insufficient clinical validation

What Should Have Been Done:

  • Train on real patient outcomes, not synthetic cases
  • Conduct rigorous clinical trials before commercial deployment
  • Position as decision support tool, not autonomous decision-maker
  • Set realistic expectations about AI limitations in complex medical scenarios

Lessons: Domain expertise matters. AI trained on synthetic data doesn't generalize to real-world complexity.


Case Study 3: Zillow Offers (2021)

Company: Zillow
Project: AI-powered home buying (iBuying)
Investment: $2.8 billion in inventory losses
Outcome: Shut down, 2,000 employees laid off

What Happened: Zillow used AI to predict home values and buy properties directly from homeowners, planning to resell at a profit. The AI models failed to predict the pandemic housing market volatility. Zillow bought homes at inflated prices, then the market shifted. The company was forced to sell 7,000+ homes at a $569 million loss in Q3 2021 alone.

Root Cause:

  1. AI models trained on stable market conditions failed during volatility
  2. Overconfidence in algorithmic pricing
  3. Insufficient human oversight on large-value transactions
  4. No mechanism to pause operations when model confidence dropped

What Should Have Been Done:

  • Build volatility detection into model architecture
  • Implement confidence thresholds that trigger human review (a minimal gating sketch follows this list)
  • Develop circuit-breakers to pause operations during market anomalies
  • Test models against historical volatility scenarios (2008 crash, COVID)
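
The confidence-threshold and circuit-breaker ideas above can be expressed as a simple gating function that decides whether an automated offer proceeds, goes to a human, or halts. This is a hypothetical sketch; the thresholds and the volatility proxy are assumptions, not Zillow's actual system.

```python
# Hypothetical pricing gate: thresholds and the volatility measure are
# illustrative assumptions.
from dataclasses import dataclass
from statistics import pstdev

@dataclass
class OfferDecision:
    action: str   # "auto_offer", "human_review", or "halt"
    reason: str

def gate_offer(model_confidence: float,
               recent_price_changes: list[float],
               confidence_floor: float = 0.85,
               volatility_ceiling: float = 0.05) -> OfferDecision:
    """Route an automated offer, escalate to a human, or trip the circuit breaker."""
    # Simple volatility proxy: dispersion of recent week-over-week price changes.
    volatility = pstdev(recent_price_changes) if len(recent_price_changes) > 1 else 0.0
    if volatility > volatility_ceiling:
        return OfferDecision("halt", f"market volatility {volatility:.3f} exceeds ceiling")
    if model_confidence < confidence_floor:
        return OfferDecision("human_review", f"confidence {model_confidence:.2f} below floor")
    return OfferDecision("auto_offer", "within normal operating envelope")

print(gate_offer(0.91, [0.01, -0.02, 0.08, -0.07]))  # volatile week -> "halt"
```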

Lessons: AI models trained on normal conditions fail during black swan events. Always build human override mechanisms.


Case Study 4: Uber Self-Driving Car Fatal Crash (2018)

Company: Uber ATG (Advanced Technologies Group)
Project: Autonomous vehicle program
Investment: $1 billion+ development costs
Outcome: Fatal pedestrian accident, program shut down, $457M settlement

What Happened: In March 2018, an Uber self-driving car struck and killed a pedestrian in Tempe, Arizona. Investigation revealed:

  • The AI detected the pedestrian 6 seconds before impact but classified her inconsistently (vehicle, bicycle, unknown object)
  • Emergency braking was disabled to avoid "erratic behavior"
  • The human safety driver was watching a TV show on her phone
  • Safety culture prioritized meeting deadlines over addressing known issues

Root Cause:

  1. Inadequate sensor fusion and object classification
  2. Safety systems disabled to improve performance metrics
  3. Over-reliance on human safety driver (automation complacency)
  4. Organizational pressure to compete with Waymo compromised safety

What Should Have Been Done:

  • Never disable safety systems to improve performance metrics
  • Design for automation complacency (humans can't remain vigilant during monotonous monitoring)
  • Implement rigorous safety validation before public road testing
  • Prioritize safety over competitive timelines

Lessons: Safety shortcuts to meet deadlines can be fatal. AI reliability in safety-critical applications requires orders of magnitude more testing than standard software.


Case Study 5: Knightmare on Wall Street (2012)

Company: Knight Capital Group
Project: Algorithmic trading platform upgrade
Investment: Company lost $440 million in 45 minutes
Outcome: Company acquired by competitor, ceased independent operations

What Happened: Knight Capital deployed new algorithmic trading software that contained a dormant code path from a retired system. When activated, the algorithm sent erroneous orders flooding the market. In 45 minutes, Knight executed 4 million trades totaling $7 billion, resulting in a $440 million loss—nearly four times the company's annual earnings.

Root Cause:

  1. Inadequate testing of deployment procedures
  2. Legacy code not properly removed before upgrade
  3. No kill switch to halt trading when anomalies detected
  4. Insufficient monitoring to detect the issue quickly

What Should Have Been Done:

  • Remove deprecated code paths before deployment
  • Test deployment procedures in production-like environments
  • Implement circuit breakers that halt trading during anomalies (see the kill-switch sketch after this list)
  • Deploy gradually across servers (not all-at-once)
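
A kill switch for a trading system can be as simple as hard limits checked on every order batch, with trading halted the moment a limit is breached. The sketch below is a minimal illustration under assumed limits, not Knight Capital's actual controls; real safeguards would also live at the exchange gateway and page the on-call team.

```python
# Hypothetical kill switch; limits and metric names are illustrative assumptions.
import time

class TradingKillSwitch:
    def __init__(self, max_orders_per_min: int = 1000, max_gross_exposure: float = 50e6):
        self.max_orders_per_min = max_orders_per_min
        self.max_gross_exposure = max_gross_exposure
        self.halted = False

    def check(self, orders_last_minute: int, gross_exposure: float) -> bool:
        """Return True if trading may continue; trip the switch otherwise."""
        if orders_last_minute > self.max_orders_per_min:
            self._halt(f"order rate {orders_last_minute}/min exceeds limit")
        elif gross_exposure > self.max_gross_exposure:
            self._halt(f"gross exposure {gross_exposure:,.0f} exceeds limit")
        return not self.halted

    def _halt(self, reason: str) -> None:
        self.halted = True
        print(f"[{time.strftime('%H:%M:%S')}] TRADING HALTED: {reason}")

switch = TradingKillSwitch()
if not switch.check(orders_last_minute=4200, gross_exposure=12e6):
    pass  # stop submitting orders and escalate to a human operator
```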

Lessons: Algorithmic systems require rigorous deployment procedures, kill switches, and gradual rollout strategies.


Case Study 6: Target Pregnancy Prediction Controversy (2012)

Company: Target
Project: Predictive analytics for customer pregnancy status
Investment: Undisclosed
Outcome: Privacy backlash, reputational damage

What Happened: Target's AI analyzed shopping patterns to predict which customers were pregnant, then sent targeted coupons for baby products. In one case, a teenager received pregnancy-related coupons before telling her father she was pregnant. The father complained to Target, only to later discover his daughter was indeed pregnant.

Root Cause:

  1. Insufficient consideration of privacy implications
  2. No opt-in mechanism for sensitive predictions
  3. Failure to anticipate social consequences of accurate predictions
  4. Prioritized commercial goals over customer comfort

What Should Have Been Done:

  • Implement ethical review for sensitive predictions
  • Provide opt-in mechanisms for predictive marketing
  • Disguise targeted promotions among non-targeted content
  • Consider social implications, not just commercial value

Lessons: AI accuracy without ethical consideration creates backlash. Just because you can predict something doesn't mean you should act on it.


Case Study 7: Stitch Fix Styling Algorithm Failures (2019–2020)

Company: Stitch Fix
Project: AI-powered personal styling
Investment: Core business model, stock dropped 40%
Outcome: Customer satisfaction declined, retention dropped

What Happened: Stitch Fix reduced human stylist involvement in favor of AI-driven clothing recommendations. Customers complained about:

  • Repetitive selections ignoring stated preferences
  • Poor fit despite detailed measurements
  • Tone-deaf occasion matching (funeral attire suggestions for weddings)
  • Loss of personalized human touch that differentiated the service

Root Cause:

  1. Over-optimization for efficiency over customer experience
  2. AI couldn't capture nuanced style preferences
  3. Cost-cutting disguised as AI innovation
  4. Insufficient feedback loops to detect satisfaction decline

What Should Have Been Done:

  • A/B test AI vs. human styling to measure satisfaction impact (a minimal significance test is sketched after this list)
  • Use AI to augment human stylists, not replace them
  • Maintain human oversight for complex or high-value recommendations
  • Monitor customer satisfaction and retention alongside efficiency metrics
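
One way to run that A/B comparison is a two-proportion z-test on a satisfaction proxy such as item keep rate between an AI-only arm and a stylist-assisted arm. The counts, arm definitions, and proxy metric below are invented for illustration; this is not Stitch Fix's actual methodology.

```python
# Hypothetical A/B comparison of keep rates; counts are invented for illustration.
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(success_a: int, n_a: int, success_b: int, n_b: int) -> tuple[float, float]:
    """Z statistic and two-sided p-value for a difference in proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, 2 * norm.sf(abs(z))

# AI-only arm vs. human-in-the-loop arm (item keep rate as the satisfaction proxy).
z, p = two_proportion_ztest(success_a=310, n_a=1000, success_b=355, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # a significant drop argues against removing stylists
```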

Lessons: AI should augment human expertise in subjective, high-touch services, not replace it. Efficiency gains mean nothing if customers leave.


Case Study 8: Lemonade Insurance Controversy (2021)

Company: Lemonade (InsurTech)
Project: AI-powered claims processing
Investment: Core business infrastructure
Outcome: Public relations crisis over discriminatory AI

What Happened: Lemonade publicly touted its AI's ability to detect fraud by analyzing facial expressions and micro-expressions during video claims submissions. Critics immediately identified this as junk science with high potential for demographic bias.

Root Cause:

  1. Reliance on pseudoscientific "emotion detection" AI
  2. No consideration of bias in facial analysis across demographics
  3. Marketing team promoted problematic AI features
  4. Insufficient ethical review before public messaging

What Should Have Been Done:

  • Validate AI techniques against peer-reviewed science
  • Test for demographic bias before deployment
  • Avoid using AI in ways that feel invasive or discriminatory to customers
  • Consult ethicists and bias experts before launching sensitive AI applications

Lessons: Not all AI techniques are scientifically valid. Marketing AI capabilities can backfire if the technology is pseudoscientific or discriminatory.


Case Study 9: Google Flu Trends (2013–2015)

Company: Google
Project: Predict flu outbreaks using search data
Investment: Undisclosed research project
Outcome: Discontinued after consistently overpredicting flu prevalence

What Happened: Google Flu Trends used search query data to predict flu outbreaks faster than CDC surveillance. Initially accurate, the model began dramatically overpredicting flu cases—estimating twice as many cases as CDC reported in 2013.

Root Cause:

  1. Model didn't account for media-driven search spikes (news about flu increased searches, not actual flu cases)
  2. Google's search algorithm changes affected results without model updates
  3. Overfitting to historical patterns that didn't generalize
  4. No ongoing validation against ground truth data

What Should Have Been Done:

  • Distinguish between genuine signal (people with flu searching symptoms) and noise (people reading about flu searching out of curiosity)
  • Monitor for external factors affecting input data (algorithm changes, media events)
  • Continuously validate predictions against authoritative data sources (a minimal drift check is sketched after this list)
  • Update models regularly to account for changing search behavior
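
That continuous validation can be automated as a drift check that compares recent model estimates against the authoritative series and flags when error drifts out of tolerance. A minimal sketch, with invented numbers and an assumed 25% tolerance:

```python
# Hypothetical drift monitor; the window and tolerance are illustrative assumptions.
import numpy as np

def relative_error(predicted: np.ndarray, ground_truth: np.ndarray) -> np.ndarray:
    return np.abs(predicted - ground_truth) / np.maximum(ground_truth, 1e-9)

def needs_retraining(predicted: list[float], ground_truth: list[float],
                     window: int = 8, max_mean_error: float = 0.25) -> bool:
    """Flag retraining when recent relative error drifts above the tolerance."""
    err = relative_error(np.array(predicted[-window:]), np.array(ground_truth[-window:]))
    return float(err.mean()) > max_mean_error

# Weekly model estimates vs. later-published official figures (toy numbers).
model_estimates = [120, 150, 210, 260, 400, 520, 610, 700]
official_counts = [118, 149, 190, 205, 240, 280, 310, 330]
if needs_retraining(model_estimates, official_counts):
    print("Drift detected: pause publication and retrain on recent data.")
```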

Lessons: AI models degrade over time as underlying data distributions change. Continuous monitoring and retraining are essential.


Case Study 10: Microsoft Tay Chatbot (2016)

Company: Microsoft
Project: AI chatbot learning from Twitter interactions
Investment: Undisclosed
Outcome: Shut down after 16 hours due to racist, offensive tweets

What Happened: Microsoft launched Tay, an AI chatbot designed to learn conversational patterns from Twitter users. Within 16 hours, coordinated trolling efforts taught Tay to generate racist, sexist, and offensive content. Microsoft shut down the experiment.

Root Cause:

  1. No content filtering on training inputs
  2. Insufficient adversarial testing
  3. Assumed good-faith interactions
  4. No human oversight during live learning

What Should Have Been Done:

  • Implement content filters on training inputs (a minimal gating sketch follows this list)
  • Conduct adversarial testing (red team exercises)
  • Add human-in-the-loop review for public outputs
  • Limit learning rate to allow monitoring
  • Expect bad actors and design defenses
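
The first two defenses above can be combined into a simple gate that quarantines suspect inputs before they ever reach the learning loop. The blocklist and buffer below are placeholder assumptions, not Microsoft's actual pipeline; a production system would pair a maintained lexicon with a trained toxicity classifier and human review.

```python
# Hypothetical pre-training filter for user-generated messages.
from dataclasses import dataclass, field

BLOCKLIST = {"slur_example_1", "slur_example_2"}  # placeholder for lexicon + toxicity model

@dataclass
class LearningBuffer:
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)

    def submit(self, message: str) -> None:
        tokens = set(message.lower().split())
        if tokens & BLOCKLIST:
            self.quarantined.append(message)  # held for human review, never trained on
        else:
            self.accepted.append(message)     # eligible for rate-limited learning

buffer = LearningBuffer()
buffer.submit("hello there, how are you today?")
buffer.submit("repeat after me: slur_example_1")
print(len(buffer.accepted), "accepted,", len(buffer.quarantined), "quarantined")
```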

Lessons: AI that learns from public interactions will be gamed by bad actors. Always assume adversarial inputs and design accordingly.


Case Study 11: Apple Card Gender Discrimination (2019)

Company: Apple/Goldman Sachs
Project: AI-powered credit decisions
Investment: Major product launch
Outcome: Regulatory investigation, reputational damage

What Happened: Multiple users reported that Apple Card's credit algorithm gave higher credit limits to men than their spouses with identical or better financial profiles. Tech entrepreneur David Heinemeier Hansson reported his wife was offered 1/20th his credit limit despite higher credit score and joint assets.

Root Cause:

  1. Algorithm used correlated variables that functioned as proxies for gender
  2. Insufficient bias testing before launch
  3. Black-box model made it difficult to identify discrimination source
  4. No process for appealing algorithmic decisions

What Should Have Been Done:

  • Test for disparate impact across protected categories
  • Audit for proxy variables (features correlated with gender, race); a minimal scan is sketched after this list
  • Provide explainability for adverse decisions
  • Implement human appeal process for algorithmic decisions
  • Conduct third-party fairness audit before launch
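
A proxy-variable audit can start with something as simple as scanning candidate features for strong association with a protected attribute that is held out of training. The column names, toy data, and 0.3 threshold below are illustrative assumptions, not the Apple Card model.

```python
# Hypothetical proxy-variable scan over candidate model features.
import pandas as pd

def proxy_scan(df: pd.DataFrame, protected_col: str, threshold: float = 0.3) -> pd.Series:
    """Absolute correlation of each numeric feature with the protected attribute."""
    protected = df[protected_col].astype("category").cat.codes
    features = df.drop(columns=[protected_col]).select_dtypes("number")
    corr = features.corrwith(protected).abs().sort_values(ascending=False)
    return corr[corr > threshold]  # anything flagged deserves a closer fairness review

applicants = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "income": [88, 95, 84, 97, 86, 94],
    "retail_spend_share": [0.42, 0.11, 0.39, 0.13, 0.44, 0.10],  # likely proxy
    "tenure_years": [4, 5, 3, 6, 4, 5],
})
print(proxy_scan(applicants, "gender"))
```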

Lessons: Even when gender isn't an input variable, algorithms can discriminate through proxy variables. Fairness audits are essential for high-stakes decisions.


Case Study 12: Robinhood Options Trading Suicide (2020)

Company: Robinhood
Project: AI-driven trading interface and margin calls
Investment: Core platform infrastructure
Outcome: Customer suicide, regulatory scrutiny, $70M fine

What Happened: A 20-year-old Robinhood user saw a negative $730,000 balance in his account and believed he owed that amount. In reality, it was a temporary display issue related to options trading spreads. Before Robinhood customer service could clarify (automated responses only), the user died by suicide.

Root Cause:

  1. Gamified interface without adequate risk warnings
  2. Confusing display of complex trading positions
  3. No human customer support for high-urgency situations
  4. AI-driven engagement optimized for trading frequency, not user welfare

What Should Have Been Done:

  • Provide clear explanations of account balances and temporary holds
  • Offer immediate human support for concerning account states
  • Include crisis resources in high-stress account situations
  • Design interfaces that prioritize user understanding over engagement
  • Implement checks for unusual activity by inexperienced traders

Lessons: AI-optimized engagement can harm users. Human oversight is essential for high-stakes, emotionally charged situations.


Case Study 13: Optum/UnitedHealth Prior Authorization AI (2023)

Company: UnitedHealth Group
Project: AI for denying medical claims
Investment: Undisclosed
Outcome: Class action lawsuit, Congressional scrutiny

What Happened: A lawsuit alleged that UnitedHealth used an AI system with a known 90% error rate to deny elderly patients' post-acute care claims. The system automatically denied coverage, and patients rarely appealed due to complexity and health conditions.

Root Cause:

  1. AI optimized for cost savings, not patient outcomes
  2. Known high error rate deployed anyway
  3. No accessible appeal process for vulnerable population
  4. Profit incentive misaligned with patient care

What Should Have Been Done:

  • Never deploy AI with known 90% error rate in life-affecting decisions
  • Optimize for patient outcomes, not just cost savings
  • Provide accessible appeal mechanisms
  • Conduct ethical review of AI use in healthcare coverage
  • Independent audit of AI decisions for fairness

Lessons: High error rates are never acceptable in high-stakes decisions affecting health and wellbeing. AI must be optimized for societal value, not just corporate profit.


Case Study 14: Facebook Emotional Contagion Experiment (2012/2014)

Company: Meta/Facebook
Project: Research on emotional manipulation via News Feed algorithm
Investment: Research study
Outcome: Ethical controversy, regulatory scrutiny

What Happened: Facebook manipulated 689,000 users' News Feeds to show primarily positive or negative content, measuring whether emotional tone was contagious. Users weren't informed or given opportunity to consent. Results were published in an academic journal, revealing the experiment.

Root Cause:

  1. No informed consent from participants
  2. Ethical review bypassed (claimed as operational research)
  3. Emotional manipulation without consideration of vulnerable users
  4. Privacy policy allowed broad experimentation

What Should Have Been Done:

  • Obtain informed consent for psychological experiments
  • Conduct ethics review via IRB (Institutional Review Board)
  • Screen for vulnerable populations (depressed users)
  • Limit manipulation to minimize potential harm
  • Provide opt-out mechanisms

Lessons: AI experimentation on users requires ethical oversight, informed consent, and consideration of potential psychological harm.


Case Study 15: Clearview AI Facial Recognition (2020–Present)

Company: Clearview AI
Project: Facial recognition via scraped social media photos
Investment: Venture-backed startup
Outcome: Banned in multiple countries, numerous lawsuits

What Happened: Clearview AI scraped billions of photos from social media to create a facial recognition database sold to law enforcement. The company faced lawsuits in multiple countries for:

  • Violating privacy laws (GDPR, CCPA)
  • Scraping photos without consent
  • Selling biometric data without permission
  • Creating mass surveillance infrastructure

Root Cause:

  1. Business model relied on unauthorized use of personal data
  2. No consideration of privacy or consent
  3. Ignored explicit terms of service from social platforms
  4. Prioritized technical capability over legal compliance

What Should Have Been Done:

  • Obtain consent for biometric data collection
  • Comply with privacy regulations from the start
  • Assess societal impact of mass surveillance tools
  • Respect platform terms of service
  • Consult legal and ethics experts before launch

Lessons: Technical capability doesn't justify legal or ethical violations. Privacy regulations have teeth—plan for compliance from day one.


Common Patterns Across Failures

Analyzing these 15 failures reveals recurring themes:

  1. Bias in historical data (Amazon, Apple Card, UnitedHealth)
  2. Overpromising AI capabilities (IBM Watson, Zillow)
  3. Insufficient testing and validation (Uber, Knight Capital)
  4. Ethical blindness (Target, Facebook, Clearview AI)
  5. Lack of human oversight (Robinhood, Microsoft Tay)
  6. Cost optimization over user welfare (Stitch Fix, UnitedHealth)
  7. No mechanism for model degradation (Google Flu Trends)

Key Takeaways

  1. Historical bias in training data creates discriminatory AI – Amazon, Apple Card, and UnitedHealth all failed to audit for bias.
  2. Domain complexity requires domain expertise – IBM Watson failed because synthetic medical training data didn't reflect real complexity.
  3. AI trained on normal conditions fails during anomalies – Zillow's models couldn't handle market volatility.
  4. Safety shortcuts in AI systems can be fatal – Uber's disabled emergency braking contributed to a death.
  5. Gradual deployment and kill switches are essential – Knight Capital lost $440M in 45 minutes without circuit breakers.
  6. Ethical review prevents foreseeable harms – Multiple companies faced backlash for easily predictable ethical issues.
  7. High-stakes decisions require explainability and appeal processes – Apple Card and UnitedHealth faced lawsuits over black-box decisions.

Frequently Asked Questions

Are these failures mostly due to technical issues or organizational issues?

Organizational issues dominate: 12 of 15 cases involved organizational failures (poor governance, ethical blindness, cost-cutting, misaligned incentives). Only 3 were primarily technical (Google Flu Trends' model degradation, Uber's sensor fusion, Knight Capital's deployment error). This aligns with broader research showing 70% of AI project failures stem from organizational, not technical, causes.

How can we test our AI for bias before deployment?

Implement a four-step bias audit:

  1. Data audit: Analyze training data for demographic imbalances and historical discrimination patterns.
  2. Feature audit: Identify proxy variables correlated with protected categories (zip code → race, first name → gender).
  3. Output audit: Test model predictions across demographic groups for disparate impact.
  4. Red team testing: Have adversarial team actively try to expose bias through edge cases.

Consider third-party fairness audits for high-stakes applications (hiring, lending, healthcare).
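
Step 4 can include counterfactual probes: score the same applicant twice, changing only an attribute that should be irrelevant, and check that the decision is stable. The sketch below assumes a generic scoring object with a `predict` method and uses a deliberately flawed toy model so the probe has something to catch.

```python
# Hypothetical counterfactual probe; the model interface and field names are assumptions.
import copy

def counterfactual_flips(model, applicant: dict, flip_field: str, values: list) -> dict:
    """Score the same applicant with only one should-be-irrelevant field changed."""
    results = {}
    for value in values:
        variant = copy.deepcopy(applicant)
        variant[flip_field] = value
        results[value] = model.predict(variant)
    return results

class ToyModel:  # stand-in so the sketch runs end to end
    def predict(self, applicant: dict) -> float:
        score = 0.01 * applicant["income"]
        if applicant["first_name"] in {"Emily", "Priya"}:  # leaked proxy: the bug to catch
            score -= 0.2
        return round(score, 2)

scores = counterfactual_flips(ToyModel(), {"income": 80, "first_name": "John"},
                              "first_name", ["John", "Emily", "Priya"])
print(scores)  # unequal scores for identical finances signal a proxy problem
```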

Should we pause AI initiatives given these failure rates?

No, but approach AI strategically:

  • Start with low-stakes applications (customer service chatbots, inventory optimization) before high-stakes ones (lending, healthcare, autonomous vehicles).
  • Invest in foundations first (data quality, governance, ethics frameworks) before deploying models.
  • Learn from failures (yours and others') by documenting lessons and updating processes.
  • Build incrementally rather than attempting transformative AI overnight.

The companies that succeed with AI are those that learn systematically from failures.

How do we know if our AI vendor is hiding failure risks like IBM Watson?

Demand transparency through vendor evaluation:

  • Ask for case studies with verifiable references (not just testimonials).
  • Request validation studies showing model performance on real-world data (not synthetic benchmarks).
  • Inquire about failure modes – trustworthy vendors discuss limitations openly.
  • Check for independent third-party audits of accuracy, bias, and reliability.
  • Verify training data quality – ask specifically what data the model was trained on.
  • Review contract SLAs – vague guarantees suggest unproven technology.

What's the most important safeguard to prevent AI failures?

Human oversight mechanisms. Nearly every case study shows that failures escalated because:

  • No human could override algorithmic decisions (Knight Capital, Robinhood).
  • Humans weren't monitoring outputs (Uber safety driver, Microsoft Tay).
  • No process for humans to appeal decisions (Apple Card, UnitedHealth).

Successful AI implementations maintain "human-in-the-loop" for (a minimal routing sketch follows this list):

  • High-stakes decisions.
  • Anomaly detection (when AI confidence drops).
  • Ethical edge cases.
  • System failures and recovery.
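
In practice, that oversight can be encoded as an escalation router that decides whether a decision is auto-approved, queued for human review, or escalated. The stake levels, thresholds, and queue names below are illustrative assumptions.

```python
# Hypothetical escalation router for algorithmic decisions.
from enum import Enum

class Route(Enum):
    AUTO = "auto_approve"
    HUMAN_REVIEW = "human_review_queue"
    ESCALATE = "senior_reviewer"

def route_decision(confidence: float, stake: str, anomaly: bool) -> Route:
    """Keep a human in the loop for high stakes, low confidence, or anomalies."""
    if stake == "high" or anomaly:
        return Route.ESCALATE
    if confidence < 0.9:
        return Route.HUMAN_REVIEW
    return Route.AUTO

print(route_decision(confidence=0.97, stake="low", anomaly=False))   # Route.AUTO
print(route_decision(confidence=0.97, stake="high", anomaly=False))  # Route.ESCALATE
print(route_decision(confidence=0.62, stake="low", anomaly=False))   # Route.HUMAN_REVIEW
```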

How do we balance AI innovation with risk management?

Use a tiered risk framework:

Low-risk applications (recommendation engines, content personalization):

  • Faster deployment, lighter governance.
  • Monitor for unintended consequences.
  • Easy rollback mechanisms.

Medium-risk applications (fraud detection, customer service):

  • Phased deployment with monitoring.
  • Human review of edge cases.
  • Regular bias audits.

High-risk applications (lending, hiring, healthcare, safety-critical):

  • Extensive validation and testing.
  • Third-party audits.
  • Regulatory compliance review.
  • Human appeal processes.
  • Conservative deployment.

Match your governance rigor to the potential impact of AI failures.
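
One low-tech way to make a tiered framework enforceable is to encode it as a policy table that project checklists or CI gates can read. The tier names mirror the lists above; the specific controls and structure are illustrative assumptions.

```python
# Hypothetical governance policy table keyed by risk tier.
RISK_POLICY = {
    "low": {"rollout": "fast", "controls": ["monitoring", "easy_rollback"]},
    "medium": {"rollout": "phased", "controls": ["monitoring", "human_review_edge_cases",
                                                 "bias_audit"]},
    "high": {"rollout": "conservative", "controls": ["extensive_validation", "third_party_audit",
                                                     "regulatory_review", "human_appeal_process"]},
}

def required_controls(risk_tier: str) -> list:
    """Look up the governance controls a project must implement before launch."""
    return RISK_POLICY[risk_tier]["controls"]

print(required_controls("high"))
```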

What documentation should we maintain to learn from AI failures?

Create a "failure database" documenting:

  • Incident description: What went wrong, when, and what was the impact?
  • Root cause analysis: Why did it happen? (organizational, technical, process gaps).
  • Response actions: How was it detected, mitigated, and resolved?
  • Lessons learned: What specific changes prevent recurrence?
  • Responsible parties: Who owns implementing preventive measures?

Review this database quarterly and incorporate lessons into new AI projects. Organizations that document and learn from failures systematically have dramatically lower repeat failure rates.
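
A failure database does not need heavy tooling; a consistent record schema is the main thing. The sketch below shows one possible shape as a Python dataclass, with field names following the bullets above and an invented example incident.

```python
# Hypothetical incident record schema; field names follow the list above.
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class AIIncident:
    title: str
    occurred_on: date
    impact: str                      # quantified where possible
    root_causes: List[str]           # organizational, technical, process gaps
    response_actions: List[str]      # detection, mitigation, resolution
    lessons_learned: List[str]       # specific changes that prevent recurrence
    owner: str                       # accountable for preventive measures
    preventive_deadline: Optional[date] = None

incident = AIIncident(
    title="Pricing model drift during market shock",
    occurred_on=date(2025, 3, 14),
    impact="$1.2M of mispriced offers before manual halt",
    root_causes=["no volatility circuit breaker", "stale retraining cadence"],
    response_actions=["manual halt", "rollback to rules-based pricing"],
    lessons_learned=["add drift alerting", "monthly retraining"],
    owner="Head of Pricing Analytics",
)
print(incident.title, "-", incident.owner)
```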

Most AI failures are not about the model

Across these 15 case studies, the dominant causes were governance gaps, misaligned incentives, weak safety culture, and ethical blind spots—not lack of sophisticated algorithms. Treat AI as an organizational change and risk problem, not just a data science problem.

70%: Estimated share of AI project failures driven by organizational rather than technical causes.
Source: Synthesis of industry research and case evidence

"You don't need to repeat the industry's worst AI mistakes—most of them are now well-documented patterns that can be designed against from day one."

AI Governance & Risk Practice Perspective

References

  1. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters (2018)
  2. IBM’s Watson recommended ‘unsafe and incorrect’ cancer treatments. STAT News (2018)
  3. Zillow Quits Home-Flipping Business, Cites Inability to Forecast Prices. The Wall Street Journal (2021)
  4. Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian. National Transportation Safety Board (NTSB) (2019)
  5. Order Approving the National Securities Exchanges’ and FINRA’s Proposed Rule Changes to Address the August 1, 2012 Trading Incident. U.S. Securities and Exchange Commission (SEC) (2012)
  6. How Target Figured Out A Teen Girl Was Pregnant. Forbes (2012)
  7. Google Flu Trends: The Limits of Big Data. Wired (2015)
  8. Apple Card Algorithm Sparks Gender Bias Allegations. Bloomberg (2019)
  9. Lawsuit: UnitedHealth AI Denied Care to Elderly Patients. Commonwealth of Massachusetts / Court Filings (2022)
  10. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences (2014)
  11. Clearview AI Faces Legal Challenges Around the World. The New York Times (2020)

Ready to Apply These Insights to Your Organization?

Book a complimentary AI Readiness Audit to identify opportunities specific to your context.
