Back to Insights
AI Security & Data ProtectionGuide

AI Threat Modeling: Identifying Risks Before They Become Incidents

January 13, 20266 min readMichael Lansdowne Hauge
Updated March 15, 2026
For:CISOConsultantCTO/CIO

Extend threat modeling methodology to AI systems. STRIDE-AI framework, threat categories, and AI-specific risk assessment.

Summarize and fact-check this article with:
Tech Code Review - ai security & data protection insights

Key Takeaways

  • 1.AI systems introduce unique threat vectors including adversarial attacks and model poisoning
  • 2.Structured threat modeling identifies vulnerabilities before malicious actors exploit them
  • 3.AI-specific attack surfaces require security controls beyond traditional application security
  • 4.Regular red team exercises test AI system resilience against sophisticated attacks
  • 5.Integration of AI threat modeling into existing security frameworks ensures comprehensive coverage

Traditional threat modeling doesn't fully address AI-specific vulnerabilities. This guide extends threat modeling methodology for AI systems.


Executive Summary

  • AI introduces new threats — Model manipulation, training data attacks, adversarial inputs
  • Traditional threat modeling adapts — STRIDE and other frameworks extend to AI
  • System-level view essential — AI threats span data, model, infrastructure, and integration
  • Threat modeling early — Design phase is cheapest time to address threats
  • Continuous process — Threats evolve as AI capabilities and attacks advance
  • Cross-functional effort — Security, AI, and business perspectives all matter

AI Threat Categories

1. Training Data Attacks

  • Data poisoning
  • Backdoor insertion
  • Data extraction/inference

2. Model Attacks

3. Infrastructure Attacks

  • Traditional IT attacks (network, compute, storage)
  • API vulnerabilities
  • Access control bypass

4. Output Manipulation


AI Threat Modeling Methodology

Step 1: Define System Scope

  • What AI capabilities are in scope?
  • What data flows through the system?
  • What are the trust boundaries?
  • Who are the legitimate users?

Step 2: Identify Threats (STRIDE-AI)

CategoryTraditionalAI Extension
SpoofingIdentity spoofingTraining data source spoofing
TamperingData tamperingModel tampering, adversarial inputs
RepudiationAction denialAI decision audit gaps
Information DisclosureData leakageModel extraction, training data leakage
Denial of ServiceSystem unavailabilityModel degradation attacks
Elevation of PrivilegeUnauthorized accessPrompt injection privilege escalation

Step 3: Assess and Prioritize

  • Likelihood of each threat
  • Impact if exploited
  • Existing controls
  • Residual risk

Step 4: Define Mitigations

  • Preventive controls
  • Detective controls
  • Response procedures

Step 5: Document and Review

  • Threat model documentation
  • Regular updates as system evolves
  • Review upon significant changes

AI Threat Register Snippet

ThreatCategoryLikelihoodImpactRiskMitigation
Adversarial input bypassModelMediumHighHighInput validation, robust training
Prompt injectionOutputHighMediumHighOutput filtering, prompt engineering
Training data poisoningDataLowHighMediumData provenance, validation
Model extractionModelMediumMediumMediumAPI rate limiting, output perturbation
Sensitive data in outputOutputMediumHighHighOutput filtering, content classification

Checklist for AI Threat Modeling

  • System scope and boundaries defined
  • Data flows documented
  • Trust boundaries identified
  • AI-specific threats enumerated
  • STRIDE-AI analysis completed
  • Threats prioritized by risk
  • Mitigations defined for high/critical threats
  • Threat model documented
  • Review schedule established

How Threat Landscapes Evolved for Artificial Intelligence Systems Between 2024 and 2026

The cybersecurity threat environment targeting artificial intelligence deployments escalated dramatically throughout 2025 as adversarial techniques matured from academic research demonstrations into weaponized attack toolkits distributed through underground forums. Understanding this evolution helps security practitioners calibrate threat models against realistic contemporary attack surfaces rather than theoretical vulnerability categories.

The MITRE ATLAS framework (Adversarial Threat Landscape for Artificial-Intelligence Systems) expanded from forty-seven documented techniques in January 2024 to eighty-three techniques by December 2025, reflecting rapid adversarial innovation. OWASP released version 2.0 of its Machine Learning Security Top 10 in October 2025, reorganizing risk categories to reflect production deployment patterns rather than research environment assumptions. The National Institute of Standards and Technology published NIST AI 600-1 (Artificial Intelligence Risk Management Framework: Generative AI Profile) in July 2024, followed by supplementary guidance addressing adversarial robustness evaluation methodologies in March 2025.

Comprehensive Threat Taxonomy for Enterprise Deployments

Category 1 — Prompt Injection and Instruction Manipulation

Direct and indirect prompt injection remains the most frequently exploited vulnerability category in generative system deployments. Direct injection involves crafting adversarial inputs designed to override system instructions, extract confidential system prompts, or manipulate output behavior. Indirect injection operates through compromised contextual data — malicious content embedded in documents, emails, or web pages processed by retrieval-augmented generation pipelines.

Mitigation Strategies. Implement input sanitization layers using libraries like Guardrails AI, NeMo Guardrails by NVIDIA, or Rebuff. Deploy output monitoring classifying responses against behavioral policy violations. Establish separate privilege boundaries between system instructions and user-supplied content. Conduct regular red-team exercises using adversarial prompt datasets published by researchers at Carnegie Mellon University, Anthropic, and the Allen Institute for Artificial Intelligence.

Category 2 — Data Poisoning and Training Manipulation

Adversaries targeting training pipelines can compromise model behavior by injecting malicious samples into training corpora, manipulating labeling processes, or exploiting data supply chain vulnerabilities. Backdoor attacks implant trigger patterns that activate predetermined malicious behavior only when specific input conditions occur, evading standard evaluation benchmarks.

Mitigation Strategies. Implement data provenance tracking using frameworks like DataHub, Great Expectations, or Weights and Biases Artifacts. Deploy statistical anomaly detection on training datasets using tools including Cleanlab, Aquarium Learning, or Scale AI Nucleus. Maintain cryptographic integrity verification for dataset versions using content-addressable storage systems. Conduct periodic model behavioral testing against known backdoor trigger patterns cataloged by researchers at the University of California Berkeley, Massachusetts Institute of Technology, and Google DeepMind.

Category 3 — Model Extraction and Intellectual Property Theft

Sophisticated adversaries systematically query deployed models to reconstruct functionally equivalent replicas, enabling competitive intelligence theft, attack surface mapping, and downstream adversarial example generation. Extraction attacks against language model APIs demonstrated by researchers at ETH Zurich and the University of Maryland in 2025 achieved ninety-three percent functional equivalence using fewer than fifty thousand strategically constructed queries.

Mitigation Strategies. Implement rate limiting and query pattern anomaly detection using API management platforms including Kong Gateway, Apigee, AWS API Gateway, or Azure API Management. Deploy watermarking techniques embedding verifiable provenance markers in model outputs. Monitor for suspicious query patterns characteristic of model extraction campaigns including high-frequency structurally similar requests and systematic input space exploration sequences.

Category 4 — Supply Chain and Dependency Vulnerabilities

Modern deployments incorporate numerous third-party dependencies — pretrained foundation models from Hugging Face Hub repositories, inference optimization libraries, vector database connectors, and orchestration frameworks like LangChain, LlamaIndex, or Semantic Kernel. Each dependency introduces potential supply chain compromise vectors analogous to traditional software supply chain attacks documented in the SolarWinds and Log4Shell incidents.

Mitigation Strategies. Implement software bill of materials documentation for all model components following SPDX or CycloneDX specifications. Deploy dependency scanning using Snyk, Dependabot, or Socket.dev. Verify cryptographic signatures for downloaded model weights and adapter checkpoints. Maintain approved vendor registries restricting which external model repositories and library sources may be integrated into production pipelines.

Category 5 — Inference Manipulation and Evasion Attacks

Adversarial examples — inputs intentionally crafted to cause misclassification or incorrect outputs while appearing normal to human observers — threaten computer vision, natural language understanding, and multimodal systems. Physical-domain attacks demonstrated by researchers at Tsinghua University and Stanford University in 2025 successfully evaded commercial autonomous driving perception systems using printed adversarial patches and projected light perturbations.

Mitigation Strategies. Deploy adversarial training incorporating perturbation-augmented datasets generated using Foolbox, Adversarial Robustness Toolbox by IBM Research, or CleverHans. Implement ensemble inference combining multiple model architectures to reduce single-point vulnerability exposure. Conduct periodic robustness evaluations using standardized benchmarks including RobustBench and ARES-Bench maintained by academic research consortiums.

Integrating Threat Modeling Into Organizational Security Programs

Effective threat modeling requires integration with existing enterprise security governance rather than isolated assessment exercises. Pertama Partners recommends conducting threat model reviews at four lifecycle stages: initial system design review before development commencement, pre-deployment security assessment during staging environment validation, periodic operational review at quarterly intervals following production deployment, and triggered reassessment following significant system modifications, vendor dependency updates, or newly published vulnerability disclosures relevant to deployed technology components. Threat model documentation should reference organizational risk appetite statements approved by chief information security officers and maintained within governance risk compliance platforms including ServiceNow GRC, Archer by RSA, LogicGate, or MetricStream.

Common Questions

AI systems face unique threats including adversarial attacks, model poisoning, extraction attacks, and prompt injection that require AI-specific threat identification and mitigation.

Consider adversarial inputs, data poisoning, model extraction, privacy attacks, prompt injection, and supply chain attacks through training data or models.

Extend existing threat modeling frameworks (like STRIDE) to include AI-specific threats. Don't create separate processes—integrate with enterprise security practices.

References

  1. AI Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology (NIST) (2023). View source
  2. Cybersecurity Framework (CSF) 2.0. National Institute of Standards and Technology (NIST) (2024). View source
  3. OWASP Top 10 for Large Language Model Applications 2025. OWASP Foundation (2025). View source
  4. OWASP Top 10 Web Application Security Risks. OWASP Foundation (2021). View source
  5. ISO/IEC 42001:2023 — Artificial Intelligence Management System. International Organization for Standardization (2023). View source
  6. Artificial Intelligence Cybersecurity Challenges. European Union Agency for Cybersecurity (ENISA) (2020). View source
  7. Model AI Governance Framework (Second Edition). PDPC and IMDA Singapore (2020). View source
Michael Lansdowne Hauge

Managing Director · HRDF-Certified Trainer (Malaysia), Delivered Training for Big Four, MBB, and Fortune 500 Clients, 100+ Angel Investments (Seed–Series C), Dartmouth College, Economics & Asian Studies

Managing Director of Pertama Partners, an AI advisory and training firm helping organizations across Southeast Asia adopt and implement artificial intelligence. HRDF-certified trainer with engagements for a Big Four accounting firm, a leading global management consulting firm, and the world's largest ERP software company.

AI StrategyAI GovernanceExecutive AI TrainingDigital TransformationASEAN MarketsAI ImplementationAI Readiness AssessmentsResponsible AIPrompt EngineeringAI Literacy Programs

EXPLORE MORE

Other AI Security & Data Protection Solutions

INSIGHTS

Related reading

Talk to Us About AI Security & Data Protection

We work with organizations across Southeast Asia on ai security & data protection programs. Let us know what you are working on.