Traditional threat modeling doesn't fully address AI-specific vulnerabilities. This guide extends threat modeling methodology for AI systems.
Executive Summary
- AI introduces new threats — Model manipulation, training data attacks, adversarial inputs
- Traditional threat modeling adapts — STRIDE and other frameworks extend to AI
- System-level view essential — AI threats span data, model, infrastructure, and integration
- Threat modeling early — Design phase is cheapest time to address threats
- Continuous process — Threats evolve as AI capabilities and attacks advance
- Cross-functional effort — Security, AI, and business perspectives all matter
AI Threat Categories
1. Training Data Attacks
- Data poisoning
- Backdoor insertion
- Data extraction/inference
2. Model Attacks
- Model extraction/theft
- Adversarial examples
- Model inversion
- Membership inference
3. Infrastructure Attacks
- Traditional IT attacks (network, compute, storage)
- API vulnerabilities
- Access control bypass
4. Output Manipulation
- Prompt injection
- Jailbreaking
- Output filtering bypass
AI Threat Modeling Methodology
Step 1: Define System Scope
- What AI capabilities are in scope?
- What data flows through the system?
- What are the trust boundaries?
- Who are the legitimate users?
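One lightweight way to capture these scoping answers is as structured data that later steps can reference. A minimal Python sketch; the class, fields, and example values are illustrative, not part of any standard:

```python
from dataclasses import dataclass, field

@dataclass
class AISystemScope:
    """Hypothetical record capturing the Step 1 scoping answers."""
    capabilities: list[str] = field(default_factory=list)      # AI capabilities in scope
    data_flows: list[str] = field(default_factory=list)        # what data moves where
    trust_boundaries: list[str] = field(default_factory=list)
    legitimate_users: list[str] = field(default_factory=list)

# Example values are placeholders for your own architecture.
scope = AISystemScope(
    capabilities=["document summarization", "RAG question answering"],
    data_flows=["user prompt -> LLM API", "internal docs -> vector store"],
    trust_boundaries=["user input vs. system prompt", "internal vs. third-party API"],
    legitimate_users=["employees via SSO"],
)
```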
Step 2: Identify Threats (STRIDE-AI)
| Category | Traditional | AI Extension |
|---|---|---|
| Spoofing | Identity spoofing | Training data source spoofing |
| Tampering | Data tampering | Model tampering, adversarial inputs |
| Repudiation | Action denial | AI decision audit gaps |
| Information Disclosure | Data leakage | Model extraction, training data leakage |
| Denial of Service | System unavailability | Model degradation attacks |
| Elevation of Privilege | Unauthorized access | Prompt injection privilege escalation |
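Expressed as data, the table above can drive a per-component enumeration pass. A minimal sketch; the component names are placeholders for your own architecture:

```python
# The STRIDE-AI extensions from the table above, expressed as data so they
# can drive a per-component checklist. Structure is illustrative, not a schema.
STRIDE_AI = {
    "Spoofing": "training data source spoofing",
    "Tampering": "model tampering, adversarial inputs",
    "Repudiation": "AI decision audit gaps",
    "Information Disclosure": "model extraction, training data leakage",
    "Denial of Service": "model degradation attacks",
    "Elevation of Privilege": "prompt injection privilege escalation",
}

for component in ["training pipeline", "inference API", "vector store"]:
    for category, ai_threat in STRIDE_AI.items():
        print(f"[{component}] {category}: consider {ai_threat}")
```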
Step 3: Assess and Prioritize
- Likelihood of each threat
- Impact if exploited
- Existing controls
- Residual risk
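Likelihood and impact ratings are typically combined through an ordinal risk matrix. A minimal sketch, assuming a common 3x3 convention (the thresholds are a judgment call, not a standard); it reproduces the ratings in the register below:

```python
# Simple ordinal risk matrix: risk = f(likelihood, impact).
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def risk_rating(likelihood: str, impact: str) -> str:
    score = LEVELS[likelihood] * LEVELS[impact]
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"

# Spot-check against the threat register below.
assert risk_rating("Medium", "High") == "High"   # adversarial input bypass
assert risk_rating("Low", "High") == "Medium"    # training data poisoning
```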
Step 4: Define Mitigations
- Preventive controls
- Detective controls
- Response procedures
Step 5: Document and Review
- Threat model documentation
- Regular updates as system evolves
- Review upon significant changes
AI Threat Register Snippet
| Threat | Category | Likelihood | Impact | Risk | Mitigation |
|---|---|---|---|---|---|
| Adversarial input bypass | Model | Medium | High | High | Input validation, robust training |
| Prompt injection | Output | High | Medium | High | Output filtering, prompt engineering |
| Training data poisoning | Data | Low | High | Medium | Data provenance, validation |
| Model extraction | Model | Medium | Medium | Medium | API rate limiting, output perturbation |
| Sensitive data in output | Output | Medium | High | High | Output filtering, content classification |
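Kept as data rather than a static table, the register can be sorted so the highest-risk items surface first in reviews. A small sketch mirroring the rows above:

```python
# Register rows from the table above, sorted so high-risk items surface first.
ORDER = {"High": 0, "Medium": 1, "Low": 2}

register = [
    {"threat": "Adversarial input bypass", "risk": "High"},
    {"threat": "Prompt injection", "risk": "High"},
    {"threat": "Training data poisoning", "risk": "Medium"},
    {"threat": "Model extraction", "risk": "Medium"},
    {"threat": "Sensitive data in output", "risk": "High"},
]

for row in sorted(register, key=lambda r: ORDER[r["risk"]]):
    print(f'{row["risk"]:<6} {row["threat"]}')
```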
Checklist for AI Threat Modeling
- System scope and boundaries defined
- Data flows documented
- Trust boundaries identified
- AI-specific threats enumerated
- STRIDE-AI analysis completed
- Threats prioritized by risk
- Mitigations defined for high/critical threats
- Threat model documented
- Review schedule established
How the AI Threat Landscape Evolved Between 2024 and 2026
The cybersecurity threat environment targeting artificial intelligence deployments escalated dramatically throughout 2025 as adversarial techniques matured from academic research demonstrations into weaponized attack toolkits distributed through underground forums. Understanding this evolution helps security practitioners calibrate threat models against realistic contemporary attack surfaces rather than theoretical vulnerability categories.
The MITRE ATLAS framework (Adversarial Threat Landscape for Artificial-Intelligence Systems) expanded from forty-seven documented techniques in January 2024 to eighty-three techniques by December 2025, reflecting rapid adversarial innovation. OWASP released version 2.0 of its Machine Learning Security Top 10 in October 2025, reorganizing risk categories to reflect production deployment patterns rather than research environment assumptions. The National Institute of Standards and Technology published NIST AI 600-1 (Artificial Intelligence Risk Management Framework: Generative AI Profile) in July 2024, followed by supplementary guidance addressing adversarial robustness evaluation methodologies in March 2025.
Comprehensive Threat Taxonomy for Enterprise Deployments
Category 1 — Prompt Injection and Instruction Manipulation
Direct and indirect prompt injection remains the most frequently exploited vulnerability category in generative system deployments. Direct injection involves crafting adversarial inputs designed to override system instructions, extract confidential system prompts, or manipulate output behavior. Indirect injection operates through compromised contextual data — malicious content embedded in documents, emails, or web pages processed by retrieval-augmented generation pipelines.
Mitigation Strategies. Implement input sanitization layers using libraries like Guardrails AI, NeMo Guardrails by NVIDIA, or Rebuff. Deploy output monitoring that classifies responses for behavioral policy violations. Establish separate privilege boundaries between system instructions and user-supplied content. Conduct regular red-team exercises using adversarial prompt datasets published by researchers at Carnegie Mellon University, Anthropic, and the Allen Institute for Artificial Intelligence.
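As a deliberately naive illustration of screening plus privilege separation, the sketch below flags a few well-known injection phrasings and keeps system instructions and user content in separate message roles. The patterns are trivially bypassable; production systems should rely on the dedicated tooling named above.

```python
import re

# Naive heuristic screen for direct prompt injection. Patterns are
# illustrative only and easy to evade; use dedicated guardrail tooling
# in production.
SUSPECT_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"reveal (the |your )?system prompt",
    r"you are now",
]

def flag_injection(user_input: str) -> bool:
    lowered = user_input.lower()
    return any(re.search(p, lowered) for p in SUSPECT_PATTERNS)

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Keep system instructions and user content in separate roles,
    never concatenated into a single string."""
    if flag_injection(user_input):
        raise ValueError("input rejected by injection heuristic")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
```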
Category 2 — Data Poisoning and Training Manipulation
Adversaries targeting training pipelines can compromise model behavior by injecting malicious samples into training corpora, manipulating labeling processes, or exploiting data supply chain vulnerabilities. Backdoor attacks implant trigger patterns that activate predetermined malicious behavior only when specific input conditions occur, evading standard evaluation benchmarks.
Mitigation Strategies. Implement data provenance tracking using frameworks like DataHub, Great Expectations, or Weights and Biases Artifacts. Deploy statistical anomaly detection on training datasets using tools including Cleanlab, Aquarium Learning, or Scale AI Nucleus. Maintain cryptographic integrity verification for dataset versions using content-addressable storage systems. Conduct periodic model behavioral testing against known backdoor trigger patterns cataloged by researchers at the University of California Berkeley, Massachusetts Institute of Technology, and Google DeepMind.
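The integrity-verification idea can be as simple as content-addressing dataset files: record a digest at ingestion, then verify before every training run. A minimal sketch using only the standard library (the file names in the comment are hypothetical):

```python
import hashlib
from pathlib import Path

# Content-addressed integrity check for dataset versions: store digests at
# ingestion time, verify before each training run.
def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_dataset(manifest: dict[str, str], root: Path) -> list[str]:
    """Return files whose current digest no longer matches the manifest."""
    return [name for name, digest in manifest.items()
            if sha256_of(root / name) != digest]

# Hypothetical usage:
# manifest = {"train.jsonl": "<digest>", "labels.csv": "<digest>"}
# tampered = verify_dataset(manifest, Path("data/v1"))
```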
Category 3 — Model Extraction and Intellectual Property Theft
Sophisticated adversaries systematically query deployed models to reconstruct functionally equivalent replicas, enabling competitive intelligence theft, attack surface mapping, and downstream adversarial example generation. Extraction attacks against language model APIs demonstrated by researchers at ETH Zurich and the University of Maryland in 2025 achieved ninety-three percent functional equivalence using fewer than fifty thousand strategically constructed queries.
Mitigation Strategies. Implement rate limiting and query pattern anomaly detection using API management platforms including Kong Gateway, Apigee, AWS API Gateway, or Azure API Management. Deploy watermarking techniques embedding verifiable provenance markers in model outputs. Monitor for suspicious query patterns characteristic of model extraction campaigns including high-frequency structurally similar requests and systematic input space exploration sequences.
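A sliding-window rate limiter is the usual first control against high-volume extraction queries. A minimal sketch; the window and threshold values are illustrative and should be tuned to legitimate traffic:

```python
import time
from collections import defaultdict, deque

# Sliding-window rate limiter per API key; thresholds are illustrative.
# Sustained near-limit traffic of structurally similar queries is one
# signal of an extraction campaign and should also raise an alert.
WINDOW_SECONDS = 60
MAX_QUERIES = 100

_history: dict[str, deque] = defaultdict(deque)

def allow_request(api_key: str, now=None) -> bool:
    now = time.monotonic() if now is None else now
    window = _history[api_key]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()              # drop timestamps outside the window
    if len(window) >= MAX_QUERIES:
        return False                  # throttle and flag for review
    window.append(now)
    return True
```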
Category 4 — Supply Chain and Dependency Vulnerabilities
Modern deployments incorporate numerous third-party dependencies — pretrained foundation models from Hugging Face Hub repositories, inference optimization libraries, vector database connectors, and orchestration frameworks like LangChain, LlamaIndex, or Semantic Kernel. Each dependency introduces potential supply chain compromise vectors analogous to traditional software supply chain attacks documented in the SolarWinds and Log4Shell incidents.
Mitigation Strategies. Implement software bill of materials documentation for all model components following SPDX or CycloneDX specifications. Deploy dependency scanning using Snyk, Dependabot, or Socket.dev. Verify cryptographic signatures for downloaded model weights and adapter checkpoints. Maintain approved vendor registries restricting which external model repositories and library sources may be integrated into production pipelines.
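A lightweight precursor to a full SBOM is auditing installed packages against pinned versions. A minimal sketch using the standard library; the package names and pins are examples only:

```python
from importlib.metadata import version, PackageNotFoundError

# Minimal pin check for AI-stack dependencies. A real SBOM should follow
# SPDX or CycloneDX; these pins are illustrative placeholders.
PINNED = {"transformers": "4.44.2", "langchain": "0.2.16", "torch": "2.4.1"}

def audit_pins(pinned: dict[str, str]) -> list[str]:
    problems = []
    for pkg, expected in pinned.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed")
            continue
        if installed != expected:
            problems.append(f"{pkg}: {installed} != pinned {expected}")
    return problems
```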
Category 5 — Inference Manipulation and Evasion Attacks
Adversarial examples — inputs intentionally crafted to cause misclassification or incorrect outputs while appearing normal to human observers — threaten computer vision, natural language understanding, and multimodal systems. Physical-domain attacks demonstrated by researchers at Tsinghua University and Stanford University in 2025 successfully evaded commercial autonomous driving perception systems using printed adversarial patches and projected light perturbations.
Mitigation Strategies. Deploy adversarial training incorporating perturbation-augmented datasets generated using Foolbox, Adversarial Robustness Toolbox by IBM Research, or CleverHans. Implement ensemble inference combining multiple model architectures to reduce single-point vulnerability exposure. Conduct periodic robustness evaluations using standardized benchmarks including RobustBench and ARES-Bench maintained by academic research consortiums.
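Adversarial training in its simplest form mixes clean batches with FGSM-perturbed ones. A minimal PyTorch sketch (the epsilon, the 50/50 loss mix, and the [0, 1] input range are assumptions; hardened implementations live in the toolkits named above):

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, epsilon=0.03):
    # Craft FGSM examples: perturb inputs along the sign of the loss gradient.
    x_req = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)[0]
    x_adv = (x + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

    # Train on a 50/50 mix of clean and adversarial batches.
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```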
Integrating Threat Modeling Into Organizational Security Programs
Effective threat modeling requires integration with existing enterprise security governance rather than isolated assessment exercises. Pertama Partners recommends conducting threat model reviews at four lifecycle stages:
- Initial system design review before development commencement
- Pre-deployment security assessment during staging environment validation
- Periodic operational review at quarterly intervals following production deployment
- Triggered reassessment following significant system modifications, vendor dependency updates, or newly published vulnerability disclosures relevant to deployed technology components
Threat model documentation should reference organizational risk appetite statements approved by chief information security officers and maintained within governance, risk, and compliance platforms such as ServiceNow GRC, Archer, LogicGate, or MetricStream.
Common Questions
What threats are unique to AI systems?
AI systems face unique threats including adversarial attacks, model poisoning, extraction attacks, and prompt injection that require AI-specific threat identification and mitigation.
What AI-specific threats should a threat model consider?
Consider adversarial inputs, data poisoning, model extraction, privacy attacks, prompt injection, and supply chain attacks through training data or models.
How should organizations threat model AI systems?
Extend existing threat modeling frameworks (like STRIDE) to include AI-specific threats. Don't create separate processes; integrate with enterprise security practices.
References
- AI Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology (NIST), 2023.
- Cybersecurity Framework (CSF) 2.0. National Institute of Standards and Technology (NIST), 2024.
- OWASP Top 10 for Large Language Model Applications 2025. OWASP Foundation, 2025.
- OWASP Top 10 Web Application Security Risks. OWASP Foundation, 2021.
- ISO/IEC 42001:2023 — Artificial Intelligence Management System. International Organization for Standardization, 2023.
- Artificial Intelligence Cybersecurity Challenges. European Union Agency for Cybersecurity (ENISA), 2020.
- Model AI Governance Framework (Second Edition). PDPC and IMDA Singapore, 2020.

