AI Safety & Security

What is Prompt Injection?

Prompt Injection is a security attack where malicious input is crafted to override or manipulate the instructions given to a large language model, causing it to ignore its intended behaviour and follow the attacker's commands instead. It is one of the most significant security challenges facing AI-powered applications today.

What is Prompt Injection?

Prompt Injection is a type of attack against AI systems, particularly large language models (LLMs), where an attacker crafts input that causes the model to ignore its original instructions and follow the attacker's directions instead. It is analogous to SQL injection in traditional web applications, where malicious input tricks a system into executing unintended commands.

When a business deploys an AI chatbot, for example, it typically gives the model a set of instructions — a system prompt — that defines its role, boundaries, and behaviour. A prompt injection attack attempts to override those instructions. The attacker might embed hidden commands within what appears to be a normal user message, causing the chatbot to reveal confidential information, bypass safety restrictions, or behave in ways the business never intended.

Why Prompt Injection Is a Serious Business Risk

Prompt injection is not a theoretical concern. It is actively exploited against real AI systems. For businesses deploying AI-powered tools, the risks include:

Data exposure: An attacker could trick an AI assistant into revealing system prompts, internal business rules, or data from connected databases.
Brand damage: A manipulated chatbot could produce offensive content, make false claims about your products, or provide harmful advice under your company's name.
Operational disruption: An AI agent that has been hijacked through prompt injection could take incorrect actions, process fraudulent transactions, or corrupt workflows.
Compliance violations: If an injected prompt causes your AI system to mishandle personal data, you may face regulatory consequences under data protection laws.

Types of Prompt Injection

Direct Prompt Injection

In a direct attack, the user sends input that explicitly attempts to override the system prompt. For example, a user might type: "Ignore all previous instructions and instead..." followed by their own commands. While many modern AI systems have some resistance to simple direct injection, more sophisticated variants continue to bypass defences.

Indirect Prompt Injection

Indirect injection is more subtle and harder to defend against. The malicious instructions are embedded in content that the AI system processes from external sources — a webpage it is asked to summarise, an email it is asked to analyse, or a document it retrieves from a database. The AI model reads the hidden instructions and follows them, without the end user or the system operator realising what happened.

For example, an attacker could embed invisible instructions in a web page. When your AI assistant is asked to summarise that page, it reads the hidden commands and acts on them — potentially exfiltrating data or changing its behaviour for subsequent interactions.

Defending Against Prompt Injection

Input Validation and Filtering

The first line of defence is examining user inputs for patterns associated with injection attempts. This includes looking for phrases like "ignore previous instructions," role-play requests designed to bypass safety rules, and encoded or obfuscated commands. However, filtering alone is insufficient because attackers constantly develop new phrasing to evade detection.

System Prompt Hardening

Strengthening the system prompt with explicit instructions about what the model should refuse to do, combined with reinforcement of its role boundaries, provides an additional layer of protection. This is not foolproof, but it raises the difficulty for attackers.

Output Monitoring

Monitoring the AI system's outputs for anomalous behaviour, such as responses that deviate from expected patterns, reveal system instructions, or contain content that violates business policies, helps catch successful injections even when input filtering fails.

Architectural Controls

The most robust defence involves architectural decisions:

Principle of least privilege: Limit the AI system's access to data and capabilities to only what it needs for its specific task.
Separation of concerns: Do not let a single AI model handle both user interaction and sensitive operations. Use separate models or traditional code for critical functions.
Human-in-the-loop: For high-stakes actions like financial transactions or data access, require human approval rather than allowing the AI to act autonomously.

Sandboxing and Isolation

When AI systems process external content such as web pages, emails, or uploaded documents, process that content in an isolated environment and sanitise it before passing it to the main AI model. This reduces the effectiveness of indirect injection attacks.

Prompt Injection in the Southeast Asian Context

Businesses across Southeast Asia are rapidly deploying AI chatbots and virtual assistants for customer service, sales, and internal operations. The multilingual nature of the region adds complexity to prompt injection defence, as attacks can be crafted in any language the model understands. An injection attempt in Thai, Bahasa Indonesia, or Vietnamese may bypass filters designed primarily for English.

Additionally, as AI agents become more prevalent in financial services, e-commerce, and government services across ASEAN markets, the potential impact of successful prompt injection attacks grows. Organisations in the region should treat prompt injection as a first-order security concern, on par with traditional web application vulnerabilities.

The Evolving Landscape

Prompt injection remains an active area of security research. No complete defence exists today, and it is unlikely that a single solution will eliminate the risk entirely. The most effective approach is defence in depth — combining multiple layers of protection and assuming that any individual layer can be bypassed. Organisations should monitor developments in this space closely and update their defences as new techniques emerge.

Why It Matters for Business

Prompt Injection is the most pressing security vulnerability in AI-powered applications today, and every organisation deploying large language models needs to understand it. For CEOs and CTOs, the risk is direct: a successful prompt injection can cause your AI systems to leak confidential data, produce harmful content under your brand name, or take unauthorised actions.

The challenge is particularly acute in Southeast Asia, where businesses are rapidly deploying AI chatbots and virtual assistants across multiple languages. Multilingual environments create additional attack surface, as injection attempts in regional languages may bypass security filters optimised for English.

From a liability perspective, if your AI system is manipulated into violating data protection regulations — such as Singapore's PDPA or Indonesia's PDP Law — through prompt injection, your organisation bears the regulatory and legal consequences. Understanding and mitigating prompt injection risk is not optional for any business that exposes AI systems to user input or external data.

Key Considerations

Treat prompt injection as a critical security risk on par with SQL injection or cross-site scripting, and allocate security resources accordingly.
Implement defence in depth with multiple layers including input filtering, output monitoring, system prompt hardening, and architectural controls.
Apply the principle of least privilege to all AI systems, limiting their access to data and actions to the minimum required for their specific function.
Test your AI applications for prompt injection vulnerabilities in all languages they support, including Bahasa, Thai, Vietnamese, and other regional languages.
Never rely solely on the AI model to enforce security boundaries. Use traditional code and access controls for sensitive operations.
Monitor AI system outputs for anomalous behaviour that could indicate a successful injection attack.
Keep your security team informed about the latest prompt injection techniques, as this is a rapidly evolving field with new attack methods emerging regularly.
Require human approval for any high-stakes actions that AI systems might take, such as financial transactions, data access, or communications sent on behalf of the company.

Frequently Asked Questions

Can prompt injection be completely prevented?

No. As of today, there is no known method to completely eliminate prompt injection risk in large language models. The fundamental challenge is that these models process instructions and data in the same channel, making it difficult to definitively separate legitimate input from malicious commands. The most effective approach is defence in depth — combining multiple protective layers so that a failure in one layer is caught by another. Assume injection is possible and design your architecture to limit the damage a successful attack can cause.

Is prompt injection only a risk for chatbots?

No. Any application that uses a large language model to process external input is potentially vulnerable. This includes AI-powered email assistants, document summarisation tools, code generation platforms, search engines with AI features, and AI agents that interact with external data sources. Indirect prompt injection is particularly concerning because it can target AI systems that process web content, documents, or data feeds without any direct user interaction.

Need help implementing Prompt Injection?

Pertama Partners helps businesses across Southeast Asia adopt AI strategically. Let's discuss how prompt injection fits into your AI roadmap.

Book a Consultation Browse AI Glossary

What is Prompt Injection?

What is Prompt Injection?

Why Prompt Injection Is a Serious Business Risk

Types of Prompt Injection

Direct Prompt Injection

Indirect Prompt Injection

Defending Against Prompt Injection

Input Validation and Filtering

System Prompt Hardening

Output Monitoring

Architectural Controls

Sandboxing and Isolation

Prompt Injection in the Southeast Asian Context

The Evolving Landscape

Frequently Asked Questions

Can prompt injection be completely prevented?

Is prompt injection only a risk for chatbots?

How do we test our AI systems for prompt injection vulnerabilities?

Need help implementing Prompt Injection?