The Data Leakage Risk with ChatGPT
When employees use ChatGPT at work, every prompt they type potentially shares company data with an external service. While enterprise AI plans have stronger data protections, the risk of data leakage is real — and one careless prompt can expose customer information, trade secrets, or confidential business data.
This guide explains the specific risks and practical steps to prevent data leakage.
How Data Leakage Happens
Scenario 1: Direct Input of Sensitive Data
An employee pastes a customer complaint email (including the customer's name, account number, and order details) into ChatGPT to draft a response. The customer's personal data is now processed by an external service.
Scenario 2: Contextual Accumulation
Over multiple prompts, an employee shares enough context about a confidential project — team names, financial targets, strategic plans — that the accumulated information constitutes a confidential briefing.
Scenario 3: Code and Intellectual Property
A developer pastes proprietary source code into ChatGPT for debugging help. The code may contain algorithms, API keys, or business logic that constitutes trade secrets.
Scenario 4: Training Data Concerns
With consumer-tier AI products, user prompts may be used to improve the model. This means sensitive data could theoretically influence future outputs visible to other users. (Enterprise plans typically exclude data from training.)
Data Classification Framework
The first defence against data leakage is a clear data classification system. Every piece of information in your company falls into one of these categories:
Green — Public Data
Information that is already publicly available or intended for public distribution.
- Published press releases, marketing materials
- Job listings, company website content
- Industry statistics and public data
- General business knowledge
AI Rule: Can be freely used with any AI tool.
Yellow — Internal Data
Information that is not confidential but is meant for internal use only.
- Internal process documents, SOPs
- Meeting agendas and non-sensitive notes
- General project updates (non-strategic)
- Team communications
AI Rule: May be used with approved enterprise AI tools only (not free-tier consumer products).
Orange — Confidential Data
Information that could harm the company or individuals if disclosed.
- Financial results (before public release)
- Strategic plans and competitive intelligence
- Employee performance data
- Customer lists and contact databases
- Pricing strategies
AI Rule: Must be anonymised before use. Remove all identifying details (names, numbers, dates). Use only with approved enterprise AI tools.
Red — Restricted Data
Information that must never enter any external AI system.
- Personally identifiable information (PII): NRIC, IC, passport numbers
- Financial data: bank accounts, credit cards, salary details
- Medical records and health information
- Legal privileged communications
- API keys, passwords, access credentials
- Source code containing proprietary algorithms
AI Rule: NEVER enter into any AI tool, under any circumstances.
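The traffic-light rules above can be encoded as a simple policy check so the decision is consistent rather than left to judgment in the moment. A minimal Python sketch, assuming a hypothetical `may_use_with_ai` helper; the tier names mirror the framework, everything else is illustrative:

```python
from enum import Enum

class Tier(Enum):
    GREEN = "public"
    YELLOW = "internal"
    ORANGE = "confidential"
    RED = "restricted"

def may_use_with_ai(tier: Tier, enterprise_tool: bool = False,
                    anonymised: bool = False) -> bool:
    """Apply the Green/Yellow/Orange/Red AI rules from the framework."""
    if tier is Tier.GREEN:
        return True                            # any AI tool
    if tier is Tier.YELLOW:
        return enterprise_tool                 # approved enterprise tools only
    if tier is Tier.ORANGE:
        return enterprise_tool and anonymised  # anonymise before use
    return False                               # RED: never, under any circumstances
```

For example, `may_use_with_ai(Tier.ORANGE, enterprise_tool=True)` stays `False` until the data has also been anonymised.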
Practical Safeguards
1. Use Enterprise Plans Only
Consumer-tier AI products (free ChatGPT, free Claude) have different data handling practices than enterprise plans. Key differences:
| Feature | Consumer/Free | Enterprise |
|---|---|---|
| Data used for training | Often yes | Typically no |
| Data retention | Extended | Limited/configurable |
| Admin controls | None | Full |
| Usage monitoring | None | Audit logs |
| Data processing agreement | None | Available |
| Compliance certifications | Limited | SOC 2, ISO 27001 |
2. Implement Technical Controls
- Block consumer AI websites on corporate networks (allow only enterprise endpoints)
- Enable data loss prevention (DLP) tools that flag sensitive data in AI prompts
- Configure AI tool admin settings to restrict data sharing
- Enable audit logging for all AI tool usage
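The DLP idea in the second bullet can be prototyped with a few regular expressions. A minimal sketch, where the pattern set is purely illustrative; commercial DLP tools use far richer, context-aware detectors:

```python
import re

# Illustrative patterns only; real DLP engines use context-aware detection.
PATTERNS = {
    "nric": re.compile(r"\b[STFG]\d{7}[A-Z]\b"),            # Singapore NRIC format
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "card_number": re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{16,}\b"),
}

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of sensitive-data patterns found in a prompt."""
    return [name for name, rx in PATTERNS.items() if rx.search(prompt)]
```

A gateway would call `scan_prompt` before forwarding a request, and block or warn when the returned list is non-empty.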
3. Train Every Employee
Every employee who uses AI tools must understand:
- The data classification framework (Green/Yellow/Orange/Red)
- How to anonymise data before using it with AI
- Which AI tools are approved (and which are blocked)
- What to do if they accidentally share sensitive data
4. Create an Anonymisation Checklist
Before pasting any text into an AI tool, check for and remove:
- Personal names → Replace with [Person A], [Employee B]
- Company names → Replace with [Company X]
- Account/ID numbers → Remove entirely
- Contact details (email, phone, address) → Remove
- Financial figures → Replace with approximations
- Dates that could identify events → Generalise
- Location details that narrow identification → Generalise
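Parts of this checklist can be automated as a pre-prompt scrubber. A minimal sketch; the regexes and placeholder tokens are illustrative, and items like personal names, company names, and telling dates still need a manual pass:

```python
import re

# Illustrative rules; manual review is still needed for names, dates, figures.
RULES = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL REMOVED]"),
    (re.compile(r"\b[STFG]\d{7}[A-Z]\b"), "[ID REMOVED]"),
    (re.compile(r"\+?\d[\d -]{7,}\d\b"), "[PHONE REMOVED]"),
]

def anonymise(text: str) -> str:
    """Strip the machine-detectable items from the checklist above."""
    for rx, token in RULES:
        text = rx.sub(token, text)
    return text
```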
5. Establish Incident Response
When data leakage occurs (or is suspected):
- Stop using the AI tool immediately for that session
- Document what data was shared (screenshot if possible)
- Report to IT Security within 1 hour
- IT assesses the severity and determines response steps
- Notify affected parties if PII was involved (PDPA requirement)
- Update safeguards to prevent recurrence
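The response steps can be captured in a lightweight incident record so nothing is skipped under pressure. A sketch using a hypothetical `LeakageIncident` type; the field names and action strings are illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LeakageIncident:
    """Hypothetical record mirroring the response steps above."""
    what_was_shared: str
    tool: str
    involves_pii: bool
    reported_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def next_actions(self) -> list[str]:
        actions = [
            "stop using the AI tool for that session",
            "document what data was shared",
            "report to IT Security within 1 hour",
        ]
        if self.involves_pii:
            actions.append("assess PDPA notification requirements")
        actions.append("update safeguards to prevent recurrence")
        return actions
```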
Regulatory Context
Singapore PDPA
The Personal Data Protection Act requires organisations to protect personal data and obtain consent for its use. Inputting personal data into AI tools without proper safeguards may constitute a breach. Financial penalties can reach S$1 million or, for larger organisations, 10% of annual turnover in Singapore, whichever is higher.
Malaysia PDPA
Malaysia's Personal Data Protection Act similarly requires organisations to safeguard personal data. Sharing personal data with AI services may violate data processing principles if proper consent and safeguards are not in place.
What Good Looks Like
A company with effective AI data protection:
- Has a written AI usage policy that all employees have read and signed
- Uses only enterprise-tier AI tools with data processing agreements
- Trains every employee on data classification and anonymisation
- Monitors AI tool usage through admin dashboards and audit logs
- Responds to incidents within 1 hour with a defined process
- Reviews and updates its AI policy quarterly
Related Reading
- ChatGPT Company Policy — Build a comprehensive ChatGPT usage policy
- AI Risk Assessment Template — Identify and mitigate risks from AI use in your organisation
- Copilot Governance & Access — Enterprise-grade governance for Microsoft Copilot
What's Changed in Data Leakage Prevention Since 2024
The landscape of ChatGPT data leakage prevention shifted dramatically between early 2024 and March 2026, driven by three converging developments: OpenAI's enterprise architecture updates, regulatory enforcement actions, and the emergence of dedicated interception technologies.
Enterprise API Controls versus Browser-Based Usage. Organisations that relied solely on acceptable use policies discovered through incident reports that browser-based ChatGPT sessions remained the primary exfiltration vector. OpenAI introduced Team and Enterprise workspace tiers with data retention opt-outs and administrative conversation logging, but these controls only apply when employees use sanctioned accounts. Shadow usage through personal subscriptions continues to bypass organisational safeguards entirely.
DLP Gateway Solutions. Dedicated proxy tools now inspect prompts before they reach external language model endpoints. Nightfall AI, Microsoft Purview, Zscaler GenAI Security, and Harmonic Security each intercept outbound requests and scan for sensitive patterns, including personally identifiable information, source code fragments, financial projections, and intellectual property markers. Nightfall's classification engine uses context-aware detection trained on healthcare records, legal documents, and engineering codebases, achieving approximately 92% precision according to its published benchmark from September 2025.
Regulatory Enforcement Precedents. Italy's Garante temporarily suspended ChatGPT operations in March 2023, and subsequent GDPR enforcement guidance from the European Data Protection Board (EDPB Opinion 28/2024) established that submitting personal data into generative models constitutes processing under Article 4(2). South Korea's Personal Information Protection Commission (PIPC) issued similar interpretive guidance in January 2025, requiring organisations to conduct data protection impact assessments before deploying conversational AI tools.
Building a Layered Prevention Architecture
Effective prevention combines technical controls with procedural safeguards across four layers:
- Network perimeter: DNS-level blocking of unauthorised AI endpoints using tools like Cisco Umbrella or Cloudflare Gateway
- Endpoint monitoring: Browser extension policies that detect and warn when sensitive content enters form fields on generative AI domains
- API governance: Centralised prompt routing through approved enterprise API keys with automatic PII redaction via middleware libraries such as Presidio (Microsoft open-source) or Private AI
- Training reinforcement: Quarterly tabletop exercises simulating data leakage scenarios specific to each department's workflows, documented through learning management platforms like Docebo or Cornerstone OnDemand
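The API-governance layer can be sketched as a small routing gateway that blocks restricted content outright and redacts PII before forwarding. Everything here is illustrative: `call_llm` stands in for your approved enterprise client, and the two patterns are placeholders for a real detection engine such as Presidio:

```python
import re

RESTRICTED = re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{16,}\b")   # e.g. API keys
PII = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")             # e.g. email addresses

def call_llm(prompt: str) -> str:
    # Placeholder for the sanctioned enterprise API client.
    return f"(model response to: {prompt})"

def gateway(prompt: str) -> str:
    """Block Red-tier content; redact PII; forward the rest."""
    if RESTRICTED.search(prompt):
        raise PermissionError("prompt blocked: restricted data detected")
    return call_llm(PII.sub("[REDACTED]", prompt))
```

Routing all traffic through one gateway like this is what makes the audit-logging and DLP layers enforceable, since there is a single chokepoint to inspect.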
Enterprise-grade prevention architectures combine several of these layers. DLP engines such as Symantec DLP, Microsoft Purview Information Protection, and Nightfall AI perform real-time lexical and regex matching against sensitive-data taxonomies covering PII, PHI, and PCI-DSS cardholder attributes. Organisations with operations in hubs such as Cyberjaya, Changi Business Park, and Batam add tokenisation gateways (Protegrity, Voltage SecureData, Thales CipherTrust) so that plaintext credentials never cross the egress boundary. CASB (cloud access security broker) policies in Netskope, Zscaler, or Palo Alto Prisma then enforce inline inspection calibrated against MITRE ATT&CK tactics and the OWASP LLM Top 10, generating tamper-evident audit telemetry that supports ISO 27701 privacy information management attestation.
Common Questions
Is ChatGPT a data leakage risk for companies?
Yes, if employees input sensitive information into AI tools. The risks include direct input of personal data, accumulation of confidential context across prompts, and exposure of intellectual property. Enterprise AI plans provide stronger protections, but employee training and data classification are essential safeguards.
Is ChatGPT Enterprise safe for company data?
ChatGPT Enterprise is significantly safer than consumer/free versions. Data is not used for model training, retention is configurable, admin controls are available, and SOC 2 compliance is maintained. However, even with Enterprise, employees must follow data classification guidelines — do not input restricted data (PII, credentials, source code).
What should we do if an employee shares sensitive data with an AI tool?
Immediately stop the session, document what was shared, and report to IT Security within 1 hour. If personal data was involved, assess PDPA notification requirements. Then update safeguards to prevent recurrence — this may include additional training, technical controls, or policy updates.
References
- OWASP Top 10 for Large Language Model Applications 2025. OWASP Foundation (2025).
- Cybersecurity Framework (CSF) 2.0. National Institute of Standards and Technology (NIST) (2024).
- ISO/IEC 27001:2022 — Information Security Management. International Organization for Standardization (2022).
- Personal Data Protection Act 2012. Personal Data Protection Commission Singapore (2012).
- AI Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology (NIST) (2023).
- Model AI Governance Framework (Second Edition). PDPC and IMDA Singapore (2020).
- Guide on Managing and Notifying Data Breaches Under the PDPA. Personal Data Protection Commission Singapore (2021).
