The Multi-Million-Dollar Data Problem Nobody Saw Coming
A Thai insurance company invested 18 months building a fraud detection AI. The model was sophisticated: graph neural networks analyzing claim patterns, natural language processing on adjuster notes, computer vision examining damage photos. The technical architecture was flawless.
Two months before planned production launch, data scientists discovered a problem: 40% of historical claims data had missing fields, customer addresses were inconsistent across three legacy systems ("Bangkok" vs "BKK" vs "กรุงเทพมหานคร"), fraud labels existed for only 8% of historical claims (making supervised learning impossible), and critical data lived in an AS/400 mainframe with no API access.
The AI project stalled. Data remediation took 14 months and cost $22 million—more than the original AI budget. When production finally launched, business value was a fraction of projections because market conditions had changed.
This pattern repeats constantly. Gartner's 2025 AI Implementation Study found that a majority of organizations lack data readiness for AI, yet they launch AI projects anyway, discovering data problems after committing resources, timelines, and executive credibility.
Data readiness isn't a technical detail to handle during implementation. It's a prerequisite that determines whether AI projects succeed or fail. Organizations that assess and fix data readiness before starting AI achieve significantly higher ROI than those treating data as an afterthought.
What Data Readiness Actually Means
Most organizations confuse "having data" with "being data-ready for AI."
They have terabytes of data. Customer transactions, operational logs, sensor readings, support tickets, financial records. Data exists.
But data readiness requires six conditions that are rarely met:
1. Data Quality
Data must be accurate, complete, consistent, and current.
- Accurate: Reflects reality without significant errors
- Complete: Contains all fields AI models need
- Consistent: Uses standard formats, codes, and definitions across systems
- Current: Fresh enough for AI decisions to be relevant
Most enterprise data fails at least one criterion. A 2024 Experian study found:
- Most organizations rate data quality as "poor" or "fair" (not "good" or "excellent")
- Average data accuracy is 84%—meaning 16% of data contains errors
- 30-40% of records have at least one missing critical field
- Data inconsistency across systems affects 60% of integrated data sets
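These quality dimensions can be measured directly by profiling the data before any model work starts. A minimal sketch in Python, assuming a simple list-of-dicts dataset; the field names and the postcode pattern are illustrative:

```python
import re

# Toy dataset with the kinds of gaps and format problems described above.
records = [
    {"id": 1, "city": "Bangkok", "postcode": "10110"},
    {"id": 2, "city": "BKK",     "postcode": "10110"},
    {"id": 3, "city": "Bangkok", "postcode": None},
    {"id": 4, "city": None,      "postcode": "1011"},  # wrong format
]

def completeness(records, field):
    """Percentage of records where the field is populated."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return 100.0 * filled / len(records)

def validity(records, field, pattern):
    """Percentage of populated values matching an expected format."""
    values = [r[field] for r in records if r.get(field)]
    ok = sum(1 for v in values if re.fullmatch(pattern, v))
    return 100.0 * ok / len(values) if values else 0.0

print(completeness(records, "city"))            # 75.0
print(validity(records, "postcode", r"\d{5}"))  # ~66.7
```

The same handful of functions, run per field across real tables, produces the completeness and validity percentages quoted in studies like Experian's.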
2. Data Accessibility
Data must be practically available when and where AI needs it.
Accessibility failures:
- Siloed systems: Critical data trapped in systems that don't communicate
- Permission barriers: Teams lack authorization to access required data
- Legacy constraints: Mainframes and legacy systems without modern APIs
- Manual extraction: Data exists but requires manual export/import
- Format incompatibility: Data in formats AI systems can't ingest
A Singapore manufacturing company had perfect machine sensor data for predictive maintenance AI—all stored in proprietary vendor formats with no export capability. Data existed but was practically inaccessible.
3. Data Governance
Clear policies on data usage, privacy, security, and compliance.
Governance addresses:
- Usage authorization: What data can be used for AI training and inference?
- Privacy compliance: How to handle PII, sensitive data, and consent?
- Data provenance: Where did data come from? Who owns it?
- Audit trails: Can you prove AI decisions used appropriate data?
- Bias mitigation: How to detect and address biased training data?
Data governance adequate for traditional analytics often fails for AI. Using customer data for quarterly reporting requires different governance than using it to train models that make automated decisions affecting those customers.
4. Data Volume and Completeness
Sufficient quantity and breadth of data for AI models to learn effectively.
Volume requirements vary by AI approach:
- Supervised learning: Thousands to millions of labeled examples
- Deep learning: Often requires millions of training instances
- Rare event detection: Needs sufficient examples of rare cases
- Generalization: Data must cover full range of production scenarios
Organizations often have plenty of data for common cases but insufficient data for edge cases, seasonal variations, rare events, new product categories, or emerging customer segments.
5. Data Documentation
Clear descriptions of what data means, how it's collected, and its limitations.
Critical documentation:
- Data dictionaries: What does each field contain?
- Business context: Why was data collected? What does it represent?
- Collection methods: How, when, and by whom was data gathered?
- Known limitations: What biases, gaps, or quirks exist?
- Update frequency: How often does data change?
Without documentation, data scientists waste months reverse-engineering what data means, make incorrect assumptions about data meaning, or use data inappropriately due to misunderstanding context.
6. Data Engineering Capability
Team capacity to prepare, transform, and maintain data for AI.
Required capabilities:
- Data pipeline development: Building automated data flows
- ETL/ELT expertise: Extracting, transforming, loading data
- Data quality monitoring: Detecting and addressing quality degradation
- Schema management: Handling evolving data structures
- Performance optimization: Making data access fast enough for AI
Most organizations understaff data engineering relative to data science—creating bottlenecks where brilliant models wait months for properly prepared data.
The Five Data Readiness Failure Modes
Failure Mode 1: Data Quality Issues (most organizations)
AI amplifies data quality problems that traditional analytics tolerated.
Missing values: Analysts can skip rows with nulls. AI models trained on incomplete data make unreliable predictions or fail entirely.
Inconsistent formats: Analysts normalize during analysis. AI models see "2024-01-15", "15/1/2024", "Jan 15 2024" as different values, breaking learning.
Duplicate records: Analysts deduplicate during queries. AI models trained on duplicates overweight duplicated patterns, creating bias.
Outdated information: Analysts know to question old data. AI models trained on stale data make predictions based on outdated reality.
Data entry errors: Analysts spot obvious mistakes. AI models treat errors as legitimate patterns, degrading accuracy.
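Some of these problems are mechanical to fix once identified. A minimal sketch of date-format normalization, assuming the three formats mentioned above; extend the format list to match what your data actually contains:

```python
from datetime import datetime

# Assumed list of formats observed in the data; order matters only for speed.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d %Y"]

def normalize_date(raw: str) -> str:
    """Return an ISO-8601 date string, or raise if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

for raw in ["2024-01-15", "15/1/2024", "Jan 15 2024"]:
    print(normalize_date(raw))  # all three print 2024-01-15
```

Raising on unrecognized formats, rather than silently passing values through, surfaces new format variants as they appear instead of letting them corrupt training data.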
Real example: A Malaysian e-commerce company's product recommendation AI performed poorly despite sophisticated algorithms. Investigation revealed 40% of product categories had typos ("Electornics", "Electroincs", "Electronics"), making it impossible for models to learn category patterns. Data quality, not model architecture, was the problem.
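Typos like those category labels can often be caught with stdlib fuzzy matching before they reach a model. A hedged sketch using Python's difflib; the similarity cutoff is a tunable assumption, and real pipelines would route matches to human review rather than auto-correcting:

```python
import difflib

categories = ["Electronics", "Electornics", "Electroincs", "Furniture"]

def typo_candidates(labels, canonical, cutoff=0.85):
    """Map suspected typos to their closest canonical label."""
    fixes = {}
    for label in labels:
        if label in canonical:
            continue  # already a known-good value
        match = difflib.get_close_matches(label, canonical, n=1, cutoff=cutoff)
        if match:
            fixes[label] = match[0]
    return fixes

print(typo_candidates(categories, ["Electronics", "Furniture"]))
# {'Electornics': 'Electronics', 'Electroincs': 'Electronics'}
```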
Southeast Asian context: Multilingual data creates unique quality challenges. Customer names in multiple scripts (Latin, Thai, Arabic, Chinese), addresses with mixed formatting conventions, dates in different calendar systems (Gregorian, Buddhist, Islamic), and product descriptions code-switching between English and local languages all complicate AI data quality.
Prevention: Conduct data quality assessment before AI projects. Profile data to measure completeness, accuracy, consistency. Establish data quality SLAs (e.g., <5% missing values, <2% format inconsistencies). Implement automated data quality monitoring. Budget 3-6 months for data quality remediation before AI development.
Failure Mode 2: Data Accessibility Problems (a majority of organizations)
Data exists but teams can't practically access it for AI.
System silos: Customer data in CRM, transaction data in ERP, support data in ticketing system, web behavior in analytics platform. Integration requires extensive ETL that wasn't budgeted.
Permission barriers: Data scientists need production customer data for model training. Security policies prohibit access. Legal review takes 6 months.
Legacy system constraints: Critical historical data on AS/400 mainframe. No REST API. Export requires mainframe expertise organization no longer employs.
Real-time requirements: AI needs sub-second data access. Legacy systems designed for batch processing provide daily updates. Gap is architectural, not solvable through optimization.
Cross-border constraints: Southeast Asian regional AI requires data from Indonesia, Thailand, Malaysia, Philippines. Data localization laws prohibit transferring data to centralized location. Federated learning is complex workaround.
Real example: A Philippine bank's credit scoring AI needed transaction history from core banking system, income data from HR system, payment behavior from credit card platform, and customer demographics from CRM. Each system had different owners, approval processes, and data formats. Data integration took 8 months—longer than building the AI model.
Southeast Asian context: Regional M&A creates data accessibility nightmares. Acquiring company's AI initiatives can't access acquired subsidiary data due to incompatible systems, varying privacy laws across markets, and legacy integration debt from previous acquisitions.
Prevention: Map data sources and accessibility during AI planning. Identify integration requirements early. Prototype data pipelines before full AI development. Budget dedicated data engineering resources (often 2:1 ratio to data scientists). Build reusable data access layers that serve multiple AI initiatives.
Failure Mode 3: Data Governance Failures (a majority of organizations)
Governance adequate for reporting fails for AI's automated decision-making.
Usage authorization ambiguity: Marketing team collected customer data for campaigns. Can it train AI for churn prediction? Legal isn't sure. AI project halts pending review.
Privacy compliance gaps: Data fine for human-reviewed reporting becomes problematic when feeding automated systems. PDPA consent for "improving services" doesn't clearly cover AI training.
Audit trail absence: Regulatory inquiry asks: "Why did AI deny this loan?" Organization can't reconstruct which data version the model used because data provenance wasn't tracked.
Bias detection failure: AI model inherits historical bias from training data. Organization had no process to assess training data for demographic bias before model deployment.
Cross-border complexity: Southeast Asian AI uses data from multiple jurisdictions. Each market has different privacy laws, consent requirements, and data transfer restrictions. Governance framework designed for single-market analytics doesn't scale.
Real example: An Indonesian fintech's lending AI used transaction data to predict creditworthiness. Privacy advocates challenged whether customers consented to AI-driven decisions when they agreed to "data analytics for service improvement." Regulatory investigation led to $2.8M fine and AI system shutdown pending consent rework.
Southeast Asian context: Regional privacy law fragmentation creates governance complexity:
- Singapore PDPA: Strict consent requirements, legitimate interests framework
- Malaysia PDPB: Personal data definition includes indirect identifiers
- Indonesia PDP Law: Mandatory data localization for certain data types
- Thailand PDPA: Explicit consent required for sensitive data processing
- Philippines DPA: Accountability framework requires impact assessments
An AI governance framework must satisfy the most restrictive requirements across all operating markets.
Prevention: Establish AI-specific data governance before projects start. Get legal review on training data usage, implement data lineage tracking, conduct privacy impact assessments for AI uses, define bias detection and mitigation processes, and create clear policies on AI data retention and deletion.
Failure Mode 4: Insufficient Data Volume (a majority of organizations)
Organizations have data for common cases but insufficient data for AI effectiveness.
Rare event detection: Fraud detection AI needs thousands of fraud examples. Organization has millions of transactions but only 47 confirmed fraud cases. Insufficient signal for supervised learning.
New product categories: Recommendation AI trained on historical data can't recommend products launched in last 6 months—no training data exists.
Seasonal variations: Retail demand forecasting AI trained on normal periods fails during holiday spikes it never saw in training data.
Edge case coverage: Customer service chatbot handles 80% of common queries well but fails on unusual requests absent from training data.
Demographic gaps: AI trained primarily on urban customer data performs poorly for rural customers underrepresented in training set.
Real example: A Thai healthcare AI for diagnosis needed balanced training data across disease types. For 15 common conditions: abundant data. For 30 rare conditions: insufficient examples. AI worked for common cases but was clinically useless for rare diseases doctors most needed help diagnosing.
Southeast Asian context: Market diversity creates data volume challenges. Regional AI requires data across:
- Economic diversity: Singapore affluence vs. lower-income markets
- Digital maturity: High smartphone penetration vs. feature phone users
- Language variety: Formal language vs. colloquial dialects
- Cultural differences: Conservative vs. cosmopolitan markets
AI trained primarily on Singapore data often fails in Indonesia, Thailand, Philippines due to insufficient training data for different market contexts.
Prevention: Assess data volume requirements before committing to AI approach. For rare events, consider alternative techniques (anomaly detection, one-class classification). Generate synthetic data to augment underrepresented categories. Partner with others to pool data (with appropriate privacy protections). Start with use cases where data volume is adequate.
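One alternative mentioned above, anomaly detection, can be sketched crudely with a z-score test: instead of learning from scarce fraud labels, flag statistical outliers. Production systems use richer detectors (isolation forests, one-class classifiers), but the shape is the same; the threshold and amounts here are illustrative:

```python
import statistics

amounts = [120, 95, 130, 110, 105, 98, 125, 5000]  # one suspicious outlier

def zscore_outliers(values, threshold=2.5):
    """Flag values whose z-score exceeds the (assumed) threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

print(zscore_outliers(amounts))  # [5000]
```

The appeal is that this needs no labels at all—useful when, as in the fraud example, confirmed cases number in the dozens rather than thousands.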
Failure Mode 5: Missing Documentation (many organizations)
Data exists but nobody understands what it means anymore.
Field meaning unclear: Database column "STATUS_CD" contains values 0-7. Original developer left 5 years ago. No documentation. Data scientists guess at meaning—guesses are wrong.
Business context lost: Field "CUST_SCORE" exists. Is it credit score? Satisfaction score? Propensity score? Documentation doesn't say. AI model uses it incorrectly.
Collection method unknown: Data shows seasonal patterns. Are they real or artifacts of changing collection methods? No documentation of collection history.
Known limitations undocumented: Field is accurate for customers in Singapore but unreliable for Thailand. Historical knowledge in someone's head, not written down.
Update frequency unclear: Model assumes daily data updates. Data actually updates weekly. Model degrades due to stale inputs.
Real example: A Vietnamese manufacturing company built predictive maintenance AI using sensor data. Model performed terribly. Investigation revealed: sensor A was replaced 2 years ago with different calibration, sensor B goes offline during maintenance (creating artificial patterns), and sensor C readings are aggregated differently across factories. None of this was documented. Data scientists wasted 4 months troubleshooting "model problems" that were actually data understanding problems.
Southeast Asian context: Regional organizations often have:
- Legacy system documentation in vendor-specific languages (Japanese, Korean, Chinese vendor systems with untranslated docs)
- Tribal knowledge concentrated in long-tenure employees approaching retirement
- Merger/acquisition history creating data from multiple undocumented sources
- Outsourced IT history where contractors built systems without leaving documentation
Prevention: Audit existing data documentation before AI projects. Create data dictionaries for all AI data sources. Document business context, collection methods, known limitations. Interview domain experts to capture tribal knowledge. Make documentation maintenance an ongoing operational requirement, not one-time project deliverable.
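A data dictionary doesn't need heavyweight tooling to start: a structured record per field, capturing meaning, source, and known limitations, prevents exactly the "CUST_SCORE" guessing described above. A sketch with illustrative field names and values:

```python
from dataclasses import dataclass, field

@dataclass
class FieldDoc:
    """One data-dictionary entry: what a field means and what to watch for."""
    name: str
    description: str
    source_system: str
    update_frequency: str
    known_limitations: list = field(default_factory=list)

cust_score = FieldDoc(
    name="CUST_SCORE",
    description="Marketing propensity score (0-100), not a credit score.",
    source_system="CRM",
    update_frequency="weekly",
    known_limitations=["Reliable for Singapore customers only"],
)
print(cust_score.name, "-", cust_score.update_frequency)
```

Even this minimal structure forces the update-frequency and known-limitations questions that the failure modes above hinge on.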
The Data Readiness Assessment Framework
Before committing resources to AI, assess data readiness systematically.
Step 1: Inventory Data Assets
Catalog all data potentially relevant to proposed AI use case:
- Data sources: Which systems contain needed data?
- Data types: Structured, unstructured, streaming, batch?
- Data ownership: Who owns each data source?
- Current usage: How is data currently used?
- Access methods: APIs, database queries, file exports, manual?
Step 2: Assess Data Quality
For each data source, measure:
- Completeness: % of records with all required fields populated
- Accuracy: % of records with verified accurate values
- Consistency: % of values following standard formats/definitions
- Timeliness: Average data age, update frequency
- Validity: % of values within expected ranges/formats
Score each dimension 0-100. Average score <70 indicates data quality remediation required before AI.
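The scoring step above can be sketched in a few lines; the dimension scores here are illustrative:

```python
# Quality scores per dimension, each on a 0-100 scale (illustrative values).
scores = {
    "completeness": 82,
    "accuracy": 74,
    "consistency": 61,
    "timeliness": 65,
    "validity": 58,
}

average = sum(scores.values()) / len(scores)
needs_remediation = average < 70  # threshold from the framework above
print(f"average quality score: {average:.0f}, remediate: {needs_remediation}")
```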
Step 3: Evaluate Data Accessibility
For needed data:
- Technical access: Can systems extract data programmatically?
- Permission access: Do teams have authorization to access data?
- Integration complexity: What's required to combine data from multiple sources?
- Latency: How fast can data be accessed?
- Cost: What does data access/integration cost?
Accessibility score <60 indicates significant integration work required.
Step 4: Review Data Governance
Assess governance readiness:
- Usage policies: Clear guidelines on AI training data usage?
- Privacy compliance: Adequate consent and privacy controls?
- Data provenance: Ability to track data sources and lineage?
- Audit capabilities: Can you reconstruct AI decisions?
- Bias assessment: Processes to detect/mitigate biased training data?
Governance score <65 indicates policy/process gaps that create legal/regulatory risk.
Step 5: Validate Data Volume
Confirm sufficient data:
- Training volume: Minimum records for effective model training?
- Coverage: Data spans full range of production scenarios?
- Balance: Adequate representation across important categories?
- Edge cases: Sufficient data for rare but important scenarios?
Volume assessment determines whether supervised learning is viable or alternative approaches (transfer learning, semi-supervised, few-shot) are needed.
Step 6: Audit Documentation
Review data documentation:
- Data dictionaries: Definitions for all fields?
- Business context: Purpose and meaning documented?
- Collection methods: How/when/why data is captured?
- Limitations: Known issues, biases, gaps documented?
- Update processes: Frequency and procedures documented?
Documentation score <50 indicates significant risk of data misuse due to misunderstanding.
Step 7: Assess Engineering Capacity
Evaluate team capability:
- Data engineering headcount: Sufficient resources for data preparation?
- Technical skills: ETL, pipelines, data quality tooling?
- Infrastructure: Data lake/warehouse, processing capacity?
- Tooling: Modern data engineering stack?
Inadequate engineering capacity signals a need for hiring, training, or external support before AI projects begin.
The Data Readiness Roadmap
Based on assessment, create data readiness roadmap:
Phase 1: Quick Wins (0-3 Months)
Address blockers that can be removed with moderate effort:
- Document critical data: Create data dictionaries for primary AI data sources
- Fix obvious quality issues: Address known data quality problems
- Establish basic governance: Define usage policies, consent frameworks
- Prototype integration: Prove critical data sources can be combined
Phase 2: Foundation Building (3-9 Months)
Invest in sustainable data infrastructure:
- Data quality remediation: Systematic cleanup of priority data assets
- Integration architecture: Build reusable data access layers
- Governance frameworks: Implement privacy, audit, bias detection processes
- Team building: Hire data engineers, upskill existing team
- Tooling investment: Modern data engineering stack (Airflow, dbt, etc.)
Phase 3: Continuous Improvement (Ongoing)
Maintain and enhance data readiness:
- Automated quality monitoring: Real-time data quality dashboards
- Proactive documentation: Maintain documentation as systems evolve
- Governance maturity: Advance from compliance to strategic governance
- Architecture evolution: Modernize legacy systems incrementally
Case Study: Proactive Data Readiness
Company: Grab (Singapore-headquartered super-app across Southeast Asia)
Challenge: Scale AI across ride-hailing, food delivery, payments in 8 markets
Data readiness problems (discovered through systematic assessment):
- Driver data quality varied 40-85% across markets
- Siloed systems: rider app, driver app, merchant platform, payments each separate
- Privacy laws varied across 8 markets with no unified governance
- Insufficient data for rare scenarios (extreme weather, emergencies, new cities)
- Documentation scattered across teams, markets, systems
Approach: Data readiness before AI scale
Year 1: Assessment and prioritization
- Comprehensive data quality audit across all markets
- Accessibility mapping: catalog all data sources, integration points
- Governance gap analysis against most restrictive market requirements
- Volume assessment: identify data-poor scenarios
- Documentation audit: systematically capture tribal knowledge
Year 2-3: Remediation and infrastructure
- Data quality: Implemented automated quality monitoring, remediated priority issues
- Integration: Built unified data platform connecting all systems
- Governance: Regional privacy framework meeting all market requirements
- Augmentation: Synthetic data generation for underrepresented scenarios
- Documentation: Enterprise data catalog with business context
- Team: Grew data engineering 3x, established centers of excellence per market
Investment: $47M over 3 years (before AI projects)
Results (enabling AI at scale):
- AI project success rate: 78% (vs. industry average 23%)
- Time to production: 4-6 months (vs. industry average 12-18 months)
- Data quality: 91% average across markets (up from 63%)
- Compliance: Zero privacy violations in 3 years
- Reusability: Data infrastructure supports 40+ AI initiatives
- ROI: $280M cumulative AI value (5.9x data infrastructure investment)
Key lesson: Proactive data readiness is cheaper and faster than reactive remediation. Grab's $47M investment prevented estimated $150M+ in failed AI projects and delayed timelines.
Practical Recommendations
For Organizations Planning AI
- Assess data readiness before AI projects: Run a six-dimension assessment covering quality, accessibility, governance, volume, documentation, and engineering capacity
- Budget data readiness at 40-60% of AI project costs: Underfunding data work guarantees failure
- Remediate data proactively: Reactive data fixes cost 3-5x more than proactive preparation
- Build reusable data infrastructure: Serve multiple AI initiatives, not one-off solutions
- Staff data engineering properly: 2:1 ratio data engineers to data scientists
- Document ruthlessly: Capture knowledge before it walks out the door
- Measure data readiness as KPI: Track quality, accessibility, governance scores quarterly
For Organizations with Struggling AI
If AI projects are stalling:
- Audit data readiness: Often "AI problems" are data problems
- Quantify data quality: Profile data to measure completeness, accuracy, consistency
- Map accessibility blockers: Identify integration, permission, legacy system barriers
- Review governance gaps: Ensure legal foundation for AI data usage
- Assess documentation: Verify data scientists understand what data means
- Right-size volume expectations: Match AI approach to available data
- Invest in remediation: Better data enables simpler models to outperform complex models on bad data
For Executives Approving AI Investments
Before approving AI projects, ask:
- "What's our data readiness score?" Demand quantified assessment
- "What data quality issues exist?" Ensure blockers are identified
- "Can we access needed data?" Verify accessibility, not just existence
- "Do we have governance approval?" Legal/compliance review before commitment
- "Is documentation adequate?" Prevent months wasted on data archaeology
- "Do we have enough data?" Match AI approach to data volume
- "What's the data remediation plan?" Budget and timeline for fixes
If the team can't answer confidently, delay AI investment until data readiness improves.
Conclusion: Data First, AI Second
The AI industry perpetuates a misleading narrative: AI is the hard part. Build sophisticated models, deploy cutting-edge algorithms, scale infrastructure.
Reality: AI is often the easy part. Plenty of tools, frameworks, pre-trained models. The hard part is data.
Clean, accessible, governed, sufficient, documented data prepared by capable teams using modern infrastructure.
A majority of organizations aren't data-ready for AI. They launch AI projects anyway. Months later, they discover data problems that should have been obvious upfront.
Reactive data remediation costs 3-5x more than proactive preparation. AI projects stall for 6-12 months addressing data issues. By the time AI finally works, business conditions have changed and value has evaporated.
The alternative: assess data readiness systematically before AI investment. Fix data problems proactively. Build reusable infrastructure. Establish governance. Document thoroughly. Staff appropriately.
This approach costs more upfront. Grab invested $47M in data readiness before scaling AI. But it enabled $280M in AI value—5.9x ROI.
Organizations that prioritize data readiness achieve:
- Significantly higher AI project success rates
- Significantly faster time to production
- Significantly better ROI on AI investments
- Reusable infrastructure serving multiple initiatives
- Sustainable AI capabilities vs. one-off projects
The constraint on AI value isn't model sophistication. It's data readiness.
Organizations that recognize this—that invest in data before AI, that treat data readiness as prerequisite rather than afterthought—build AI capabilities that compound over time.
Those that don't join the 68% who discover too late that AI projects can't succeed on broken data foundations.
Data first. AI second. Not the other way around.
Common Questions
How do you assess whether your organization's data is ready for AI?
Conduct systematic assessment across six dimensions: (1) Quality—measure completeness (>90%), accuracy (>95%), consistency (>85%) via data profiling; (2) Accessibility—verify programmatic access to all needed data sources; (3) Governance—confirm clear policies on AI data usage, privacy, audit trails; (4) Volume—validate sufficient training examples for your AI approach; (5) Documentation—ensure data dictionaries, business context, known limitations are documented; (6) Engineering capacity—assess team capability to prepare and maintain AI data. Average score <70 across dimensions indicates readiness problems requiring remediation before AI projects.
Why do data quality problems hurt AI more than traditional analytics?
AI amplifies data quality problems tolerable in traditional analytics. Analysts can skip records with missing values, normalize inconsistent formats during queries, and spot obvious errors. AI models trained on incomplete data make unreliable predictions, see format variations as different values (breaking pattern learning), treat duplicates as independent observations (creating bias), and use errors as legitimate training signal. A data quality issue that's a minor reporting inconvenience becomes an AI deployment blocker. Industry data shows AI models trained on data with <90% completeness or <95% accuracy rarely achieve production-acceptable performance.
How is AI-ready data different from data that's good enough for reporting?
Reporting tolerates imperfect data through analyst judgment and manual intervention. AI requires higher standards: (1) Completeness—reports can work with partial data; AI needs comprehensive training examples. (2) Consistency—analysts normalize during analysis; AI needs pre-normalized data. (3) Freshness—reports use point-in-time snapshots; AI often needs real-time data. (4) Governance—reporting for internal use has lighter compliance; AI making automated decisions requires stricter privacy controls. (5) Volume—reports work on samples; AI needs full datasets for training. (6) Documentation—analysts ask colleagues about unclear data; AI teams need written documentation at scale. Organizations confuse having data (for reporting) with having AI-ready data.
How much should organizations budget for data readiness?
Industry benchmarks suggest 40-60% of total AI project budget for data readiness (assessment, quality remediation, integration, governance, documentation, engineering capacity). For a $1M AI project, budget $400-600K for data work. This seems high but prevents far higher costs: reactive data remediation during AI deployment costs 3-5x more than proactive preparation, failed AI projects waste 100% of investment, and delayed timelines lose business value. Grab's case study showed $47M data investment enabled $280M AI value (5.9x ROI). Skimp on data readiness to save 40% upfront; lose 100% when AI fails on bad data.
Can data problems be fixed during the AI project instead of before it?
Technically yes, practically no. Organizations that defer data remediation discover: (1) Problems emerge late—after timelines committed, budgets spent, executive credibility invested. (2) Remediation takes longer under pressure—rushed data fixes introduce new errors. (3) Costs multiply—reactive fixes cost 3-5x proactive preparation. (4) AI teams stall—data scientists wait idle for clean data. (5) Business value erodes—delays allow market conditions to change. (6) Shortcuts get taken—teams accept lower-quality data to meet deadlines, degrading AI performance. The Thai insurance case study showed reactive data remediation took 14 months and $22M. Proactive assessment would have revealed problems before commitment. Fix data before AI, not during.
What data readiness challenges are specific to Southeast Asian organizations?
Regional organizations face distinct challenges: (1) Multilingual data complexity—customer info in multiple scripts, mixed formatting conventions, code-switching text. (2) Regulatory fragmentation—PDPA, PDPB, PDP Law, PDPA Thailand vary by market; single governance framework difficult. (3) Legacy system prevalence—AS/400 mainframes, proprietary formats, no API access common in established enterprises. (4) M&A integration debt—acquisitions create data accessibility nightmares across incompatible systems. (5) Documentation in vendor languages—Japanese/Korean/Chinese vendor systems with untranslated docs. (6) Tribal knowledge concentration—long-tenure employees with undocumented data understanding approaching retirement. (7) Data localization requirements—Indonesia, Vietnam mandate local storage, complicating regional AI.
Should poor data readiness kill an AI project?
Not necessarily—use data readiness assessment to make informed go/no-go decisions. Calculate remediation effort: If data problems are fixable in 3-6 months with reasonable budget (<50% of AI project cost), proceed with data-first roadmap. If remediation requires >12 months or exceeds AI development cost, either pause until data improves through other initiatives, or pivot to different AI use case with better data readiness. If data problems are unfixable (insufficient volume for supervised learning, inaccessible legacy data, insurmountable governance barriers), kill the project. Better to kill after $50K assessment than after $5M failed deployment. Data readiness assessment prevents sunk cost fallacy from driving doomed AI investments.
