AI Sustainability & Green AI

What is Model Compression (Sustainability)?

Model Compression reduces model size and inference compute through pruning, quantization, and distillation, lowering energy consumption and carbon emissions for deployment. Compressed models enable sustainable AI at scale.

Why It Matters for Business

Model compression can reduce AI inference energy consumption by 60-80%, directly lowering cloud computing costs while supporting the corporate sustainability goals increasingly demanded by stakeholders. Compressed models often run on standard CPUs instead of expensive GPU instances, cutting monthly infrastructure expenses from thousands of dollars to hundreds for mid-market workloads. Companies that demonstrate measurable AI sustainability practices gain a competitive advantage in procurement processes where ESG criteria influence vendor selection.

Key Considerations
  • Techniques: pruning, quantization, distillation, low-rank factorization.
  • Reduces inference energy and latency.
  • Enables deployment on edge devices (further energy savings).
  • Can achieve 10-100x size reduction with <1% accuracy loss.
  • Lower deployment costs at scale.
  • Critical for mobile and embedded AI.
  • Target 4-bit quantization as the default compression level for inference, achieving 75% memory reduction with typically less than 2% accuracy degradation on most tasks.
  • Measure energy consumption per inference request before and after compression to quantify sustainability improvements for ESG reporting and carbon reduction commitments.
  • Combine pruning with knowledge distillation sequentially, removing redundant parameters first then training a smaller student model on the pruned teacher's outputs.
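To make the quantization idea above concrete, here is a minimal NumPy sketch of symmetric post-training int8 quantization. (The list mentions 4-bit as a target; 8-bit is shown here for simplicity, giving a 4x size reduction rather than 8x. The function names and the random test matrix are illustrative, not from any particular library.)

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32 (a 75% memory reduction)
print(w.nbytes / q.nbytes)  # 4.0
```

Production frameworks add per-channel scales, calibration data, and quantization-aware fine-tuning on top of this basic recipe, but the energy saving comes from the same mechanism: smaller weights mean less memory traffic per inference.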

Common Questions

How much energy does AI actually use?

Training a large language model can emit 300+ tons of CO2 (roughly equivalent to 125 flights from New York to Beijing). Inference for deployed models consumes energy continuously for as long as they serve traffic. Google reported that AI accounted for 10-15% of its data center energy use in 2023. Energy use scales with both model size and request volume.
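The scaling point lends itself to a back-of-envelope calculation. The sketch below estimates daily inference energy and emissions; every figure in it is an illustrative assumption, not a measurement, and should be replaced with metered values and your grid's actual carbon intensity.

```python
# Back-of-envelope inference footprint. All constants are assumptions.
ENERGY_PER_REQUEST_J = 2.0        # assumed joules per inference request
REQUESTS_PER_DAY = 1_000_000      # assumed daily traffic
GRID_INTENSITY_G_PER_KWH = 400.0  # assumed grid intensity (gCO2 per kWh)

kwh_per_day = ENERGY_PER_REQUEST_J * REQUESTS_PER_DAY / 3.6e6  # J -> kWh
co2_kg_per_day = kwh_per_day * GRID_INTENSITY_G_PER_KWH / 1000.0

print(round(kwh_per_day, 3))     # 0.556
print(round(co2_kg_per_day, 3))  # 0.222
```

Because emissions are the product of energy per request, request volume, and grid intensity, compression (which cuts the first factor) compounds with renewable-powered hosting (which cuts the third).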

How can we reduce AI carbon footprint?

Strategies include: compute-optimal training (smaller models trained on more data), model compression, using renewable-powered data centers, efficient hardware (specialized AI chips), batching requests, caching results, and choosing models appropriately sized for each task.
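Of the strategies above, caching is often the cheapest to adopt: the greenest inference is the one you never run. A minimal sketch using Python's standard-library `functools.lru_cache`, with a placeholder standing in for the real model call:

```python
from functools import lru_cache

calls = 0  # counts how many times the "model" actually runs

def run_model(prompt: str) -> str:
    """Placeholder for an expensive inference call (hypothetical)."""
    return prompt.upper()

@lru_cache(maxsize=4096)
def cached_inference(prompt: str) -> str:
    global calls
    calls += 1
    return run_model(prompt)

for p in ["hello", "hello", "hello", "world"]:
    cached_inference(p)

print(calls)  # 2 -- only unique prompts reach the model
```

Real deployments key the cache on normalized inputs and set eviction policies, but even this naive version eliminates energy spend on repeated identical requests.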

Does reducing AI's energy use mean sacrificing performance?

Not necessarily. Compute-optimal training (Chinchilla scaling) achieves the same performance with less compute. Efficient architectures and techniques (mixture-of-experts, pruning) maintain quality while reducing resource use. The goal is performance-per-watt optimization, not performance reduction.
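The Chinchilla result is often summarized as a rule of thumb: for a fixed training compute budget, train on roughly 20 tokens per model parameter instead of scaling parameters alone. A sketch of that approximation (the exact ratio varies by setup; 20:1 is the commonly cited round number, and the function name here is illustrative):

```python
def compute_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal training tokens (Chinchilla ~20:1 rule)."""
    return 20.0 * n_params

# Under this rule of thumb, a 7B-parameter model is compute-optimal
# at roughly 140B training tokens.
print(compute_optimal_tokens(7e9) / 1e9)  # 140.0
```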


Need help implementing Model Compression (Sustainability)?

Pertama Partners helps businesses across Southeast Asia adopt AI strategically. Let's discuss how model compression (sustainability) fits into your AI roadmap.