
The Hidden Costs of LLMs: Energy Consumption at Inference Time
Benchmarking energy efficiency across language models reveals optimization opportunities
This research provides the first comprehensive energy consumption analysis of large language models during inference across diverse NLP tasks.
Key findings:
- Inference energy costs are substantial but can be reduced by 70-90% through optimization techniques
- Model size and batch processing significantly impact energy consumption
- Quantization methods offer dramatic efficiency improvements with minimal performance loss (see the code sketch after this list)
- System-level configurations like CPU vs. GPU usage create substantial energy trade-offs
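
To make the quantization finding concrete, here is a minimal sketch of loading a causal LM in 4-bit precision with Hugging Face transformers and bitsandbytes. This is an illustrative setup, not the configuration benchmarked in the paper; the model name, NF4 settings, and prompt are assumptions.

```python
# Minimal sketch: 4-bit quantized inference with transformers + bitsandbytes.
# Illustrative only; not the paper's exact setup.
# Requires: pip install transformers bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical model choice

# NF4 quantization: weights are stored in 4 bits and dequantized on the fly,
# cutting memory traffic per token (and, typically, energy per request).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)

inputs = tokenizer(
    "Energy-efficient inference starts with", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```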
For engineering teams, this research offers practical guidance for building sustainable AI systems: careful model selection and configuration tuning can reduce operational costs while maintaining performance.
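
As a starting point for measuring these trade-offs in practice, the sketch below tracks the estimated energy and carbon footprint of a single inference call with the open-source codecarbon library. The library choice and the run_inference helper are assumptions for illustration; the paper's own instrumentation may differ.

```python
# Minimal sketch: per-request energy/emissions tracking with codecarbon.
# Illustrative only; not necessarily the instrumentation used in the paper.
# Requires: pip install codecarbon
from codecarbon import EmissionsTracker


def run_inference(prompt: str) -> str:
    # Hypothetical placeholder for your actual model call.
    return prompt.upper()


tracker = EmissionsTracker(project_name="llm-inference-energy")
tracker.start()
result = run_inference("How much energy does this request use?")
emissions_kg = tracker.stop()  # estimated CO2-equivalent emissions, in kg

print(f"Result: {result}")
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```

Wrapping individual requests this way makes it straightforward to compare configurations (model size, quantization, batch size, CPU vs. GPU) on your own workloads.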
Paper: Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models