Making LLMs Faster & More Efficient

A novel approach to linearize large language models without performance loss

LoLCATs (Low-rank Linear Conversion via Attention Transfer) is a two-step method that transforms computationally expensive softmax-attention LLMs into more efficient linear-attention models while preserving their capabilities: it first trains linear attentions to approximate the original softmax attentions, then applies low-rank adaptation (LoRA) to close any remaining quality gap.

  • Reduces computational complexity from quadratic to subquadratic
  • Works effectively on models up to 70B parameters (previous methods limited to 7B)
  • Preserves output quality where prior linearization techniques degraded noticeably
  • Eliminates the need for expensive retraining on billions of tokens
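The quadratic-to-subquadratic claim above comes from a standard property of linear attention: replacing the softmax with a feature map makes matrix multiplication associative, so the n×n attention matrix never has to be built. The sketch below illustrates that trick in NumPy; the feature map (ELU+1) and shapes are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 128, 16          # sequence length, head dimension
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

def phi(x):
    # A simple positive feature map (ELU + 1), a common stand-in for softmax.
    return np.where(x > 0, x + 1.0, np.exp(x))

# Quadratic order: materialize the n x n attention-like matrix, O(n^2 d).
A = phi(Q) @ phi(K).T
out_quadratic = (A @ V) / A.sum(axis=1, keepdims=True)

# Linear order: regroup by associativity, O(n d^2) -- no n x n matrix.
KV = phi(K).T @ V                 # d x d summary of keys and values
Z = phi(K).sum(axis=0)            # d-vector normalizer
out_linear = (phi(Q) @ KV) / (phi(Q) @ Z)[:, None]

# Both orderings give the same output; only the cost differs.
assert np.allclose(out_quadratic, out_linear)
```

Because the regrouped computation scales linearly in sequence length, inference cost stops growing quadratically with context size, which is the efficiency gain the bullets describe.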

This makes powerful LLMs more practical for production deployment, enabling faster inference and lower compute costs without sacrificing output quality.

Original Paper: LoLCATs: On Low-Rank Linearizing of Large Language Models
