EAGLE-3: Scaling up Inference Acceleration of Large Language...

Abstract:

The sequential nature of modern LLMs makes them expensive and slow, and speculative sampling has proven to be an effective solution to this problem. Methods like EAGLE perform autoregression at the feature level, reusing top-layer features from the target model to achieve better results than vanilla...
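
For readers unfamiliar with the speculative sampling that the abstract builds on, the following is a minimal sketch of the generic accept/reject rule for a single drafted token. It is not EAGLE-3's implementation: the function name `speculative_step` and the toy distributions are hypothetical, and a real system would obtain `p` from the target model and `q` from a draft model rather than from fixed lists.

```python
import random

def speculative_step(p, q, rng):
    """One accept/reject step of generic speculative sampling.

    p: target-model probabilities over the vocabulary (toy list, assumed)
    q: draft-model probabilities over the same vocabulary (toy list, assumed)
    Returns (token_index, accepted_flag).
    """
    vocab = range(len(q))
    # The draft model proposes a token sampled from q.
    x = rng.choices(vocab, weights=q, k=1)[0]
    # Accept the proposal with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights internally.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    x = rng.choices(vocab, weights=residual, k=1)[0]
    return x, False

if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.6, 0.3, 0.1]  # toy target distribution
    q = [0.2, 0.5, 0.3]  # toy draft distribution
    n = 20000
    counts = [0] * len(p)
    for _ in range(n):
        token, _ = speculative_step(p, q, rng)
        counts[token] += 1
    # Empirical frequencies should be close to p, not q.
    print([round(c / n, 3) for c in counts])
```

The point of the rule is that the returned tokens follow the target distribution `p` even though proposals come from the cheaper draft distribution `q`, so a better draft raises the acceptance rate without changing the output distribution.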

Key points:

Research on large language models
Engineering application

Source: EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test