EAGLE-3: Scaling up Inference Acceleration of Large Language...

By Yuhui Li, Fangyun Wei...

Abstract:

The sequential nature of modern LLMs makes them expensive and slow, and speculative sampling has proven to be an effective solution to this problem. Methods like EAGLE perform autoregression at the feature level, reusing top-layer features from the target model to achieve better results than vanilla...

Key points:

  • Research on large language models
  • Engineering application

Source: EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
