
Learning to Keep a Promise: Scaling Language Model Decoding ...
By Tian Jin, Ellie Y. Cheng...
Abstract:
Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work has explored parallel decoding by identifying and simultaneously generating semantically independent chunks of LLM responses. However, these techniques...
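To illustrate the core idea from the abstract, here is a minimal sketch of decoding semantically independent response chunks concurrently and stitching them back in order. All names here (`decode_chunk`, `parallel_decode`, the chunk specifications) are hypothetical stand-ins, not the paper's actual method or API; a toy function simulates the per-chunk sequential decoder.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_chunk(prompt, chunk_spec):
    # Stand-in for sequential token-by-token generation of one chunk.
    return f"[{chunk_spec}] response to: {prompt}"

def parallel_decode(prompt, chunk_specs):
    # Semantically independent chunks can be generated concurrently;
    # pool.map preserves the original chunk order when stitching.
    with ThreadPoolExecutor(max_workers=max(1, len(chunk_specs))) as pool:
        parts = list(pool.map(lambda s: decode_chunk(prompt, s), chunk_specs))
    return " ".join(parts)

print(parallel_decode("list two facts", ["fact 1", "fact 2"]))
```

The key property the sketch shows is that concurrency changes only latency, not the final assembled response, provided the chunks are truly independent.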
Key points:
- Parallel decoding for autoregressive large language models
- Identifies and simultaneously generates semantically independent response chunks