LLMs Reimagine Text-to-Image Generation

ARRA (Autoregressive Representation Alignment) unlocks the potential of large language models to generate coherent images directly, with significant applications in medical imaging.

Aligns LLM hidden states with visual representations from foundational models
Uses a hybrid token approach that maintains both local and global coherence
Achieves state-of-the-art results on medical datasets (MIMIC-CXR)
Requires no architectural changes to existing LLM frameworks

This breakthrough enables medical professionals to generate diagnostically relevant medical visualizations from text descriptions, potentially improving communication and understanding of patient conditions.

Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment