
Rethinking Vision-Language Models in Radiology
Evaluating the Reality of Text Integration in Medical Imaging
This study critically evaluates whether recent vision-language pre-training models make effective use of textual information in radiology applications.
- Examines the gap between claimed progress and actual text utilization in medical vision-language models
- Critically assesses whether current radiology datasets provide sufficient text supervision for effective learning
- Questions whether existing models actually leverage the fine-grained expert knowledge encoded in medical text
- Provides a reality check on current methods' limitations and potential directions for improvement
This research matters for medical imaging because it challenges assumptions about how effective current vision-language approaches are in radiology. It could redirect research toward better integration of textual and visual information in clinical applications.
A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?