Rethinking Vision-Language Models in Radiology

This study critically evaluates whether recent vision-language pre-training models make effective use of textual information in radiology applications.

Examines the gap between claimed progress and actual text utilization in medical vision-language models
Critically assesses whether current radiology datasets provide sufficient text supervision for effective learning
Questions whether existing models actually leverage the fine-grained expert knowledge encoded in medical text
Provides a reality check on current methods' limitations and potential directions for improvement

This research matters for medical imaging because it challenges assumptions about the effectiveness of current vision-language approaches in radiology, and it may redirect research toward more effective integration of textual and visual information in clinical applications.

A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?