Rethinking Vision-Language Models in Radiology

Evaluating the Reality of Text Integration in Medical Imaging

This study critically evaluates whether recent vision-language pre-training models make effective use of textual information in radiology applications.

  • Examines the gap between claimed progress and actual text utilization in medical vision-language models
  • Critically assesses whether current radiology datasets provide sufficient text supervision for effective learning
  • Questions whether existing models actually leverage the fine-grained expert knowledge encoded in medical text
  • Provides a reality check on current methods' limitations and potential directions for improvement

This research matters for medical imaging because it challenges assumptions about the effectiveness of current vision-language approaches in radiology, and it may redirect research toward more effective integration of textual and visual information in clinical applications.

A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?
