AI-Powered Visual Grounding in Medical Imaging

Automating the connection between radiological text and image locations

This research develops an innovative vision-language model that automatically connects text descriptions in radiology reports to their precise locations within PET/CT images.

  • Created an automated pipeline to generate weakly-supervised labels from existing reports
  • Trained a specialized 3D vision-language model for visual grounding in medical imaging
  • Demonstrated potential for improving radiology workflow by linking text descriptions to image findings
  • Applied across multiple radiotracer types (FDG, DCFPyL, DOTATATE, Fluciclovine)
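The weak-labeling step above can be illustrated with a minimal sketch. All names here (the region atlas, box coordinates, and keyword heuristics) are hypothetical stand-ins, not the paper's actual pipeline: the idea is simply that positive-finding sentences in a report can be mapped to coarse 3D regions without manual annotation.

```python
import re

# Hypothetical coarse atlas: anatomical keyword -> (z, y, x) index ranges
# in a normalized 128^3 PET/CT volume. Values are illustrative only.
REGION_BOXES = {
    "lung":  ((30, 70), (20, 100), (10, 118)),
    "liver": ((60, 95), (40, 110), (64, 120)),
}

def weak_labels_from_report(report: str):
    """Turn positive-finding sentences into coarse 3D box labels.

    A sentence counts as a positive finding if it mentions tracer
    uptake, is not negated, and names a region in the atlas -- a
    deliberately crude heuristic for illustration.
    """
    labels = []
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        # Require an uptake mention and skip simple negations.
        if not re.search(r"\buptake\b|\bavid\b", sentence, re.I):
            continue
        if re.search(r"\bno\b", sentence, re.I):
            continue
        for region, box in REGION_BOXES.items():
            if re.search(rf"\b{region}\b", sentence, re.I):
                labels.append({"text": sentence.strip(),
                               "region": region,
                               "box_zyx": box})
    return labels

report = ("Focal FDG uptake in the right lung upper lobe. "
          "No abnormal uptake in the liver.")
print(weak_labels_from_report(report))
```

Text-box pairs generated this way can then serve as weak supervision for training a 3D vision-language grounding model, trading label precision for scale.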

This approach addresses a critical gap in medical AI by enabling more precise identification of lesions and abnormalities without requiring extensive manual annotation, potentially enhancing diagnostic accuracy and radiologist efficiency.

Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings