AI-Powered Visual Grounding in Medical Imaging

Automating the connection between radiological text and image locations

This research develops an innovative vision-language model that automatically connects text descriptions in radiology reports to their precise locations within PET/CT images.

  • Created an automated pipeline to generate weakly-supervised labels from existing reports
  • Trained a specialized 3D vision-language model for visual grounding in medical imaging
  • Demonstrated potential for improving radiology workflow by linking text descriptions to image findings
  • Applied across multiple radiotracer types (FDG, DCFPyL, DOTATATE, Fluciclovine)
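The weak-labeling step above can be illustrated with a minimal sketch. All names here (the region atlas, box coordinates, and keyword heuristics) are hypothetical stand-ins, not the paper's actual pipeline: the idea is simply that positive-finding sentences in a report can be mapped to coarse 3D regions without manual annotation.

```python
import re

# Hypothetical coarse atlas: anatomical keyword -> (z, y, x) index ranges
# in a normalized 128^3 PET/CT volume. Values are illustrative only.
REGION_BOXES = {
    "lung":  ((30, 70), (20, 100), (10, 118)),
    "liver": ((60, 95), (40, 110), (64, 120)),
}

def weak_labels_from_report(report: str):
    """Turn positive-finding sentences into coarse 3D box labels.

    A sentence counts as a positive finding if it mentions tracer
    uptake, is not negated, and names a region in the atlas -- a
    deliberately crude heuristic for illustration.
    """
    labels = []
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        # Require an uptake mention and skip simple negations.
        if not re.search(r"\buptake\b|\bavid\b", sentence, re.I):
            continue
        if re.search(r"\bno\b", sentence, re.I):
            continue
        for region, box in REGION_BOXES.items():
            if re.search(rf"\b{region}\b", sentence, re.I):
                labels.append({"text": sentence.strip(),
                               "region": region,
                               "box_zyx": box})
    return labels

report = ("Focal FDG uptake in the right lung upper lobe. "
          "No abnormal uptake in the liver.")
print(weak_labels_from_report(report))
```

Text-box pairs generated this way can then serve as weak supervision for training a 3D vision-language grounding model, trading label precision for scale.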

This approach addresses a critical gap in medical AI by enabling more precise identification of lesions and abnormalities without requiring extensive manual annotation, potentially enhancing diagnostic accuracy and radiologist efficiency.

Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings