BIOMEDICA: Democratizing Biomedical AI

Creating Open Vision-Language Models from Scientific Literature

BIOMEDICA is an open framework for extracting and annotating figure-caption pairs from biomedical literature and for building vision-language models on them, addressing the shortage of diverse medical imaging datasets.

  • Extracts 14 million figure-caption pairs across diverse biomedical domains
  • Enables development of generalist biomedical AI models spanning multiple specialties
  • Provides open-source architecture for continuous expansion and improvement
  • Demonstrates effectiveness across pathology, radiology, ophthalmology and other medical fields

By providing open access to diverse training data, this research lowers the barrier to medical AI development and could accelerate diagnostic tools, medical education, and research applications across healthcare disciplines.

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
