BIOMEDICA: Democratizing Biomedical AI

BIOMEDICA introduces an open framework for extracting and annotating figure-caption pairs from biomedical literature and for training vision-language models on them, addressing the shortage of diverse, publicly available biomedical image-text datasets.

Extracts over 24 million figure-caption pairs across diverse biomedical domains
Enables development of generalist biomedical AI models spanning multiple specialties
Provides open-source architecture for continuous expansion and improvement
Demonstrates effectiveness across pathology, radiology, ophthalmology and other medical fields

By providing open access to diverse training data, this research lowers the barrier to developing medical AI and could accelerate work on diagnostic tools, medical education, and research applications across healthcare disciplines.

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature