Eye on the Future: MLLMs in Ophthalmology

This research introduces a benchmark dataset designed to evaluate how multimodal large language models (MLLMs) interpret eye examination images.

Combines fundus photographs and optical coherence tomography (OCT) images with detailed clinical metadata
Tests AI models on their ability to diagnose common eye conditions such as diabetic retinopathy
Evaluates performance across multiple state-of-the-art MLLMs including GPT-4V and Gemini Pro
Identifies current limitations in medical visual reasoning for ophthalmic applications

This benchmark addresses a critical gap in MLLM evaluation for specialized medical domains. It could accelerate the development of AI assistants for ophthalmologists and, in turn, help improve diagnostic accuracy.

A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images