Optimizing Vision-Language Models

Mordal is an automated framework that selects the best-performing pretrained vision model for a vision-language model (VLM), improving performance across diverse vision-language tasks and applications.

Addresses the challenge of choosing the right vision model from many available options
Helps maximize VLM capabilities across different benchmarks and use cases
Eliminates manual trial-and-error in model selection
Particularly valuable for specialized domains like healthcare

For medical applications, this research enables more efficient deployment of VLMs in diagnostic imaging, clinical documentation, and patient accessibility tools, ensuring that the best available vision model is used without extensive experimentation.

Mordal: Automated Pretrained Model Selection for Vision Language Models