Optimizing Vision-Language Models for Edge Devices

Advancing VLMs for resource-constrained environments in healthcare and beyond

This comprehensive survey examines how Vision-Language Models (VLMs) can be deployed effectively on edge devices despite hardware limitations.

  • VLMs combine visual understanding with natural language processing for image captioning, visual QA, and video analysis
  • Edge deployment faces challenges of limited processing power, memory, and energy
  • Applications span healthcare, autonomous vehicles, and smart surveillance systems
  • Recent optimizations enable lightweight VLMs suitable for medical applications at the edge
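One family of optimizations behind such lightweight VLMs is post-training INT8 quantization of model weights. As a minimal, self-contained sketch (the function names and values below are illustrative, not from the survey), affine quantization maps floating-point weights to 8-bit integers via a scale and zero point:

```python
# Illustrative sketch of affine (asymmetric) INT8 quantization, a core
# technique for shrinking model memory footprints on edge devices.

def quantize_int8(weights):
    """Map float weights to uint8 values with an affine scale/zero-point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0          # avoid div-by-zero for constant tensors
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float weights from the quantized values."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.2, -0.3, 0.0, 0.7, 1.5]
q, scale, zp = quantize_int8(weights)
recovered = dequantize_int8(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
# Round-trip error is bounded by half a quantization step (scale / 2),
# while storage drops from 32 bits to 8 bits per weight.
assert max_err <= scale / 2 + 1e-9
```

Production toolchains apply the same idea per-tensor or per-channel, often with calibration data to pick ranges, trading a small accuracy loss for roughly 4x less weight memory.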

For healthcare providers, this research opens opportunities for on-device diagnostic assistance, patient monitoring, and medical imaging analysis without relying on cloud infrastructure.

Vision-Language Models for Edge Networks: A Comprehensive Survey