Optimizing Vision-Language Models for Edge Devices

Advancing VLMs for resource-constrained environments in healthcare and beyond

This comprehensive survey examines how Vision-Language Models (VLMs) can be deployed effectively on edge devices despite hardware limitations.

  • VLMs combine visual understanding with natural language processing for image captioning, visual QA, and video analysis
  • Edge deployment faces challenges of limited processing power, memory, and energy
  • Applications span healthcare, autonomous vehicles, and smart surveillance systems
  • Recent optimizations enable lightweight VLMs suitable for medical applications at the edge
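One family of optimizations behind such lightweight VLMs is post-training INT8 quantization of model weights. As a minimal, self-contained sketch (the function names and values below are illustrative, not from the survey), affine quantization maps floating-point weights to 8-bit integers via a scale and zero point:

```python
# Illustrative sketch of affine (asymmetric) INT8 quantization, a core
# technique for shrinking model memory footprints on edge devices.

def quantize_int8(weights):
    """Map float weights to uint8 values with an affine scale/zero-point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0          # avoid div-by-zero for constant tensors
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float weights from the quantized values."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.2, -0.3, 0.0, 0.7, 1.5]
q, scale, zp = quantize_int8(weights)
recovered = dequantize_int8(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
# Round-trip error is bounded by half a quantization step (scale / 2),
# while storage drops from 32 bits to 8 bits per weight.
assert max_err <= scale / 2 + 1e-9
```

Production toolchains apply the same idea per-tensor or per-channel, often with calibration data to pick ranges, trading a small accuracy loss for roughly 4x less weight memory.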

For healthcare providers, this research opens opportunities for on-device diagnostic assistance, patient monitoring, and medical imaging analysis without relying on cloud infrastructure.

Vision-Language Models for Edge Networks: A Comprehensive Survey