Reducing Hallucinations in Vision-Language Models

A novel token reduction approach for more reliable AI vision systems

MINT is a training-free approach that reduces hallucinations in Large Vision-Language Models through attention-based token reduction, requiring no fine-tuning or data annotation.

  • Identifies and addresses attention redundancy in the decoding process
  • Improves output faithfulness without degrading task performance
  • Provides a practical solution applicable to existing deployed models
  • Demonstrates effectiveness across multiple vision-language tasks
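The paper's exact procedure is not reproduced here, but the core idea of attention-based token reduction can be illustrated with a hypothetical sketch: score each visual token by the average attention it receives from text tokens during decoding, then keep only the top-scoring fraction. The function name, score definition, and keep ratio below are illustrative assumptions, not MINT's actual algorithm.

```python
import numpy as np

def prune_visual_tokens(attn_weights, visual_tokens, keep_ratio=0.5):
    """Keep the visual tokens that receive the highest mean attention
    from the text tokens; drop the redundant rest.

    attn_weights:  (num_text_tokens, num_visual_tokens) attention matrix
    visual_tokens: (num_visual_tokens, hidden_dim) token features
    """
    scores = attn_weights.mean(axis=0)          # mean attention per visual token
    k = max(1, int(len(scores) * keep_ratio))   # how many tokens to retain
    keep = np.sort(np.argsort(scores)[-k:])     # top-k indices, original order
    return visual_tokens[keep], keep

# Toy example: 4 text tokens attending to 8 visual tokens of dimension 16.
rng = np.random.default_rng(0)
attn = rng.random((4, 8))
vis = rng.random((8, 16))
pruned, kept = prune_visual_tokens(attn, vis, keep_ratio=0.5)
print(pruned.shape, kept)
```

Pruning tokens this way shrinks the visual context the decoder conditions on, which is the kind of redundancy reduction the bullets above describe; the actual method operates inside the model's decoding loop rather than as a standalone post-hoc step.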

For security applications, this advancement is crucial as it enhances the trustworthiness of AI systems that process visual information, potentially reducing risks in critical domains like autonomous vehicles, medical diagnostics, and surveillance systems.

MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
