Fighting Video Hallucinations

PaMi-VDPO is a framework that reduces hallucinations in Video Multimodal Large Language Models through prompt-aware, multi-instance preference learning.

Eliminates the need for manual preference annotation by using video augmentations
Implements a multi-instance learning strategy that improves robustness across diverse prompts
Reduces the false or misleading information that models generate about video content
Enhances security by reducing the risk of AI systems propagating misinformation
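
The first two points can be illustrated with a minimal sketch. It assumes a generic DPO-style objective in which the same response to the same prompt is scored once with the original video and once with each of several augmented videos, so the preference pair comes from the videos rather than from human labels. The function names, the `weights` parameter, and the default `beta` are hypothetical and chosen for illustration; this is not the paper's exact loss or its prompt-aware selection rule.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_term(policy_clean, ref_clean, policy_aug, ref_aug, beta=0.1):
    """DPO-style loss for one (original video, augmented video) pair.

    Every argument is the log-probability of the same response to the same
    prompt, scored with the original or the augmented video, under the
    trainable policy or the frozen reference model.
    """
    margin = beta * ((policy_clean - ref_clean) - (policy_aug - ref_aug))
    return softplus(-margin)  # equals -log(sigmoid(margin))


def multi_instance_video_loss(policy_clean, ref_clean, policy_augs, ref_augs,
                              weights=None, beta=0.1):
    """Weighted average of pairwise terms over a bag of augmented videos.

    `weights` is a hypothetical stand-in for any per-augmentation weighting;
    it defaults to uniform.
    """
    if not policy_augs or len(policy_augs) != len(ref_augs):
        raise ValueError("need matching, non-empty augmentation scores")
    if weights is None:
        weights = [1.0] * len(policy_augs)
    if len(weights) != len(policy_augs):
        raise ValueError("need one weight per augmentation")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    terms = [
        preference_term(policy_clean, ref_clean, p_aug, r_aug, beta)
        for p_aug, r_aug in zip(policy_augs, ref_augs)
    ]
    return sum(w * t for w, t in zip(weights, terms)) / total
```

In this sketch the loss falls when the policy, relative to the reference model, assigns the response a higher likelihood under the original video than under its augmented versions, which is what pushes the model to ground its answer in the actual video content.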

This research addresses security concerns by helping AI systems give accurate responses about video content, which is essential for applications such as content moderation, surveillance, and information verification.

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning