
Combating Visual Disinformation in News
Using Vision-Language Models to Verify Cross-modal Entity Consistency
This research introduces an approach for detecting misleading news by verifying whether the entities mentioned in the text match what appears in the accompanying images.
- Focuses on identifying inconsistencies between specific entities (people, places, events) across text and images
- Leverages vision-language models to detect manipulated or false information (a simplified sketch of the idea follows this list)
- Creates a benchmark dataset for evaluating cross-modal entity verification
- Demonstrates improved accuracy in identifying misleading content compared to previous approaches
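As a rough illustration of the idea, and not the paper's actual pipeline, the sketch below uses an off-the-shelf CLIP checkpoint from Hugging Face to score how plausibly a named entity from an article is depicted in its image. The model name, prompt template, distractor entities, and decision threshold are all illustrative assumptions.

```python
# Illustrative sketch only: score cross-modal entity consistency with CLIP.
# Model name, prompts, distractors, and threshold are assumptions for demonstration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed off-the-shelf checkpoint
model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def entity_image_consistency(image_path: str, entity: str, distractors: list[str]) -> float:
    """Return the probability that `entity` (vs. the distractors) matches the image."""
    image = Image.open(image_path).convert("RGB")
    prompts = [f"a news photo of {name}" for name in [entity, *distractors]]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_prompts)
    probs = logits.softmax(dim=-1)[0]
    return probs[0].item()  # probability mass on the claimed entity

# Example: flag the article if the claimed person is unlikely to be depicted.
score = entity_image_consistency(
    "article_photo.jpg",
    entity="Angela Merkel",
    distractors=["Emmanuel Macron", "Boris Johnson"],
)
if score < 0.5:  # threshold is an illustrative assumption
    print(f"Possible cross-modal inconsistency (score={score:.2f})")
```

A full system would typically extract entities with a named-entity recognizer and verify them with a larger vision-language model, but the contrastive scoring above conveys the core cross-modal consistency check.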
For security professionals, this research offers a way to automatically check that the entities named in a story are consistent with its images, helping counter increasingly sophisticated disinformation campaigns.
Verifying Cross-modal Entity Consistency in News using Vision-language Models