
Fighting Multimodal Misinformation
Using LLMs to verify media relevance in news stories
This research introduces a novel system that uses large language models (LLMs) to detect when the images or videos in news articles have been manipulated or used out of context.
Key innovations:
- Analyzes both text content and image/video provenance metadata
- Identifies mismatches between visual media and text narratives
- Targets the especially dangerous multimodal aspect of misinformation campaigns, in which misleading text and repurposed visuals reinforce each other
- Leverages LLMs to understand contextual relationships across modalities (a minimal sketch follows this list)
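
To make the pipeline concrete, here is a minimal sketch of what such a cross-modal consistency check might look like. This is an illustration, not the paper's implementation: it assumes an OpenAI-style chat API as the LLM backend, and the model name, metadata fields, prompt wording, and JSON response schema are all hypothetical choices.

```python
# Minimal sketch of an LLM-based media-relevance check.
# Assumptions (not from the paper): the OpenAI chat API as the backend,
# the "gpt-4o" model name, and the metadata fields / JSON schema below
# are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def check_media_relevance(article_text: str, media_metadata: dict) -> dict:
    """Ask an LLM whether media provenance metadata is consistent with
    the article narrative; returns a verdict and a short rationale."""
    prompt = (
        "You are a fact-checking assistant. Compare the news article text "
        "with the provenance metadata of its attached image/video and "
        "decide whether the media plausibly belongs to the story.\n\n"
        f"ARTICLE:\n{article_text}\n\n"
        "MEDIA METADATA (capture date, location, original caption, "
        f"source):\n{json.dumps(media_metadata, indent=2)}\n\n"
        'Respond as JSON: {"verdict": "consistent" | "out_of_context" | '
        '"uncertain", "rationale": "<one sentence>"}'
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    # Example: a 2012 archive photo attached to a story about a current flood.
    verdict = check_media_relevance(
        article_text="Flooding struck the city center this weekend...",
        media_metadata={
            "capture_date": "2012-06-14",
            "location": "a different city",
            "original_caption": "River overflow after the 2012 storms",
            "source": "archive photo agency",
        },
    )
    print(verdict)
```

Returning a structured verdict rather than free text is a deliberate design choice in this sketch: it lets downstream components aggregate, threshold, or audit decisions without re-parsing prose.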
Security implications: The approach closes a critical gap in current misinformation detection systems, which often miss the interplay between text and visual elements, and thereby strengthens digital information integrity and trust.