
Combating Visual Hallucinations in AI
A benchmark for detecting free-form hallucinations in vision-language models
THRONE introduces a benchmark for measuring, and ultimately mitigating, object-based hallucinations that large vision-language models (LVLMs) produce during free-form text generation.
- Distinguishes Type I hallucinations (arising in free-form responses to open-ended prompts) from Type II hallucinations (arising in answers to specific questions, e.g. "Is there an X in the image?")
- Evaluates hallucination rates on 3,800+ images across prominent LVLMs, including GPT-4V, Claude, and Gemini
- Provides an automated evaluation pipeline that does not depend on access to external proprietary models (see the sketch after this list)
- Reveals widespread hallucination issues even in the most advanced commercial models
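To make the object-based evaluation concrete, here is a minimal sketch of the underlying idea: compare the objects a model mentions in a free-form caption against the image's ground-truth annotations, and count mentions of absent objects as Type I hallucinations. The vocabulary, caption, and annotations below are hypothetical, and the substring matching is a deliberate simplification standing in for THRONE's more robust judgment of whether a class is actually described.

```python
from dataclasses import dataclass

# Hypothetical object vocabulary; a real benchmark draws classes from the
# annotations of the underlying detection dataset. Substring matching here
# is a toy stand-in for a proper check of whether a class is described.
VOCAB = {"person", "dog", "frisbee", "car", "bench"}

@dataclass
class EvalResult:
    true_positives: set   # objects mentioned and actually present
    hallucinated: set     # objects mentioned but absent (Type I hallucinations)
    missed: set           # objects present but never mentioned
    precision: float
    recall: float

def evaluate_response(caption: str, ground_truth: set) -> EvalResult:
    """Score one free-form caption against the image's annotated objects."""
    text = caption.lower()
    mentioned = {cls for cls in VOCAB if cls in text}
    tp = mentioned & ground_truth
    hallucinated = mentioned - ground_truth
    missed = ground_truth - mentioned
    precision = len(tp) / len(mentioned) if mentioned else 1.0
    recall = len(tp) / len(ground_truth) if ground_truth else 1.0
    return EvalResult(tp, hallucinated, missed, precision, recall)

if __name__ == "__main__":
    caption = "A person throws a frisbee to a dog near a parked car."
    truth = {"person", "dog", "frisbee"}  # hypothetical annotations
    result = evaluate_response(caption, truth)
    print(f"hallucinated: {result.hallucinated}")              # {'car'}
    print(f"precision={result.precision:.2f}, recall={result.recall:.2f}")
```

Aggregating per-response precision and recall over a corpus of images yields benchmark-level scores; a model that pads captions with unverified objects is penalized on precision, while one that omits visible objects loses recall.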
This research matters for security applications, where visual misinterpretations by AI can feed misinformation, compromise decision-making, or expose AI systems to exploitation in sensitive contexts.