
Zero-Shot Anomaly Detection with MLLMs
Detecting anomalies without prior training data
This research introduces an approach that leverages Multimodal Large Language Models (MLLMs) to detect and reason about anomalies without requiring training on normal samples.
- Establishes a new paradigm for anomaly detection that works with limited data
- Creates MM-RAD, the first multimodal reasoning anomaly detection benchmark
- Evaluates 12 state-of-the-art MLLMs on anomaly detection capabilities
- Demonstrates MLLMs can identify and explain abnormalities across security, medical, and engineering domains
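The zero-shot setup described above boils down to prompting an off-the-shelf MLLM directly, with no normal reference samples. The sketch below is illustrative only: the prompt wording and the JSON reply format are assumptions for demonstration, not the paper's actual interface or the MM-RAD protocol.

```python
import json

# Hypothetical zero-shot anomaly-detection prompt for an off-the-shelf MLLM.
# No training on normal samples is needed: the model is asked directly
# whether the image deviates from a typical instance of the object.
PROMPT = (
    "You are an anomaly inspector. Examine the attached image and answer "
    'in JSON: {"anomalous": true or false, "explanation": "<one sentence>"}.'
)

def parse_verdict(reply: str) -> tuple[bool, str]:
    """Parse the MLLM's JSON reply into (is_anomalous, explanation)."""
    data = json.loads(reply)
    return bool(data["anomalous"]), str(data["explanation"])

# Illustrative reply (invented for this sketch, not real model output):
reply = '{"anomalous": true, "explanation": "The capsule surface is cracked."}'
flag, why = parse_verdict(reply)
```

In practice the prompt and image would be sent to any multimodal model API; the parsing step above is what turns the model's free-form reply into a usable detection verdict plus a human-readable explanation.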
For security applications, this approach enables rapid anomaly detection in scenarios where collecting large training datasets is impractical or impossible, potentially transforming threat detection and surveillance systems.
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models