
No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection

About

The collection and detection of video anomaly data has long been a challenging problem due to its rare occurrence and spatio-temporal scarcity. Existing video anomaly detection (VAD) methods underperform in open-world scenarios. Key contributing factors include limited dataset diversity and inadequate understanding of context-dependent anomalous semantics. To address these issues, i) we propose LAVIDA, an end-to-end zero-shot video anomaly detection framework. ii) LAVIDA employs an Anomaly Exposure Sampler that transforms segmented objects into pseudo-anomalies to enhance model adaptability to unseen anomaly categories. It further integrates a Multimodal Large Language Model (MLLM) to bolster semantic comprehension capabilities. Additionally, iii) we design a token compression approach based on reverse attention to handle the spatio-temporal scarcity of anomalous patterns and decrease computational cost. The training process is conducted solely on pseudo-anomalies without any VAD data. Evaluations across four benchmark VAD datasets demonstrate that LAVIDA achieves SOTA performance in both frame-level and pixel-level anomaly detection under the zero-shot setting. Our code is available at https://github.com/VitaminCreed/LAVIDA.
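The abstract does not spell out how the reverse-attention token compression works, but the stated intuition is that anomalous patterns are spatio-temporally scarce, so tokens that attend strongly to "normal" content can be discarded. The following is a minimal, hypothetical sketch of that idea, not the paper's implementation: the function names, the use of a single normal prototype vector, and the cosine-similarity scoring are all our assumptions for illustration.

```python
import numpy as np

def reverse_attention_compress(tokens, normal_proto, keep_ratio=0.25):
    """Hypothetical sketch of reverse-attention token compression.

    Score each token by its attention (here: cosine similarity) to a
    'normal' prototype, then KEEP the tokens that attend LEAST -- the
    'reverse' of ordinary attention pooling -- on the assumption that
    anomalous content is what deviates from normal patterns.
    """
    # Normalize tokens and prototype for cosine similarity
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    p = normal_proto / (np.linalg.norm(normal_proto) + 1e-8)
    attn = t @ p  # high score = token looks "normal"

    # Keep the fraction of tokens with the LOWEST attention to normality
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(attn)[:k]
    return tokens[np.sort(keep)]  # preserve original (temporal) order
```

With eight tokens clustered around a "normal" direction and one outlier, the compressed set retains the outlier token while dropping most normal ones, which is the behavior the abstract attributes to this component.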

Zunkai Dai, Ke Li, Jiajia Liu, Jie Yang, Yuanyuan Qiao • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Abnormal Event Detection | UCSD Ped2 | AUC | 87.68 | 132 |
| Video Anomaly Detection | UCF-Crime (frame-level) | AUC | 82.18 | 32 |
| Video Anomaly Detection | UBnormal | AUC | 76.45 | 25 |
| Frame-level Video Anomaly Detection | ShanghaiTech | AUC | 85.28 | 11 |
| Frame-level Video Anomaly Detection | XD-Violence | AP | 90.62 | 11 |
