
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

About

Video Anomaly Understanding (VAU) is essential for applications such as smart cities, security surveillance, and disaster alert systems, yet remains challenging due to its demand for fine-grained spatio-temporal perception and robust reasoning under ambiguity. Despite advances in anomaly detection, existing methods often lack interpretability and struggle to capture the causal and contextual aspects of abnormal events. This limitation is further compounded by the absence of comprehensive benchmarks for evaluating reasoning ability in anomaly scenarios. To address both challenges, we introduce VAU-R1, a data-efficient framework built upon Multimodal Large Language Models (MLLMs), which enhances anomaly reasoning through Reinforcement Fine-Tuning (RFT). In addition, we propose VAU-Bench, the first Chain-of-Thought benchmark tailored for video anomaly reasoning, featuring multiple-choice QA, detailed rationales, temporal annotations, and descriptive captions. Empirical results show that VAU-R1 significantly improves question-answering accuracy, temporal grounding, and reasoning coherence across diverse contexts. Together, our method and benchmark establish a strong foundation for interpretable and reasoning-aware video anomaly understanding. Our code is available at https://github.com/GVCLab/VAU-R1.

Liyun Zhu, Qixiang Chen, Xi Shen, Xiaodong Cun · 2025
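The abstract does not spell out how the reinforcement fine-tuning signal is computed. As a hedged sketch only, R1-style RFT commonly scores each sampled response with simple rule-based rewards. The Python below assumes a `<think>...</think><answer>...</answer>` output format (suggested by the "w/o think" metrics listed below, but an assumption here), a single option letter A to D for multiple-choice QA, and a temporal IoU term for grounding. All function names are hypothetical; this is not the VAU-R1 implementation, which lives in the linked repository.

```python
import re
from typing import Optional, Tuple

# A time interval (start_sec, end_sec) in the video.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time intervals, in [0, 1]."""
    if pred[1] <= pred[0] or gt[1] <= gt[0]:
        return 0.0  # treat empty or inverted intervals as a miss
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union


def extract_answer(text: str) -> Optional[str]:
    """Return the option letter inside <answer>...</answer>, or None (assumed format)."""
    m = re.search(r"<answer>\s*([A-D])\s*</answer>", text)
    return m.group(1) if m else None


def format_reward(text: str) -> float:
    """1.0 if the response is exactly <think>...</think> followed by <answer>...</answer>."""
    ok = re.fullmatch(r"<think>.*?</think>\s*<answer>.*?</answer>", text.strip(), re.DOTALL)
    return 1.0 if ok else 0.0


def accuracy_reward(text: str, gt_choice: str) -> float:
    """1.0 if the extracted option letter matches the ground-truth choice."""
    return 1.0 if extract_answer(text) == gt_choice else 0.0


def total_reward(text: str, gt_choice: str,
                 pred_span: Optional[Span] = None,
                 gt_span: Optional[Span] = None) -> float:
    """Unweighted sum of format, accuracy and (if spans are given) temporal-IoU rewards."""
    reward = format_reward(text) + accuracy_reward(text, gt_choice)
    if pred_span is not None and gt_span is not None:
        reward += temporal_iou(pred_span, gt_span)
    return reward


if __name__ == "__main__":
    out = "<think>A man climbs the fence at night.</think>\n<answer>C</answer>"
    # format 1.0 + accuracy 1.0 + IoU 0.8 (8 s overlap over a 10 s union)
    print(total_reward(out, "C", (12.0, 20.0), (10.0, 20.0)))
```

The unweighted sum keeps the sketch simple; a real training setup might weight the terms differently or apply the IoU term only to grounding samples.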

Related benchmarks

Task | Dataset | Metric | Result | Rank
Video Question Answering | ECVA | Accuracy | 89.53 | 14
Video Anomaly Question Answering | MSAD | Accuracy (w/o think) | 88.33 | 8
Video Anomaly Question Answering | UCF-Crime | Accuracy (w/o think) | 92.03 | 8
Video Anomaly Understanding Evaluation | UCF-Crime | CLS score | 4.42 | 8
Video Anomaly Understanding Evaluation | MSAD | CLS score | 5.97 | 8
Video Anomaly Reasoning Evaluation | ECVA | CLS score | 1.45 | 7
