SRVAU-R1: Enhancing Video Anomaly Understanding via Reflection-Aware Learning

About

Multi-modal large language models (MLLMs) have made significant progress in reasoning and have shown promising results on video anomaly understanding (VAU) tasks. However, existing MLLM-based approaches remain largely focused on surface-level descriptions of anomalies and lack deeper reasoning about abnormal behaviors, in particular explicit self-reflection and self-correction. To address this, we propose Self-Reflection-Enhanced Reasoning for Video Anomaly Understanding (SRVAU-R1), a reflection-aware learning framework that incorporates reflection into MLLM reasoning. Specifically, SRVAU-R1 introduces the first reflection-oriented Chain-of-Thought dataset tailored for VAU, providing structured supervision in three parts: initial reasoning, self-reflection, and revised reasoning. Building on this dataset, it includes a novel reflection-aware learning paradigm that combines supervised fine-tuning and reinforcement fine-tuning to enhance multi-modal reasoning for VAU. Extensive experiments on multiple video anomaly benchmarks demonstrate that SRVAU-R1 consistently outperforms existing methods, achieving significant improvements in both temporal anomaly localization accuracy and reasoning quality.
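The three-part supervision described above (initial reasoning, self-reflection, revised reasoning) could be represented roughly as in the sketch below. This is only an illustration: the class name `ReflectionCoTSample`, its field names, and the tag strings in `to_target` are hypothetical and are not taken from the SRVAU-R1 dataset or code.

```python
from dataclasses import dataclass


@dataclass
class ReflectionCoTSample:
    """One hypothetical reflection-oriented Chain-of-Thought record."""
    video_id: str            # assumed identifier for the source video
    question: str            # assumed prompt about the video
    initial_reasoning: str   # first-pass reasoning about the anomaly
    self_reflection: str     # critique of the first pass
    revised_reasoning: str   # corrected reasoning after reflection

    def to_target(self) -> str:
        """Serialize the three stages into one supervision string.

        The tag names are illustrative placeholders, not the paper's format.
        """
        return "\n".join([
            f"<initial>{self.initial_reasoning}</initial>",
            f"<reflection>{self.self_reflection}</reflection>",
            f"<revised>{self.revised_reasoning}</revised>",
        ])


sample = ReflectionCoTSample(
    video_id="example_001",
    question="Is there an anomaly, and when does it occur?",
    initial_reasoning="A person runs across the street; this may be normal.",
    self_reflection="The earlier frames show the person being chased.",
    revised_reasoning="The running is part of a chase, which is anomalous.",
)
print(sample.to_target())
```

Keeping the three stages as separate fields makes it straightforward to supervise or score each stage independently, if a training setup calls for that.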

Zihao Zhao, Shengting Cao, Muchao Ye • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Question Answering | ECVA | Accuracy | 92.22 | 14 |
| Video Anomaly Question Answering | MSAD | Acc (w/o think) | 89.58 | 8 |
| Video Anomaly Question Answering | UCF-Crime | Accuracy (w/o think) | 92.82 | 8 |
| Video Anomaly Understanding Evaluation | MSAD | CLS Score | 7.65 | 8 |
| Video Anomaly Understanding Evaluation | UCF-Crime | CLS | 7.22 | 8 |
| Video Anomaly Reasoning Evaluation | ECVA | CLS Score | 2.86 | 7 |
| Temporal Anomaly Grounding | MSAD OOD (test) | mIoU | 20.4 | 4 |
| Temporal Anomaly Grounding | ECVA (test) | mIoU | 44.42 | 4 |
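For readers unfamiliar with the mIoU metric reported for temporal anomaly grounding, the sketch below shows the standard temporal intersection-over-union between a predicted and a ground-truth time segment, averaged over samples. It assumes one predicted segment and one ground-truth segment per video; the exact evaluation protocol used for these benchmarks is not stated on this page and may differ. The function names are illustrative.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, with start <= end


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two intervals on the time axis."""
    # Overlap length is zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: List[Segment], gts: List[Segment]) -> float:
    """Average temporal IoU over paired predictions and ground truths."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# One perfect prediction and one complete miss average to 0.5.
print(mean_iou([(0.0, 10.0), (0.0, 10.0)], [(0.0, 10.0), (20.0, 30.0)]))
```

This function returns a value in the range 0 to 1; the mIoU figures in the table appear to be on a 0 to 100 scale.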