VADMamba: Exploring State Space Models for Fast Video Anomaly Detection

About

Video anomaly detection (VAD) methods are mostly CNN-based or Transformer-based and achieve impressive results, but their focus on detection accuracy often comes at the expense of inference speed. State space models in computer vision, exemplified by Mamba, improve computational efficiency through selective scans and show great potential for long-range modeling. Our study pioneers the application of Mamba to VAD with VADMamba, a method based on multi-task learning for frame prediction and optical flow reconstruction. Specifically, we propose the VQ-Mamba Unet (VQ-MaU) framework, which incorporates a Vector Quantization (VQ) layer and a Mamba-based Non-negative Visual State Space (NVSS) block. Two individual VQ-MaU networks separately predict frames and reconstruct the corresponding optical flows, and a clip-level fusion evaluation strategy further boosts accuracy. Experimental results on three benchmark datasets validate the efficacy of VADMamba and demonstrate superior inference speed compared to previous work. Code is available at https://github.com/jLooo/VADMamba.
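The abstract names a clip-level fusion of the frame-prediction and optical-flow-reconstruction branches but does not give the formula. As a rough illustration only, the sketch below assumes one common pattern: min-max normalize each branch's per-frame errors within a clip, then take a weighted sum. The function names, the normalization, and the `weight` parameter are all hypothetical and are not taken from the VADMamba code.

```python
from typing import List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one clip's scores to [0, 1]; a constant clip maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_clip_scores(
    pred_err: List[float],
    flow_err: List[float],
    weight: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> List[float]:
    """Combine per-frame errors from the two branches for a single clip.

    pred_err: frame-prediction error per frame (higher = more anomalous)
    flow_err: optical-flow reconstruction error per frame
    """
    if len(pred_err) != len(flow_err):
        raise ValueError("both branches must score the same frames")
    p = min_max_normalize(pred_err)
    f = min_max_normalize(flow_err)
    # Assumed fusion rule: convex combination of the normalized errors.
    return [weight * a + (1.0 - weight) * b for a, b in zip(p, f)]


if __name__ == "__main__":
    print(fuse_clip_scores([1.0, 2.0, 3.0], [10.0, 30.0, 20.0]))
```

Normalizing per clip rather than per dataset keeps one very noisy video from dominating the score range, which is one plausible reason to fuse at clip level; the repository linked above is the authority on what the authors actually do.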

Jiahao Lyu, Minghua Zhao, Jing Hu, Xuewen Huang, Yifei Chen, Shuangli Du • 2025

Related benchmarks

Task	Dataset	Result
Video Anomaly Detection	ShanghaiTech (test)	AUC 0.77	211
Abnormal Event Detection	UCSD Ped2	AUC 98.5	163
Video Anomaly Detection	CUHK Avenue (test)	Frame-level AUC 0.915	112
Video Anomaly Detection	UCSD Ped2 (test)	Frame-level AUC 98.5	107
Video Anomaly Detection	ShanghaiTech (SHT) (test)	Frame-level AUC 77	103
Video Anomaly Detection	Avenue	Frame-AUC 91.5	49
Video Anomaly Detection	ShanghaiTech (SHT)	AUC 77	20
Video Anomaly Detection	UIT-ADrone	Micro-AUC 68.6	13
Video Anomaly Detection	MUVAD	Micro-AUC 66.2	13
Video Anomaly Detection	Drone-Anomaly	Micro-AUC 66.7	13
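
The results above appear to mix two scales for the same metric (for example 0.77 and 77 for ShanghaiTech), so it may help to recall how a frame-level AUC is obtained. The sketch below is a generic pairwise (Mann-Whitney) computation in plain Python, not the evaluation script used for these benchmarks; the function name and the toy labels are illustrative assumptions.

```python
from typing import Sequence


def frame_level_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve on the 0-1 scale.

    labels: 1 for anomalous frames, 0 for normal frames
    scores: anomaly score per frame (higher = more anomalous)
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous frame ranked above normal frame
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    auc = frame_level_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
    print(auc, auc * 100)  # the same value on the 0-1 and 0-100 scales
```

Under the usual convention, a micro-averaged AUC pools the frames of all test videos into one such computation rather than averaging per-video values; whether each row above follows that convention is not stated here.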
