Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
About
Anomaly detection in surveillance videos is challenging and important for ensuring public security. Unlike pixel-based anomaly detection methods, pose-based methods use highly structured skeleton data, which reduces the computational burden and avoids the negative impact of background noise. However, whereas pixel-based methods can directly exploit explicit motion features such as optical flow, pose-based methods lack an equivalent dynamic representation. In this paper, a novel Motion Embedder (ME) is proposed to provide a pose motion representation from a probabilistic perspective. Furthermore, a novel task-specific Spatial-Temporal Transformer (STT) is deployed for self-supervised pose sequence reconstruction. These two modules are then integrated into a unified framework for pose regularity learning, referred to as the Motion Prior Regularity Learner (MoPRL). MoPRL achieves state-of-the-art performance, with an average improvement of 4.7% AUC on several challenging datasets. Extensive experiments validate the versatility of each proposed module.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Anomaly Detection | ShanghaiTech (test) | AUC | 0.8126 | 194 |
| Video Anomaly Detection | Corridor (test) | AUC | 70.66 | 11 |
| Video Anomaly Detection | SHT (test) | EER | 0.24 | 10 |
| Video Anomaly Detection | HR-SHT (test) | EER | 0.23 | 8 |
| Video Anomaly Detection | ShanghaiTech-HR (test) | AUC | 0.8238 | 7 |
| Anomaly Detection | CHAD (test) | EER | 0.38 | 6 |
| Anomaly Detection | NWPUC (test) | EER | 40 | 6 |
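
The benchmarks above report two metrics, AUC and EER. As a rough illustration of what those numbers measure, the sketch below computes both from per-frame anomaly scores in plain Python. It is not the evaluation code used for these results: the function names `roc_auc` and `equal_error_rate`, the toy labels and scores, and the convention that a higher score means "more anomalous" are all assumptions made for this example.

```python
def _split(labels, scores):
    """Separate scores of anomalous (label 1) and normal (label 0) frames."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous frame")
    return pos, neg


def roc_auc(labels, scores):
    """AUC as the probability that a random anomalous frame scores higher
    than a random normal frame (ties count as half a win)."""
    pos, neg = _split(labels, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def equal_error_rate(labels, scores):
    """Approximate EER: sweep thresholds over the observed scores and return
    the mean of FPR and FNR where the two rates are closest."""
    pos, neg = _split(labels, scores)
    best_gap, best_eer = None, None
    for t in sorted(set(scores)):
        fnr = sum(p < t for p in pos) / len(pos)   # anomalies missed
        fpr = sum(n >= t for n in neg) / len(neg)  # normal frames flagged
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fpr + fnr) / 2
    return best_eer


# Toy data (hypothetical): 1 = anomalous frame, 0 = normal frame.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.7, 0.9, 0.65]

auc = roc_auc(labels, scores)
eer = equal_error_rate(labels, scores)
print(auc)  # 0.6875
print(eer)  # 0.25
```

A higher AUC is better, while a lower EER is better. The pairwise AUC loop is quadratic in the number of frames, so it is only meant for small illustrative inputs; the threshold sweep gives an approximate EER because it does not interpolate between operating points.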