MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
About
We propose a novel approach to video anomaly detection: we treat feature vectors extracted from videos as realizations of a random variable with a fixed distribution and model this distribution with a neural network. This lets us estimate the likelihood of test videos and detect video anomalies by thresholding the likelihood estimates. We train our video anomaly detector using a modification of denoising score matching, a method that injects training data with noise to facilitate modeling its distribution. To eliminate hyperparameter selection, we model the distribution of noisy video features across a range of noise levels and introduce a regularizer that tends to align the models for different levels of noise. At test time, we combine anomaly indications at multiple noise scales with a Gaussian mixture model. Running our video anomaly detector induces minimal delays as inference requires merely extracting the features and forward-propagating them through a shallow neural network and a Gaussian mixture model. Our experiments on five popular video anomaly detection benchmarks demonstrate state-of-the-art performance, both in the object-centric and in the frame-centric setup.
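As a rough illustration of the denoising score matching objective mentioned above, the sketch below estimates the loss at a single noise level and checks that it is minimized by the correct model of the noisy data. It is not the authors' implementation: it swaps the neural network for an analytic isotropic Gaussian model, uses synthetic data in place of video features, and omits the paper's modification, the multiscale regularizer, and the Gaussian mixture model. The names `dsm_loss` and `gaussian_score` are hypothetical.

```python
import numpy as np

def dsm_loss(model_score, x, sigma, rng):
    """Monte Carlo estimate of the denoising score matching loss at noise level sigma."""
    noise = rng.standard_normal(x.shape) * sigma
    x_noisy = x + noise
    # Score of the Gaussian noise kernel q(x_noisy | x), used as the regression target.
    target = -noise / sigma**2
    diff = model_score(x_noisy) - target
    return np.mean(np.sum(diff**2, axis=1))

def gaussian_score(var):
    """Score (gradient of the log-density) of a zero-mean isotropic Gaussian with variance var."""
    return lambda x: -x / var

# Synthetic stand-in for video features (assumption): standard normal vectors.
rng = np.random.default_rng(0)
data_std = 1.0
x = rng.standard_normal((20000, 8)) * data_std

sigma = 0.5
# Adding Gaussian noise of scale sigma to N(0, data_std^2 I) gives N(0, (data_std^2 + sigma^2) I).
true_var = data_std**2 + sigma**2

# Evaluate the loss for several candidate model variances, reusing the same noise draw.
candidates = [0.5, 1.0, true_var, 2.0, 4.0]
losses = {v: dsm_loss(gaussian_score(v), x, sigma, np.random.default_rng(1)) for v in candidates}
best_var = min(losses, key=losses.get)
print(best_var, losses)
```

Among the candidates, the loss is lowest for the variance of the noisy data, which is the sense in which minimizing this objective models the noise-perturbed distribution. In the approach described above, a neural network takes the place of the analytic Gaussian, and the distribution is modeled across a range of noise levels rather than a single `sigma`.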
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Anomaly Detection | ShanghaiTech (test) | -- | 194 |
| Abnormal Event Detection | UCSD Ped2 | -- | 132 |
| Video Anomaly Detection | UCF-Crime | -- | 129 |
| Video Anomaly Detection | UCF-Crime (test) | -- | 122 |
| Video Anomaly Detection | ShanghaiTech | Micro AUC 0.867 | 51 |
| Video Anomaly Detection | ShanghaiTech standard (test) | Frame-Level AUC 81.3 | 50 |
| Video Anomaly Detection | UBnormal (test) | -- | 37 |
| Video Anomaly Detection | UCF-Crime (frame-level) | AUC 78.5 | 32 |
| Video Anomaly Detection | UBnormal | AUC 72.8 | 25 |
| Video Anomaly Detection | UCF-Crime standard (test) | Frame-Level AUC 78.5 | 17 |