Learning Event Completeness for Weakly Supervised Video Anomaly Detection
About
Weakly supervised video anomaly detection (WS-VAD) aims to localize the temporal intervals that contain anomalous events in untrimmed videos, using only video-level annotations. Because dense frame-level annotations are absent, existing WS-VAD methods often localize events incompletely. To address this issue, we present LEC-VAD (Learning Event Completeness for Weakly Supervised Video Anomaly Detection), which features a dual structure designed to encode both category-aware and category-agnostic semantics between vision and language. Within LEC-VAD, we devise semantic regularities that leverage an anomaly-aware Gaussian mixture to learn precise event boundaries, thereby yielding more complete event instances. In addition, we develop a memory bank-based prototype learning mechanism to enrich the concise text descriptions associated with anomaly-event categories. This enrichment strengthens the expressiveness of the text, which is crucial for advancing WS-VAD. LEC-VAD demonstrates remarkable advancements over current state-of-the-art methods on two benchmark datasets, XD-Violence and UCF-Crime.
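To make the incomplete-localization problem concrete, the following is a minimal sketch of how per-frame anomaly scores are commonly turned into temporal intervals. It is not the LEC-VAD method: the function name `scores_to_intervals`, the fixed threshold of 0.5, and the example scores are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def scores_to_intervals(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring above `threshold` into [start, end) intervals.

    Hypothetical helper for illustration only; the threshold is an assumed value.
    """
    intervals: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the interval currently open
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i  # an interval opens here
        elif score <= threshold and start is not None:
            intervals.append((start, i))  # the interval closes before frame i
            start = None
    if start is not None:
        intervals.append((start, len(scores)))  # interval runs to the last frame
    return intervals


# A single event whose score dips in the middle is split into two fragments.
fragmented = scores_to_intervals([0.9, 0.9, 0.3, 0.9])
```

In this toy input, one low-scoring frame in the middle of an event breaks it into two short detections, which is the kind of incomplete event instance the abstract describes.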
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Anomaly Detection | UCF-Crime | AUC | 89.97 | 218 |
| Video Anomaly Detection | XD-Violence (test) | AP | 88.47 | 146 |
| Fine-grained Video Anomaly Detection | UCF-Crime | mAP@IoU 0.1 | 19.65 | 7 |
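The fine-grained result is reported as mAP at a temporal IoU threshold. As a rough sketch of the underlying overlap measure, the snippet below computes the temporal IoU between a predicted and a ground-truth segment. The function names `temporal_iou` and `is_match` and the (start, end) tuple representation are assumptions for illustration, and the exact matching protocol used by the benchmark may differ.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in frames or seconds; assumed representation


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred: Segment, gt: Segment, iou_threshold: float = 0.1) -> bool:
    """Treat a prediction as a hit when its IoU reaches the threshold (hypothetical helper)."""
    return temporal_iou(pred, gt) >= iou_threshold


# Overlap of 5 units over a union of 15 units.
example_iou = temporal_iou((0.0, 10.0), (5.0, 15.0))
```

A low threshold such as 0.1 accepts predictions that only partially cover an event, so higher thresholds are more sensitive to how completely an event is localized.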