On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes

About

This paper explores the impact of occlusions on video action detection. To facilitate this study, we introduce five new benchmark datasets: O-UCF and O-JHMDB, which contain synthetically controlled static and dynamic occlusions; OVIS-UCF and OVIS-JHMDB, which contain occlusions with realistic motions; and Real-OUCF, which contains occlusions in real-world scenarios. We formally confirm an intuitive expectation: existing models suffer significantly as occlusion severity increases, and they behave differently when occluders are static than when they are moving. We also discover several intriguing phenomena emerging in neural networks: 1) transformers can naturally outperform CNN models, even CNNs that may have used occlusion as a form of data augmentation during training; 2) incorporating symbolic components such as capsules into these backbones allows them to bind to occluders never seen during training; and 3) islands of agreement can emerge in realistic images and videos without instance-level supervision, distillation, or contrastive objectives (e.g., video-text training). These emergent properties allow us to derive simple yet effective training recipes that lead to robust occlusion models inductively satisfying the first two stages of the binding mechanism (grouping and segregation). Models leveraging these recipes outperform existing video action detectors under occlusion by 32.3% on O-UCF, 32.7% on O-JHMDB, and 2.6% on Real-OUCF in terms of the v-mAP metric. The code for this work has been released at https://github.com/rajatmodi62/OccludedActionBenchmark.
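The abstract distinguishes static occluders from dynamic (moving) ones. The sketch below only illustrates that distinction on a toy clip; it is not the authors' O-UCF/O-JHMDB generation pipeline. The function name `occlude_video`, the opaque rectangular patch, and the constant-velocity motion model are all assumptions made for illustration.

```python
import numpy as np


def occlude_video(frames, occluder, mode="static", start=(0, 0), velocity=(0, 0)):
    """Paste an opaque rectangular occluder onto every frame of a clip.

    Hypothetical illustration, not the benchmark's actual generator.
    frames:   (T, H, W, C) uint8 array; it is not modified.
    occluder: (h, w, C) uint8 patch, a stand-in for an object cutout.
    mode:     "static" keeps the patch at `start` in every frame;
              "dynamic" moves it by `velocity` = (dy, dx) pixels per frame.
    Returns the occluded copy and the (y, x) top-left corner used per frame.
    """
    if mode not in ("static", "dynamic"):
        raise ValueError(f"unknown mode: {mode!r}")
    T, H, W, _ = frames.shape
    h, w, _ = occluder.shape
    dy, dx = velocity if mode == "dynamic" else (0, 0)
    out = frames.copy()
    positions = []
    for t in range(T):
        # Clamp so the patch always stays fully inside the frame.
        y = int(np.clip(start[0] + t * dy, 0, H - h))
        x = int(np.clip(start[1] + t * dx, 0, W - w))
        out[t, y:y + h, x:x + w] = occluder
        positions.append((y, x))
    return out, positions


# Usage: an 8-frame black clip with a white 16x16 occluder.
clip = np.zeros((8, 64, 64, 3), dtype=np.uint8)
patch = np.full((16, 16, 3), 255, dtype=np.uint8)
static_clip, static_pos = occlude_video(clip, patch, "static", start=(10, 10))
moving_clip, moving_pos = occlude_video(clip, patch, "dynamic", start=(10, 0), velocity=(0, 4))
print(static_pos[:3])  # [(10, 10), (10, 10), (10, 10)]
print(moving_pos[:3])  # [(10, 0), (10, 4), (10, 8)]
```

In this toy setup the static occluder always hides the same pixels, while the moving one sweeps across the frame, so different regions are hidden in different frames.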

Rajat Modi, Vibhav Vineet, Yogesh Singh Rawat • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Detection | UCF101 24 | video-mAP@0.5 | 75.5 | 13 |
| Spatio-temporal action detection | UCF-24 (test) | F-mAP (IoU=0.5) | 81.2 | 8 |
| Spatio-temporal action detection | JHMDB-21 (test) | f-mAP (IoU=0.5) | 93 | 7 |
| Spatio-temporal action detection | Real-OUCF (test) | mAP | 14.3 | 4 |
| Action Detection | O-UCF FG2 (Averaged BG1/2/3) | mAP@0.5 IoU (Occ) | 42.8 | 3 |
| Action Detection | O-UCF FG3 (Averaged BG1/2/3) | v-mAP@0.5 IoU (Occ) | 36.9 | 3 |
| Action Detection | O-UCF FG1 (Averaged BG1/2/3) | v-mAP@0.5 IoU (Occ) | 51.5 | 3 |
| Action Detection | O-JHMDB FG1 (test) | v-mAP | 49.2 | 3 |
| Action Detection | O-JHMDB FG2 (test) | v-mAP | 35.4 | 3 |
| Action Detection | JHMDB-21 Clean (test) | v-mAP | 65.7 | 3 |

Showing 10 of 11 rows.
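Several rows above report video-level mAP (v-mAP) at an IoU threshold of 0.5. The following is a minimal sketch of one common way to compute it, assuming the widely used spatio-temporal tube IoU (temporal IoU multiplied by the mean per-frame box IoU over shared frames), contiguous tubes (so the frame-set overlap equals interval overlap), greedy matching in descending score order, and non-interpolated AP. All helper names are hypothetical, and the official evaluation scripts for these benchmarks may differ in details such as AP interpolation.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tube = Dict[int, Box]                    # frame index -> box
Detection = Tuple[str, float, Tube]      # (video_id, score, tube)


def box_iou(a: Box, b: Box) -> float:
    """Standard 2D intersection-over-union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tube_iou(pred: Tube, gt: Tube) -> float:
    """Temporal IoU of the frame sets times mean box IoU on shared frames."""
    shared = pred.keys() & gt.keys()
    if not shared:
        return 0.0
    temporal = len(shared) / len(pred.keys() | gt.keys())
    spatial = sum(box_iou(pred[f], gt[f]) for f in shared) / len(shared)
    return temporal * spatial


def average_precision(dets: List[Detection], gts: Dict[str, List[Tube]], thr: float = 0.5) -> float:
    """Non-interpolated AP for one class with greedy, score-ordered matching."""
    n_gt = sum(len(tubes) for tubes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {vid: [False] * len(tubes) for vid, tubes in gts.items()}
    true_pos, precision_sum = 0, 0.0
    for rank, (vid, _, tube) in enumerate(sorted(dets, key=lambda d: -d[1]), start=1):
        free = [j for j, taken in enumerate(used.get(vid, [])) if not taken]
        if not free:
            continue  # false positive: no unmatched ground truth in this video
        best = max(free, key=lambda j: tube_iou(tube, gts[vid][j]))
        if tube_iou(tube, gts[vid][best]) >= thr:
            used[vid][best] = True
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this recall step
    return precision_sum / n_gt


def video_map(dets_by_class, gts_by_class, thr: float = 0.5) -> float:
    """Mean of per-class APs over the classes present in the ground truth."""
    aps = [average_precision(dets_by_class.get(c, []), gts_by_class[c], thr) for c in gts_by_class]
    return sum(aps) / len(aps)


# Usage: same box, but the tubes overlap on only 3 of 5 frames.
gt = {f: (0.0, 0.0, 10.0, 10.0) for f in range(4)}       # frames 0-3
pred = {f: (0.0, 0.0, 10.0, 10.0) for f in range(1, 5)}  # frames 1-4
print(tube_iou(pred, gt))  # 0.6
```

Because the temporal term multiplies the spatial term, a detector that localizes the actor perfectly but misses part of the action's duration is still penalized at the 0.5 threshold.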
