Learning to track for spatio-temporal action localization

About

We propose an effective approach for spatio-temporal action localization in realistic videos. The approach first detects proposals at the frame level and scores them with a combination of static and motion CNN features. It then tracks high-scoring proposals throughout the video using a tracking-by-detection approach. Our tracker relies simultaneously on instance-level and class-level detectors. The tracks are scored using a spatio-temporal motion histogram, a descriptor at the track level, in combination with the CNN features. Finally, we perform temporal localization of the action using a sliding-window approach at the track level. We present experimental results for spatio-temporal localization on the UCF-Sports, J-HMDB and UCF-101 action localization datasets, where our approach outperforms the state of the art by a margin of 15%, 7% and 12% in mAP, respectively.
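The final step of the abstract, sliding-window temporal localization at the track level, can be illustrated with a minimal sketch. It assumes each track already has one score per frame and that a window is scored by the mean of its frame scores. Both are simplifying assumptions for illustration: the abstract does not specify the window scoring, and the function name `best_temporal_window`, the window lengths, and the stride are hypothetical.

```python
from typing import List, Tuple


def best_temporal_window(
    frame_scores: List[float],
    window_lengths: Tuple[int, ...] = (20, 30, 40),  # hypothetical lengths
    stride: int = 1,                                  # hypothetical stride
) -> Tuple[int, int, float]:
    """Return (start, end_exclusive, mean_score) of the best window.

    Assumption: a window is scored by the mean of its per-frame scores.
    """
    if not frame_scores:
        raise ValueError("frame_scores must be non-empty")
    n = len(frame_scores)

    # Prefix sums let us compute any window sum in O(1).
    prefix = [0.0]
    for score in frame_scores:
        prefix.append(prefix[-1] + score)

    best = None
    for length in window_lengths:
        length = min(length, n)  # clamp windows longer than the track
        for start in range(0, n - length + 1, stride):
            end = start + length
            mean = (prefix[end] - prefix[start]) / length
            # Strict comparison keeps the first window found on ties.
            if best is None or mean > best[2]:
                best = (start, end, mean)
    return best
```

For example, `best_temporal_window(scores, window_lengths=(4,))` slides a 4-frame window along one track and returns the frame range with the highest mean score, which would then be taken as the temporal extent of the action within that track.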

Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid • 2015

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Detection | JHMDB (test) | F@0.5 | 45.8 | 11 |
| Spatio-temporal action detection | UCF101D | video-mAP (IoU=0.2) | 46.8 | 11 |
| Spatio-temporal action detection | J-HMDB (3 splits) | video-mAP (IoU=0.2) | 63.1 | 10 |
| Action Detection | UCF-101-24 (split 1) | - | - | 10 |
| Action Detection | JHMDB (average over three splits) | Frame mAP | 0.458 | 6 |
| Spatio-temporal action detection | UCF101 (split1) | mAP (IoU=0.05) | 62.8 | 5 |
| Spatial action detection | J-HMDB | Video mAP (IoU=0.5) | 60.7 | 5 |
| Action Detection | UCF Sports (test) | Diving Score | 60.71 | 4 |
| Action Detection | UCF-101 24 actions | f-mAP | 35.84 | 3 |