
UntrimmedNets for Weakly Supervised Action Recognition and Detection

About

Current action recognition methods rely heavily on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to learn action recognition models directly from untrimmed videos without requiring temporal annotations of action instances. Our UntrimmedNet couples two important components, the classification module and the selection module, to learn the action models and reason about the temporal duration of action instances, respectively. These two components are implemented with feed-forward networks, and UntrimmedNet is therefore an end-to-end trainable architecture. We exploit the learned models for action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet. Although our UntrimmedNet only employs weak supervision, our method achieves performance superior or comparable to that of strongly supervised approaches on these two datasets.
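The coupling of the two modules can be illustrated with a minimal sketch. It assumes a soft, attention-style selection in which each clip receives one scalar score that is normalized across clips; the abstract does not spell out the selection mechanism, so this is only one plausible reading. All names (`video_prediction`, `cls_w`, `sel_w`, and so on) and the use of single linear layers over precomputed clip features are hypothetical, chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, feature):
    """Apply one linear layer: weights is a list of rows, bias a list."""
    return [sum(w * f for w, f in zip(row, feature)) + b
            for row, b in zip(weights, bias)]


def video_prediction(clip_features, cls_w, cls_b, sel_w, sel_b):
    """Combine per-clip class scores into one video-level prediction.

    Assumption: soft selection, i.e. a softmax over clips. This is a
    sketch, not the authors' implementation.
    """
    # Classification module: a class distribution for every clip.
    clip_probs = [softmax(linear(cls_w, cls_b, f)) for f in clip_features]

    # Selection module: one scalar per clip, normalized across clips.
    sel_logits = [linear([sel_w], [sel_b], f)[0] for f in clip_features]
    attention = softmax(sel_logits)

    # Video-level prediction: attention-weighted sum of clip distributions.
    num_classes = len(cls_w)
    video_probs = [sum(a * p[c] for a, p in zip(attention, clip_probs))
                   for c in range(num_classes)]
    return video_probs, attention


def video_loss(video_probs, label):
    """Cross-entropy against the video-level label (the only supervision)."""
    return -math.log(video_probs[label])
```

Because the loss depends only on the video-level label, gradients reach both modules without any temporal annotation, and the per-clip attention weights can afterwards be read as a rough indication of where the action occurs.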

Limin Wang, Yuanjun Xiong, Dahua Lin, Luc Van Gool • 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Temporal Action Detection | THUMOS-14 (test) | mAP@tIoU=0.5 | 13.7 | 330 |
| Temporal Action Localization | THUMOS14 (test) | AP @ IoU=0.5 | 16.2 | 319 |
| Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 28.2 | 308 |
| Temporal Action Localization | ActivityNet 1.2 (val) | mAP@IoU 0.5 | 7.4 | 110 |
| Temporal Action Localization | THUMOS 2014 | mAP@0.30 | 28.2 | 93 |
| Action Detection | THUMOS 2014 (test) | mAP (alpha=0.5) | 13.7 | 79 |
| Temporal Action Localization | THUMOS 14 | mAP@0.3 | 28.2 | 44 |
| Temporal Action Localization | THUMOS 2014 (test) | mAP (theta=0.5) | 13.7 | 35 |
| Temporal Action Localization | ActivityNet 1.2 | mAP@0.5 | 7.4 | 32 |
| Action Recognition | THUMOS-14 (test) | mAP | 82.2 | 26 |
Showing 10 of 17 rows
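Most of the detection and localization metrics above are reported at a temporal IoU threshold (0.3 or 0.5). As a rough illustration of what that threshold means, the sketch below computes temporal IoU between a predicted and a ground-truth segment. The function names and the `>=` comparison are assumptions; each benchmark's official evaluation code additionally ranks predictions by confidence and matches each ground-truth instance at most once before computing mAP.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """Assumed rule: a prediction matches when tIoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold
```

For example, a prediction spanning 0 to 10 s against a ground-truth segment spanning 5 to 15 s overlaps for 5 s out of a 15 s union, so it would count as a match at a threshold of 0.3 but not at 0.5.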
