Background Suppression Network for Weakly-supervised Temporal Action Localization
About
Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given during training; the only supervision is video-level labels indicating whether each video contains frames of an action of interest. Previous methods aggregate frame-level class scores into a video-level prediction and learn from the video-level action labels. This formulation does not fully model the problem, in that background frames are forced to be misclassified as action classes in order to predict the video-level labels accurately. In this paper, we design Background Suppression Network (BaS-Net), which introduces an auxiliary class for background and has a two-branch weight-sharing architecture with an asymmetrical training strategy. This enables BaS-Net to suppress activations from background frames and thereby improve localization performance. Extensive experiments demonstrate the effectiveness of BaS-Net and its superiority over the state-of-the-art methods on the most popular benchmarks: THUMOS'14 and ActivityNet. Our code and the trained model are available at https://github.com/Pilhyeon/BaSNet-pytorch.
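The two-branch idea above can be sketched in a few lines. The following is a minimal, pure-Python illustration (hypothetical names and toy numbers, not the authors' implementation): both branches score each segment with the *same* weight matrix over the action classes plus the auxiliary background class, but the suppression branch first damps segments with a foreground attention. Top-k mean pooling turns segment scores into a video-level prediction; under the asymmetrical training strategy, the base branch is supervised with background present and the suppression branch with background absent, which pushes the attention toward zero on background frames.

```python
# Toy sketch of BaS-Net's two-branch, weight-sharing scoring (illustrative only).

def segment_scores(features, weights):
    # weights: one row per class; the last row plays the role of "background"
    return [[sum(f * w for f, w in zip(feat, row)) for row in weights]
            for feat in features]

def topk_mean_pool(scores, k):
    # video-level score per class = mean of its k highest segment scores
    pooled = []
    for c in range(len(scores[0])):
        top = sorted((s[c] for s in scores), reverse=True)[:k]
        pooled.append(sum(top) / len(top))
    return pooled

# toy video: 2 action segments followed by 2 background segments (2-dim features)
features = [[1.0, 0.1], [0.9, 0.2], [0.3, 0.4], [0.2, 0.5]]
weights = [[1.0, 0.0],   # action class
           [0.3, 0.7]]   # auxiliary background class
attention = [1.0, 0.9, 0.1, 0.2]  # foreground attention (low on background)

# base branch: raw features; suppression branch: attention-weighted features
base_pooled = topk_mean_pool(segment_scores(features, weights), k=2)
supp_feats = [[a * f for f in feat] for a, feat in zip(attention, features)]
supp_pooled = topk_mean_pool(segment_scores(supp_feats, weights), k=2)
# the suppression branch's background score drops while the action score survives
```

Because the classifier weights are shared, only the attention differs between branches; the asymmetric video-level labels are what force that attention to suppress background segments.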
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Temporal Action Localization | THUMOS14 (test) | AP@IoU=0.5 | 27.0 | 319 |
| Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 44.6 | 308 |
| Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 34.5 | 257 |
| Temporal Action Localization | ActivityNet 1.2 (val) | mAP@IoU=0.5 | 38.5 | 110 |
| Temporal Action Localization | THUMOS 2014 | mAP@0.3 | 44.6 | 93 |
| Temporal Action Localization | ActivityNet v1.3 (test) | mAP@IoU=0.5 | 34.5 | 47 |
| Temporal Action Localization | THUMOS 14 | mAP@0.3 | 44.6 | 44 |
| Temporal Action Localization | ActivityNet 1.2 (test) | mAP@0.5 | 38.5 | 36 |
| Temporal Action Localization | ActivityNet 1.2 | mAP@0.5 | 38.5 | 32 |
| Temporal Action Localization | ActivityNet 1.3 | Average mAP | 22.2 | 32 |