
W-TALC: Weakly-supervised Temporal Activity Localization and Classification

About

Most activity localization methods in the literature are burdened by the need for frame-wise annotations. Learning from weak labels is a potential way to reduce this manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework that uses only video-level labels. The proposed network can be divided into two sub-networks: a two-stream feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets, Thumos14 and ActivityNet1.2, demonstrate that the proposed method is able to detect activities at a fine granularity and achieves better performance than current state-of-the-art methods.
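The abstract does not spell out the two loss functions, so the following is only a minimal sketch of one common way to train from video-level labels alone: pool per-frame class scores over time with a top-k average, apply a cross-entropy loss against the video-level label, and localize at test time by thresholding the per-frame scores. The function names, the choice of top-k pooling, and the threshold are all assumptions made for illustration; they are not taken from the text above.

```python
import math

def topk_video_scores(frame_scores, k):
    """Average the k highest per-class scores over time.

    frame_scores: list of T rows, each a list of C class scores.
    Returns a list of C video-level scores.
    """
    k = max(1, min(k, len(frame_scores)))  # keep k within [1, T]
    num_classes = len(frame_scores[0])
    video_scores = []
    for c in range(num_classes):
        column = sorted((row[c] for row in frame_scores), reverse=True)
        video_scores.append(sum(column[:k]) / k)
    return video_scores

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def video_level_loss(frame_scores, labels, k):
    """Cross-entropy between pooled predictions and normalized multi-hot labels."""
    probs = softmax(topk_video_scores(frame_scores, k))
    label_sum = sum(labels)
    return -sum((y / label_sum) * math.log(p)
                for y, p in zip(labels, probs) if y)

def localize(frame_scores, class_index, threshold):
    """Return (start, end) frame index pairs, end exclusive, where the
    score of the given class stays above the threshold."""
    segments, start = [], None
    for t, row in enumerate(frame_scores):
        active = row[class_index] > threshold
        if active and start is None:
            start = t
        elif not active and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(frame_scores)))
    return segments

# Hypothetical scores for a 5-step video and 2 classes; only class 0 is tagged.
scores = [[0.1, 0.0], [2.0, 0.2], [2.5, 0.1], [0.2, 0.0], [1.8, 0.3]]
loss = video_level_loss(scores, labels=[1, 0], k=2)
segments = localize(scores, class_index=0, threshold=1.0)  # [(1, 3), (4, 5)]
```

Only the video-level tag enters the loss, yet the per-frame scores it shapes can still be thresholded into temporal segments, which is the general idea behind localization from weak labels.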

Sujoy Paul, Sourya Roy, Amit K Roy-Chowdhury • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Temporal Action Detection | THUMOS-14 (test) | mAP@tIoU=0.5 | 22.8 | 330
Temporal Action Localization | THUMOS14 (test) | AP @ IoU=0.5 | 22.8 | 319
Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 40.1 | 308
Temporal Action Localization | ActivityNet 1.2 (val) | mAP@IoU 0.5 | 37 | 110
Temporal Action Localization | THUMOS 2014 | mAP@0.30 | 40.1 | 93
Temporal Action Localization | THUMOS 14 | mAP@0.3 | 40.1 | 44
Temporal Action Localization | ActivityNet 1.2 | mAP@0.5 | 37 | 32
Temporal Action Localization | THUMOS14 v1.0 (test) | mAP @ IoU 0.3 | 40.1 | 29
Temporal Action Detection | FineAction | Avg mAP | 3.45 | 27
Action Classification | ActivityNet Untrimmed 1.2 (test) | mAP | 93.2 | 12

Showing 10 of 17 rows
