Temporal Context Network for Activity Localization in Videos
About
We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster R-CNN architecture, proposals spanning multiple temporal scales are placed at equal intervals in a video. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation that explicitly captures the context around a proposal in order to rank it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and fed to a temporal convolutional neural network for classification. After the proposals are ranked, non-maximum suppression is applied and classification is performed to obtain the final detections. TCN outperforms state-of-the-art methods on the ActivityNet and THUMOS14 datasets.
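The proposal pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`place_proposals`, `uniform_indices`, `context_pair`, `temporal_nms`), the 50% overlap between neighbouring proposals, the 8 samples per window, and the choice of a context window twice the proposal length as the second scale are all hypothetical. The temporal convolutional ranking network is omitted; `context_pair` only produces the two uniformly sampled feature sets that would be fed to it.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) measured in feature steps


def place_proposals(video_len: int, scales: Sequence[int], overlap: float = 0.5) -> List[Segment]:
    """Place fixed-length proposals at equal intervals for every scale.

    Assumption: stride = scale * (1 - overlap); TCN's actual scales and
    strides are not specified here.
    """
    proposals = []
    for scale in scales:
        stride = max(1, int(scale * (1 - overlap)))
        start = 0
        while start + scale <= video_len:
            proposals.append((float(start), float(start + scale)))
            start += stride
    return proposals


def uniform_indices(start: float, end: float, n: int, video_len: int) -> List[int]:
    """Return n evenly spaced feature indices in [start, end), clipped to the video."""
    step = (end - start) / n
    return [min(video_len - 1, max(0, int(start + (i + 0.5) * step))) for i in range(n)]


def context_pair(features: Sequence[Sequence[float]], seg: Segment,
                 n: int = 8, context: float = 2.0):
    """Uniformly sample n features inside the proposal and n inside a wider
    window centred on it (hypothetical: `context` times the proposal length)."""
    start, end = seg
    center, half = (start + end) / 2, (end - start) * context / 2
    inner = [features[i] for i in uniform_indices(start, end, n, len(features))]
    outer = [features[i] for i in uniform_indices(center - half, center + half, n, len(features))]
    return inner, outer


def temporal_nms(segments: Sequence[Segment], scores: Sequence[float],
                 thresh: float = 0.5) -> List[int]:
    """Greedy 1-D non-maximum suppression: keep the highest-scoring segment,
    drop any later segment whose temporal IoU with a kept one exceeds thresh."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        s_i, e_i = segments[i]
        suppressed = False
        for j in keep:
            s_j, e_j = segments[j]
            inter = max(0.0, min(e_i, e_j) - max(s_i, s_j))
            union = (e_i - s_i) + (e_j - s_j) - inter
            if union > 0 and inter / union > thresh:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep
```

For example, `place_proposals(100, [16, 32])` yields 11 proposals of length 16 (stride 8) followed by 5 of length 32 (stride 16). Centring the wider window on the proposal gives every proposal a symmetric view of the surrounding video, which is the kind of boundary evidence that pooling inside the segment alone cannot supply.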
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Temporal Action Detection | THUMOS-14 (test) | mAP@tIoU=0.5 | 25.6 | 330 |
| Temporal Action Localization | THUMOS14 (test) | AP@IoU=0.5 | 25.6 | 319 |
| Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 37.49 | 257 |
| Temporal Action Detection | ActivityNet v1.3 (val) | mAP@0.5 | 36.2 | 185 |
| Temporal Action Proposal | ActivityNet v1.3 (val) | AUC | 59.58 | 114 |
| Temporal Action Detection | ActivityNet 1.3 (test) | Average mAP | 23.58 | 80 |
| Action Detection | THUMOS 2014 (test) | mAP (alpha=0.5) | 25.6 | 79 |
| Temporal Action Detection | THUMOS 14 | mAP@0.3 | 33.3 | 71 |
| Temporal Action Proposal Generation | ActivityNet 1.3 (test) | AUC | 61.56 | 62 |
| Action Localization | Thumos14 | mAP@0.5 | 25.6 | 34 |
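Most detection metrics in the table are computed at a temporal IoU (tIoU) threshold: a predicted segment counts as correct only if its tIoU with a not-yet-matched ground-truth instance of the same class reaches the threshold (0.5 for most rows, 0.3 for one). mAP averages AP over classes, and ActivityNet's Average mAP additionally averages over tIoU thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below is a simplified, single-class, non-interpolated AP for illustration only; the official THUMOS14 and ActivityNet evaluation code differs in details such as precision interpolation, and `tiou` and `average_precision` are hypothetical names.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds or frames


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two 1-D segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Tuple[Segment, float]],
                      gts: Sequence[Segment], thresh: float = 0.5) -> float:
    """Simplified AP for one class: walk detections by descending score,
    match each to the best unmatched ground truth with tIoU >= thresh,
    and average the precision observed at every true positive."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_count = 0
    precision_sum = 0.0
    ranked = sorted(dets, key=lambda d: d[1], reverse=True)
    for rank, (seg, _score) in enumerate(ranked, start=1):
        best_j, best_iou = -1, thresh
        for j, gt in enumerate(gts):
            iou = tiou(seg, gt)
            if not matched[j] and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:  # true positive; otherwise a false positive
            matched[best_j] = True
            tp_count += 1
            precision_sum += tp_count / rank
    # Ground truths never matched contribute zero precision (missed instances).
    return precision_sum / len(gts)
```

With ground truths `[(0, 10), (20, 30)]` and detections `(1, 11)`, `(0, 10)`, `(21, 29)` scored 0.9, 0.8 and 0.7, the second detection is a duplicate of an already-matched instance and becomes a false positive, so AP at tIoU 0.5 is (1 + 2/3) / 2 = 5/6. Raising the threshold to 0.9 leaves only the exact match as a true positive, at rank 2, giving 0.25.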