Temporal Context Network for Activity Localization in Videos

About

We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster R-CNN architecture, proposals spanning multiple temporal scales are placed at equal intervals in a video. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation that explicitly captures the context around a proposal in order to rank it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and input to a temporal convolutional neural network for classification. After the proposals are ranked, non-maximum suppression is applied and classification is performed to obtain the final detections. TCN outperforms state-of-the-art methods on the ActivityNet dataset and the THUMOS14 dataset.
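The proposal stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the stride ratio, the context ratio, and the NMS threshold are all assumptions chosen for illustration, and the learned ranking network is replaced by scores supplied by the caller. The sketch places fixed-length windows at equal intervals for several scales, widens a proposal to a second scale that takes in surrounding context, and applies greedy temporal non-maximum suppression.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def generate_proposals(duration: float, scales: List[float],
                       stride_ratio: float = 0.5) -> List[Segment]:
    """Place windows of each length in `scales` at equal intervals.

    `stride_ratio` (assumed value) sets the step as a fraction of the window length.
    """
    proposals = []
    for length in scales:
        stride = length * stride_ratio
        start = 0.0
        while start + length <= duration + 1e-9:
            proposals.append((start, start + length))
            start += stride
    return proposals


def context_window(segment: Segment, duration: float,
                   context_ratio: float = 2.0) -> Segment:
    """Widen a proposal about its centre to take in surrounding context.

    `context_ratio` (assumed value) is the widened length divided by the
    proposal length; the result is clipped to the video bounds.
    """
    start, end = segment
    centre = (start + end) / 2
    half = (end - start) * context_ratio / 2
    return (max(0.0, centre - half), min(duration, centre + half))


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: List[Segment], scores: List[float],
                 iou_threshold: float = 0.7) -> List[int]:
    """Greedy NMS: keep the best-scoring segment, drop heavy overlaps, repeat."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(temporal_iou(segments[i], segments[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

In a full pipeline, features would be sampled uniformly inside both the proposal and its widened context window and passed to the ranking network; here the scores passed to `temporal_nms` stand in for that network's output.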

Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen • 2017

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Temporal Action Detection	THUMOS-14 (test)	mAP@tIoU=0.5	25.6	339
Temporal Action Localization	THUMOS14 (test)	AP @ IoU=0.5	25.6	319
Temporal Action Localization	ActivityNet 1.3 (val)	AP@0.5	37.49	257
Temporal Action Detection	ActivityNet v1.3 (val)	mAP@0.5	36.2	185
Temporal Action Proposal	ActivityNet v1.3 (val)	AUC	59.58	114
Temporal Action Detection	ActivityNet 1.3 (test)	Average mAP	23.58	80
Action Detection	THUMOS 2014 (test)	mAP (alpha=0.5)	25.6	79
Temporal Action Detection	THUMOS 14	mAP@0.3	33.3	71
Temporal Action Proposal Generation	ActivityNet 1.3 (test)	AUC	61.56	62
Action Localization	Thumos14	mAP@0.5	25.6	34

Showing 10 of 13 rows
