
Temporal Segment Networks for Action Recognition in Videos

About

Deep convolutional networks have achieved great success in image recognition. However, for action recognition in videos, their advantage over traditional methods is less evident. We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), models long-range temporal structure with a new segment-based sampling and aggregation module. This design enables TSN to learn action models efficiently from whole action videos. The learned models can be easily adapted for action recognition in trimmed and untrimmed videos, with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for instantiating the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on four challenging action recognition benchmarks: HMDB51 (71.0%), UCF101 (94.9%), THUMOS14 (80.1%), and ActivityNet v1.2 (89.6%). Using the proposed RGB difference for motion models, our method still achieves competitive accuracy on UCF101 (91.0%) while running at 340 FPS. Furthermore, based on temporal segment networks, we won the video classification track of the ActivityNet Challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
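
The segment-based sampling and aggregation idea can be illustrated with a minimal sketch, assuming a video of known frame count is split into K equal-length segments, one snippet (here reduced to a single frame index) is drawn per segment, and a per-snippet classifier already exists. The classifier is stubbed out as fixed score lists, and the function names `sample_segment_indices` and `segmental_consensus` are hypothetical, not taken from the authors' code. Averaging is used as the consensus function, matching the "simple average pooling" mentioned above; the multi-scale window integration for untrimmed videos and the RGB-difference input are not shown.

```python
import random
from typing import List, Optional, Sequence


def sample_segment_indices(num_frames: int, num_segments: int,
                           train: bool = True,
                           rng: Optional[random.Random] = None) -> List[int]:
    """Split [0, num_frames) into num_segments equal-length segments and pick
    one frame index per segment: a random position inside the segment during
    training, the segment center at test time (hypothetical helper)."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = seg_len * k
        offset = rng.random() * seg_len if train else seg_len / 2
        # Clamp in case of very short videos or float rounding.
        indices.append(min(int(start + offset), num_frames - 1))
    return indices


def segmental_consensus(snippet_scores: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate per-snippet class scores into one video-level score vector
    by averaging over snippets (average-pooling consensus)."""
    n = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / n for c in range(num_classes)]


if __name__ == "__main__":
    # Test-time sampling of 3 snippets from a 300-frame video.
    print(sample_segment_indices(300, 3, train=False))  # [50, 150, 250]
    # Stubbed 2-class scores for the 3 snippets (placeholder values).
    print(segmental_consensus([[1.0, 0.0], [3.0, 1.0], [2.0, 2.0]]))  # [2.0, 1.0]
```

Because only K snippets are scored per video regardless of its length, the cost of a training step stays fixed while the sampled snippets still cover the whole video, which is the point of sparse segment-based sampling.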

Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool (2017)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Recognition | Something-Something v2 (val) | Top-1 Accuracy | 24.9 | 535 |
| Action Recognition | Kinetics-400 | Top-1 Acc | 73.9 | 413 |
| Action Recognition | UCF101 | Accuracy | 94.9 | 365 |
| Action Recognition | UCF101 (mean of 3 splits) | Accuracy | 93.2 | 357 |
| Action Recognition | UCF101 (test) | Accuracy | 94.9 | 307 |
| Action Recognition | HMDB51 (test) | Accuracy | 0.71 | 249 |
| Action Recognition | Kinetics 400 (test) | Top-1 Accuracy | 73.9 | 245 |
| Action Recognition | HMDB51 | Top-1 Acc | 71 | 225 |
| Action Recognition | HMDB-51 (average of three splits) | Top-1 Acc | 69.4 | 204 |
| Action Recognition | Something-Something v2 (test val) | Top-1 Accuracy | 33.4 | 187 |

(10 of 27 rows shown.)
