
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning

About

Long-range and short-range temporal modeling are two complementary and crucial aspects of video recognition. Most state-of-the-art methods focus on short-range spatio-temporal modeling and then average multiple snippet-level predictions to yield the final video-level prediction. Thus, their video-level prediction does not account for how the video's spatio-temporal features evolve along the temporal dimension. In this paper, we introduce a novel Dynamic Segment Aggregation (DSA) module to capture relationships among snippets. More specifically, we generate a dynamic kernel for a convolutional operation that adaptively aggregates long-range temporal information among adjacent snippets. The DSA module is an efficient plug-and-play module and can be combined with off-the-shelf clip-based models (e.g., TSM, I3D) to perform powerful long-range modeling with minimal overhead. We call the final video architecture DSANet. We conduct extensive experiments on several video recognition benchmarks (i.e., Mini-Kinetics-200, Kinetics-400, Something-Something V1 and ActivityNet) to show its superiority. Our proposed DSA module is shown to benefit various video recognition models significantly. For example, equipped with DSA modules, the top-1 accuracy of I3D ResNet-50 improves from 74.9% to 78.2% on Kinetics-400. Code is available at https://github.com/whwu95/DSANet.

Wenhao Wu, Yuxiang Zhao, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, Yingying Li, Mingde Yao, Zichao Dong, Yifeng Shi (2021)
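
The contrast the abstract draws, between plain averaging of snippet-level outputs and aggregation with a kernel generated from the input itself, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `weights` matrix (a hypothetical stand-in for a learned kernel generator), the softmax normalization, and the zero padding are all assumptions made for illustration, and the actual DSA module in the linked repository may differ in every detail.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def average_aggregate(snippets):
    """Baseline: average the snippet-level vectors, ignoring their order."""
    n = len(snippets)
    return [sum(col) / n for col in zip(*snippets)]


def dynamic_kernel(snippets, weights):
    """Generate a length-K kernel from the input snippets.

    `weights` is a hypothetical K x T matrix standing in for a learned
    kernel generator (an assumption of this sketch).
    """
    # One scalar summary per snippet: the mean over its channels.
    summary = [sum(s) / len(s) for s in snippets]
    logits = [sum(w * x for w, x in zip(row, summary)) for row in weights]
    return softmax(logits)


def dynamic_aggregate(snippets, weights):
    """Convolve along the snippet axis with the input-dependent kernel."""
    kernel = dynamic_kernel(snippets, weights)
    pad = len(kernel) // 2
    t, c = len(snippets), len(snippets[0])
    out = []
    for i in range(t):
        row = [0.0] * c
        for j, w in enumerate(kernel):
            src = i + j - pad
            if 0 <= src < t:  # positions outside the clip count as zeros
                for ch in range(c):
                    row[ch] += w * snippets[src][ch]
        out.append(row)
    return out


# Four snippets with two feature channels each (made-up numbers).
snippets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [0.0, 0.0, 0.0, 0.0]]
mixed = dynamic_aggregate(snippets, weights)  # snippets now share information
video_feature = average_aggregate(mixed)      # final video-level vector
```

In this sketch the kernel changes with the input, so different videos mix their neighbouring snippets differently before the final average, whereas `average_aggregate` alone treats every snippet identically.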

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Recognition | Something-something v1 (val) | Top-1 Acc | 51.8 | 257 |
| Video Recognition | Kinetics-400 1.0 (val) | Top-1 Acc | 79 | 43 |
| Action Recognition | ActivityNet v1.3 | mAP | 90.5 | 31 |
| Action Recognition | ActivityNet v1.3 (test) | mAP | 90.5 | 19 |
| Video Classification | Mini-Kinetics-200 (val) | Top-1 Acc | 81.8 | 11 |
| Untrimmed Video Recognition | ActivityNet v1.3 (val) | mAP | 90.5 | 5 |
| Action Recognition | ActivityNet | mAP | 90.5 | 5 |
