Video Playback Rate Perception for Self-supervised Spatio-Temporal Representation Learning

About

In self-supervised spatio-temporal representation learning, temporal resolution and long- and short-term characteristics have not yet been fully explored, which limits the representation capabilities of learned models. In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representations in a simple yet effective way. PRP is rooted in a dilated sampling strategy, which produces self-supervision signals about video playback rates for representation model learning. PRP is implemented with a feature encoder, a classification module, and a reconstructing decoder, to achieve spatio-temporal semantic retention in a collaborative discrimination-generation manner. The discriminative perception model follows the feature encoder and favors low-temporal-resolution, long-term representations by classifying fast-forward rates. The generative perception model acts as a feature decoder and focuses on high-temporal-resolution, short-term representations by introducing a motion-attention mechanism. PRP is applied to typical video target tasks, including action recognition and video retrieval. Experiments show that PRP outperforms state-of-the-art self-supervised models by significant margins. Code is available at github.com/yuanyao366/PRP.
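
As a rough illustration of the dilated sampling idea in the abstract, the sketch below draws a clip at a randomly chosen playback rate and uses the index of that rate as the classification label. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the rate set (1, 2, 4, 8), and the 16-frame clip length are hypothetical choices made for the example, and the reconstructing decoder and motion-attention mechanism are not shown.

```python
import numpy as np

def dilated_sample(video, rate, clip_len, start=0):
    """Take clip_len frames from video, keeping every rate-th frame.

    video is any array whose first axis indexes frames.
    """
    span = (clip_len - 1) * rate + 1  # number of source frames covered
    if start < 0 or start + span > len(video):
        raise ValueError("video too short for this rate and clip length")
    idx = start + rate * np.arange(clip_len)
    return video[idx]

def make_example(video, rates=(1, 2, 4, 8), clip_len=16, rng=None):
    """Return (clip, label), where label indexes the playback rate used.

    The rate set and clip length are assumptions for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    label = int(rng.integers(len(rates)))      # self-supervision target
    rate = rates[label]
    span = (clip_len - 1) * rate + 1
    start = int(rng.integers(len(video) - span + 1))  # random valid offset
    return dilated_sample(video, rate, clip_len, start), label

# Usage: a stand-in "video" of 128 frames, each frame just its own index.
video = np.arange(128)
clip, label = make_example(video, rng=np.random.default_rng(0))
```

A classifier trained to predict `label` from `clip` has to attend to how quickly content changes across frames, which is the kind of signal the abstract describes for the discriminative branch.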

Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye • 2020

Related benchmarks

Task                      Dataset                              Metric           Result  Rank
Action Recognition        UCF101                               Accuracy         72.1    365
Action Recognition        UCF101 (mean of 3 splits)            Accuracy         72.1    357
Action Recognition        UCF101 (test)                        Accuracy         72.1    307
Action Recognition        HMDB51 (test)                        Accuracy         0.35    249
Action Recognition        HMDB51                               Top-1 Acc        35      225
Action Recognition        HMDB-51 (average of three splits)    Top-1 Acc        35      204
Action Recognition        HMDB51                               3-Fold Accuracy  35      191
Video Action Recognition  UCF101                               Top-1 Acc        72.1    153
Action Recognition        UCF-101                              Top-1 Acc        72.1    147
Action Classification     HMDB51 (over all three splits)       Accuracy         35      121
Showing 10 of 19 rows
