TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

About

Recognizing transformation types applied to a video clip (RecogTrans) is a long-established paradigm for self-supervised video representation learning, but in recent work it has performed much worse than instance discrimination approaches (InstDisc). However, based on a thorough comparison of representative RecogTrans and InstDisc methods, we observe that RecogTrans has great potential on both semantic-related and temporal-related downstream tasks. Because they rely on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we develop TransRank, a unified framework for recognizing Transformations in a Ranking formulation. TransRank provides accurate supervision signals by recognizing transformations relatively, and it consistently outperforms the classification-based formulation. Meanwhile, the unified framework can be instantiated with an arbitrary set of temporal or spatial transformations, demonstrating good generality. With a ranking-based formulation and several empirical practices, we achieve competitive performance on video retrieval and action recognition. Under the same setting, TransRank surpasses the previous state-of-the-art method by 6.4% on UCF101 and 8.3% on HMDB51 for action recognition (Top-1 accuracy), and improves video retrieval on UCF101 by 20.4% (R@1). The promising results validate that RecogTrans is still a paradigm worth exploring for video self-supervised learning. Code will be released at https://github.com/kennymckormick/TransRank.
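
The contrast between hard-label classification and relative recognition described above can be illustrated with a small sketch. This is not the authors' implementation: the hinge (margin) form of the ranking loss, the function names, the margin value, and the toy scores are all assumptions chosen for illustration. The idea shown is that a clip may score high for a transformation because of the video's intrinsic content, which penalizes a per-clip classifier, while a ranking among clips derived from the same video only asks that the clip actually transformed by k scores highest for k.

```python
import math


def classification_loss(scores, label):
    """Softmax cross-entropy over transformation classes for a single clip."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]


def ranking_loss(score_matrix, margin=1.0):
    """Hinge ranking loss over clips derived from the same video (assumed form).

    score_matrix[i][k] is the score given to clip i for transformation k,
    where clip i was produced by applying transformation i.
    For each transformation k, clip k should outscore every other clip
    on column k by at least `margin`.
    """
    n = len(score_matrix)
    total = 0.0
    for k in range(n):
        for i in range(n):
            if i != k:
                gap = score_matrix[k][k] - score_matrix[i][k]
                total += max(0.0, margin - gap)
    return total / (n * (n - 1))


# Toy scores for two clips from one video (hypothetical values).
# Both clips score high on transformation 1, e.g. because the source
# video already looks "fast", yet the relative order per column is correct.
scores = [
    [2.0, 3.0],  # clip 0, produced by transformation 0
    [0.0, 5.0],  # clip 1, produced by transformation 1
]

cls = classification_loss(scores[0], label=0)  # clip 0 is penalized
rank = ranking_loss(scores)                    # relative order is satisfied
print(f"classification loss for clip 0: {cls:.4f}")
print(f"ranking loss for the video: {rank:.4f}")
```

In this toy case the classifier is penalized on clip 0 even though the clip is ordered correctly relative to its sibling, while the ranking loss is zero. Whether the released code uses this exact loss should be checked against the repository linked above.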

Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Recognition | UCF101 (3 splits) | Accuracy | 90.7 | 155 |
| Video Action Recognition | HMDB-51 (3 splits) | Accuracy | 64.2 | 116 |
| Video Retrieval | UCF101 (1) | Top-1 Accuracy | 54 | 92 |
| Video Retrieval | HMDB51 (first split) | Top-1 Accuracy | 25.5 | 49 |
| Action Classification | HMDB51 1.0 (fine-tuned) | Accuracy | 60.1 | 16 |
| Action Classification | UCF101 1.0 (fine-tuned) | Accuracy | 87.8 | 16 |