TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

About

Recognizing the transformation types applied to a video clip (RecogTrans) is a long-established paradigm for self-supervised video representation learning, but it performs far worse than instance discrimination approaches (InstDisc) in recent works. Nevertheless, based on a thorough comparison of representative RecogTrans and InstDisc methods, we observe the great potential of RecogTrans on both semantic-related and temporal-related downstream tasks. Because they rely on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation. TransRank provides accurate supervision signals by recognizing transformations relatively, and it consistently outperforms the classification-based formulation. Meanwhile, the unified framework can be instantiated with an arbitrary set of temporal or spatial transformations, demonstrating good generality. With a ranking-based formulation and several empirical practices, we achieve competitive performance on video retrieval and action recognition. Under the same setting, TransRank surpasses the previous state-of-the-art method by 6.4% on UCF101 and 8.3% on HMDB51 for action recognition (Top-1 accuracy), and improves video retrieval on UCF101 by 20.4% (R@1). The promising results validate that RecogTrans is still a paradigm worth exploring for video self-supervised learning. Code will be released at https://github.com/kennymckormick/TransRank.
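
To make the contrast between hard-label classification and relative recognition concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the loss from the TransRank paper or repository: the function names, the score layout `scores[i][k]`, the hinge form, and the margin value are all hypothetical choices made for this example.

```python
import math


def hard_label_loss(logits, label):
    """Cross-entropy for a single clip against one hard transformation label."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def ranking_loss(scores, margin=1.0):
    """Hinge ranking loss over K transformed versions of the same video.

    Assumed layout (hypothetical): scores[i][k] is the model's score for
    transformation class k on the clip produced by transformation i.
    For each class k, the clip that really carries transformation k should
    outscore every other clip of the same video by at least `margin`.
    """
    k = len(scores)
    total, pairs = 0.0, 0
    for cls in range(k):
        for other in range(k):
            if other == cls:
                continue
            gap = scores[cls][cls] - scores[other][cls]
            total += max(0.0, margin - gap)
            pairs += 1
    return total / pairs


# Toy scores for one video under three transformations (made-up numbers).
# Every clip leans toward class 0, e.g. because the source video looks slow.
scores = [
    [5.0, 1.0, 0.0],  # clip with transformation 0
    [4.0, 2.5, 0.0],  # clip with transformation 1
    [4.0, 1.0, 1.5],  # clip with transformation 2
]

print(hard_label_loss(scores[1], 1))  # penalised: class 0 has the top logit
print(ranking_loss(scores))           # all relative orderings are satisfied
```

In this toy case the hard-label loss penalises the second clip because a video-level bias pushes its class-0 logit highest, while the ranking loss only compares clips of the same video against each other, so the shared bias cancels out.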

Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin • 2022

Related benchmarks

Task	Dataset	Result
Action Recognition	UCF101 (3 splits)	Accuracy 90.7	155
Video Action Recognition	HMDB-51 (3 splits)	Accuracy 64.2	116
Video Retrieval	UCF101 (1)	Top-1 Acc 54	97
Video Retrieval	HMDB51 (first split)	Top-1 Accuracy 25.5	49
Action Classification	HMDB51 1.0 (fine-tuned)	Accuracy 60.1	16
Action Classification	UCF101 1.0 (fine-tuned)	Accuracy 87.8	16

