
Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition

About

In recent years, self-supervised representation learning for skeleton-based action recognition has advanced alongside contrastive learning methods. Existing contrastive learning methods use normal augmentations to construct similar positive samples, which limits their ability to explore novel movement patterns. In this paper, to make better use of the movement patterns introduced by extreme augmentations, a Contrastive Learning framework utilizing Abundant Information Mining for self-supervised action Representation (AimCLR) is proposed. First, extreme augmentations and the Energy-based Attention-guided Drop Module (EADM) are proposed to obtain diverse positive samples, which bring novel movement patterns that improve the universality of the learned representations. Second, since directly using extreme augmentations may not boost performance because of the drastic changes to the original identity, the Dual Distributional Divergence Minimization Loss (D$^3$M Loss) is proposed to minimize the distribution divergence more gently. Third, Nearest Neighbors Mining (NNM) is proposed to further expand the positive samples and make the abundant information mining process more reasonable. Exhaustive experiments on the NTU RGB+D 60, PKU-MMD, and NTU RGB+D 120 datasets verify that AimCLR performs favorably against state-of-the-art methods under a variety of evaluation protocols and yields higher-quality action representations. Our code is available at https://github.com/Levigty/AimCLR.
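
The abstract gives no formulas, so the following is only a rough sketch of one plausible reading of the two ideas it names: pulling the similarity distribution of an extremely augmented view toward that of a normally augmented view, and picking nearest neighbors as extra positives. The function names, the memory bank, the temperature value, and the use of cross-entropy as the divergence are all assumptions made for illustration; they are not taken from the AimCLR code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def similarity_distribution(query, key, bank, temperature=0.2):
    """Softmax over cosine similarities between `query` and [key] + bank.

    `key` is the embedding of the positive view, `bank` is a hypothetical
    memory bank of other embeddings. Index 0 of the result is the key.
    """
    q = normalize(query)
    logits = [dot(q, normalize(c)) / temperature for c in [key] + bank]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def divergence_loss(target_dist, pred_dist, eps=1e-12):
    """Cross-entropy of `pred_dist` against `target_dist` (assumed form).

    Minimizing it moves the extreme view's distribution toward the
    normal view's distribution instead of forcing a hard positive match.
    """
    return -sum(p * math.log(q + eps) for p, q in zip(target_dist, pred_dist))

def nearest_neighbors(dist, k):
    """Indices of the k most similar entries, usable as extra positives."""
    return sorted(range(len(dist)), key=lambda i: dist[i], reverse=True)[:k]

# Toy 2-D embeddings, chosen only for illustration.
key = [1.0, 0.0]
bank = [[0.0, 1.0], [-1.0, 0.0]]
normal_view = [0.9, 0.1]    # mild augmentation, stays close to the key
extreme_view = [0.5, 0.5]   # strong augmentation, drifts away from it

p_normal = similarity_distribution(normal_view, key, bank)
p_extreme = similarity_distribution(extreme_view, key, bank)
loss = divergence_loss(p_normal, p_extreme)
neighbors = nearest_neighbors(p_normal, k=1)
```

In this toy setup the normal view's distribution serves as a soft target, so the extreme view is penalized in proportion to how far its similarity pattern departs from it.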

Tianyu Guo, Hong Liu, Zhan Chen, Mengyuan Liu, Tao Wang, Runwei Ding • 2021

Related benchmarks

Task                               Dataset                        Result               Rank
Action Recognition                 NTU RGB+D 120 (X-set)          Accuracy 80.9        661
Action Recognition                 NTU RGB+D (Cross-View)         Accuracy 92.8        609
Action Recognition                 NTU RGB+D 60 (Cross-View)      Accuracy 92.8        575
Action Recognition                 NTU RGB+D (Cross-subject)      Accuracy 86.9        474
Action Recognition                 NTU RGB+D 60 (X-sub)           Accuracy 86.9        467
Action Recognition                 NTU RGB+D X-sub 120            Accuracy 80.1        377
Action Recognition                 NTU RGB-D Cross-Subject 60     Accuracy 78.2        305
Skeleton-based Action Recognition  NTU 60 (X-sub)                 Accuracy 78.9        220
Skeleton-based Action Recognition  NTU RGB+D 120 (X-set)          Top-1 Accuracy 68.8  184
Action Recognition                 NTU RGB+D 120 Cross-Subject    Accuracy 68.2        183
Showing 10 of 41 rows
