Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations

About

Contrastive learning has proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods rely on carefully designed augmentations to generate different movement patterns of skeletons that share the same semantics. However, applying strong augmentations, which distort the structure of images or skeletons and cause semantic loss, remains an open problem because they make training unstable. In this paper, we investigate the potential of adopting strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradually growing augmentation policy that generates multiple ordered positive pairs, which guide the learned representations toward consistency across different views. Then, an asymmetric loss is proposed to enforce this hierarchical consistency via a directional clustering operation in the feature space, pulling the representations of strongly augmented views closer to those of weakly augmented views for better generalizability. Meanwhile, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms state-of-the-art methods on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD.
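The two ideas in the abstract, a growing augmentation policy and an asymmetric loss that pulls strongly augmented views toward weaker ones, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the use of a KL divergence between similarity distributions, the `anchors` matrix (standing in for a set of reference embeddings), and the temperature value are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def growing_views(skeleton, augmentations):
    """Apply augmentations cumulatively (hypothetical helper).

    views[i] has the first i + 1 augmentations applied, so the list is
    ordered from the weakest view to the strongest one.
    """
    views = []
    current = skeleton
    for augment in augmentations:
        current = augment(current)
        views.append(current)
    return views


def similarity_distribution(features, anchors, temperature=0.1):
    """Softmax over cosine similarities between features and anchors."""
    logits = l2_normalize(features) @ l2_normalize(anchors).T / temperature
    return softmax(logits, axis=-1)


def directional_kl(p_weak, p_strong, eps=1e-12):
    """KL(p_weak || p_strong), averaged over the batch.

    The weak distribution plays the role of the target. In an autodiff
    framework it would be treated as a constant (stop-gradient), so only
    the strong view is pulled toward the weak one (assumption).
    """
    log_ratio = np.log(p_weak + eps) - np.log(p_strong + eps)
    return float((p_weak * log_ratio).sum(axis=-1).mean())


def hierarchical_loss(view_features, anchors, temperature=0.1):
    """Sum the directional terms over adjacent (weaker, stronger) pairs."""
    dists = [similarity_distribution(f, anchors, temperature)
             for f in view_features]
    return sum(directional_kl(dists[i - 1], dists[i])
               for i in range(1, len(dists)))
```

A usage example under the same assumptions: given embeddings `z_weak`, `z_mid`, and `z_strong` of shape `(batch, dim)` from an encoder applied to the ordered views, `hierarchical_loss([z_weak, z_mid, z_strong], anchors)` adds one term per adjacent pair. The loss is zero when adjacent views produce identical distributions, and it is asymmetric because swapping the arguments of `directional_kl` generally changes its value.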

Jiahang Zhang, Lilang Lin, Jiaying Liu • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 87.5 | 717 |
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy 95.7 | 652 |
| Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 90.4 | 500 |
| Action Recognition | NTU-60 (xsub) | Accuracy 80.4 | 223 |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy 85.6 | 222 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | Accuracy 78.8 | 220 |
| Action Recognition | NTU-120 (cross-subject (xsub)) | Accuracy 68.2 | 211 |
| Action Recognition | NTU RGB+D X-View 60 | Accuracy 85.5 | 190 |
| Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy 69.9 | 184 |
| Skeleton-based Action Recognition | NTU 120 (X-sub) | -- | 139 |
Showing 10 of 25 rows
