Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations

About

Contrastive learning has proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods use carefully designed augmentations to generate different movement patterns of skeletons with the same semantics. However, applying strong augmentations, which distort the structure of images or skeletons and cause semantic loss, remains an open problem because they make training unstable. In this paper, we investigate the potential of adopting strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradually growing augmentation policy to generate multiple ordered positive pairs, which guide the learned representations toward consistency across different views. Then, an asymmetric loss is proposed to enforce the hierarchical consistency via a directional clustering operation in the feature space, pulling the representations from strongly augmented views closer to those from weakly augmented views for better generalizability. Meanwhile, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms state-of-the-art methods on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD.

Jiahang Zhang, Lilang Lin, Jiaying Liu • 2022
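
The two ideas in the abstract, a growing augmentation policy that yields ordered views and an asymmetric loss that pulls strongly augmented views toward weaker ones, can be illustrated with a small Python sketch. This is not the authors' implementation. The function names (`growing_views`, `directional_loss`, `hierarchical_loss`), the anchor set, the toy augmentations `jitter` and `mask_first`, and the use of a KL divergence over similarity distributions are all assumptions made for illustration. Plain feature vectors stand in for encoder outputs.

```python
import math
import random


def softmax(logits, temperature):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def growing_views(sample, augmentations):
    """Apply augmentations cumulatively: [x, a1(x), a2(a1(x)), ...].

    Each view is at least as heavily augmented as the one before it,
    which gives an ordered chain of positive pairs.
    """
    views = [sample]
    for augment in augmentations:
        views.append(augment(views[-1]))
    return views


def directional_loss(weak_feat, strong_feat, anchors, temperature=0.1):
    """KL(p_weak || p_strong) over similarities to a shared anchor set.

    Assumption: the weak view acts as a fixed target, so in a real
    training loop no gradient would flow through p_weak.
    """
    p_weak = softmax([cosine(weak_feat, a) for a in anchors], temperature)
    p_strong = softmax([cosine(strong_feat, a) for a in anchors], temperature)
    return sum(pw * math.log(pw / ps) for pw, ps in zip(p_weak, p_strong))


def hierarchical_loss(features, anchors, temperature=0.1):
    """Sum the directional loss over each adjacent (weaker, stronger) pair."""
    return sum(
        directional_loss(features[i], features[i + 1], anchors, temperature)
        for i in range(len(features) - 1)
    )


# Toy augmentations (hypothetical, not the three proposed in the paper).
rng = random.Random(0)


def jitter(x):
    return [v + rng.gauss(0.0, 0.05) for v in x]


def mask_first(x):
    return [0.0] + list(x[1:])


sample = [0.5, -1.0, 2.0, 0.3]
anchors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

views = growing_views(sample, [jitter, mask_first])
loss = hierarchical_loss(views, anchors)
```

Because the loss is a KL divergence with the weaker view as the target, swapping the two arguments of `directional_loss` generally gives a different value. That asymmetry is what makes the pull directional in this sketch.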

Related benchmarks

Task	Dataset	Result
Action Recognition	NTU RGB+D 120 (X-set)	Accuracy 87.5	779
Action Recognition	NTU RGB+D (Cross-View)	Accuracy 95.7	663
Action Recognition	NTU RGB+D (Cross-subject)	Accuracy 90.4	511
Action Recognition	NTU-60 (xsub)	Accuracy 80.4	271
Action Recognition	NTU RGB+D 120 Cross-Subject	Accuracy 85.6	249
Action Recognition	NTU-120 (cross-subject (xsub))	Accuracy 68.2	239
Action Recognition	NTU 120 (Cross-Setup)	Accuracy 69.9	231
Skeleton-based Action Recognition	NTU 60 (X-sub)	Accuracy 78.8	227
Action Recognition	NTU RGB+D X-View 60	Accuracy 85.5	218
Skeleton-based Action Recognition	NTU RGB+D 120 (X-set)	Top-1 Accuracy 69.9	184

Showing 10 of 27 rows
