
Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition

About

The self-supervised pretraining paradigm has achieved great success in skeleton-based action recognition. However, existing methods treat the motion and static parts of the skeleton equally and lack an adaptive design for the two, which harms action recognition accuracy. To model both parts adaptively, we propose an Actionlet-Dependent Contrastive Learning method (ActCLR). The actionlet, defined as the discriminative subset of the human skeleton, effectively decomposes motion regions for better action modeling. Specifically, by contrasting the skeleton data with a static anchor without motion, we extract its motion region, which serves as the actionlet, in an unsupervised manner. Then, centering on the actionlet, a motion-adaptive data transformation method is built: different data transformations are applied to actionlet and non-actionlet regions to introduce more diversity while preserving their respective characteristics. Meanwhile, we propose a semantic-aware feature pooling method that builds distinct feature representations for motion and static regions. Extensive experiments on NTU RGB+D and PKU-MMD show that the proposed method achieves remarkable action recognition performance. Further visualization and quantitative experiments demonstrate the effectiveness of our method. Our project website is available at https://langlandslin.github.io/projects/ActCLR/
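The two ideas above (unsupervised actionlet extraction against a static anchor, then region-dependent augmentation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: here the static anchor is simply the temporal mean pose, joints are scored by their deviation from it, and the non-actionlet joints receive an illustrative jitter augmentation while actionlet joints are kept intact. The function names, the `top_k` parameter, and the choice of noise augmentation are all assumptions for this sketch.

```python
import numpy as np

def extract_actionlet(skeleton, top_k=10):
    """Score joints by deviation from a static anchor; return the actionlet.

    skeleton: array of shape (T, J, C) -- T frames, J joints, C coordinates.
    Returns the indices of the top_k most "moving" joints.
    """
    # Static anchor: temporal mean pose (an assumption of this sketch; the
    # paper obtains the motionless anchor in an unsupervised way).
    anchor = skeleton.mean(axis=0, keepdims=True)        # (1, J, C)
    # Per-joint motion score: average distance to the anchor over time.
    motion = np.linalg.norm(skeleton - anchor, axis=-1)  # (T, J)
    score = motion.mean(axis=0)                          # (J,)
    return np.argsort(score)[-top_k:]

def motion_adaptive_transform(skeleton, actionlet, noise_std=0.01, seed=None):
    """Apply different augmentations to actionlet vs. non-actionlet joints.

    Illustrative choice: jitter non-actionlet (static) joints with Gaussian
    noise while leaving actionlet joints untouched, so the motion semantics
    that define the action are preserved.
    """
    rng = np.random.default_rng(seed)
    out = skeleton.copy()
    static = np.ones(skeleton.shape[1], dtype=bool)
    static[actionlet] = False                            # non-actionlet joints
    out[:, static] += rng.normal(0.0, noise_std, out[:, static].shape)
    return out
```

In a contrastive pipeline, two such transformed views of the same sequence would form a positive pair, with the actionlet also guiding the region-wise feature pooling described above.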

Lilang Lin, Jiahang Zhang, Jiaying Liu • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy | 84.6 | 661 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy | 93.9 | 575 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy | 88.2 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy | 74.3 | 377 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | Accuracy | 84.3 | 220 |
| Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy | 75.7 | 184 |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy | 74.3 | 183 |
| Action Recognition | NTU RGB+D X-View 60 | Accuracy | 88.8 | 172 |
| Skeleton-based Action Recognition | NTU 120 (X-sub) | -- | -- | 139 |
| Skeleton-based Action Recognition | NTU RGB+D 60 (X-View) | Top-1 Accuracy | 88.8 | 126 |

Showing 10 of 20 rows

Other info

Code
