
Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition

About

The self-supervised pretraining paradigm has achieved great success in skeleton-based action recognition. However, existing methods treat the motion and static parts of the skeleton equally and lack an adaptive design for the two, which harms action recognition accuracy. To model both parts adaptively, we propose an Actionlet-Dependent Contrastive Learning method (ActCLR). The actionlet, defined as the discriminative subset of the human skeleton, effectively decomposes motion regions for better action modeling. Specifically, by contrasting the skeleton data with a static anchor without motion, we extract its motion region, which serves as the actionlet, in an unsupervised manner. Then, centering on the actionlet, a motion-adaptive data transformation method is built: different data transformations are applied to actionlet and non-actionlet regions to introduce more diversity while preserving their respective characteristics. Meanwhile, we propose a semantic-aware feature pooling method that builds distinct feature representations for motion and static regions. Extensive experiments on NTU RGB+D and PKU-MMD show that the proposed method achieves remarkable action recognition performance. Further visualization and quantitative experiments demonstrate the effectiveness of our method. Our project website is available at https://langlandslin.github.io/projects/ActCLR/
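The two ideas above (unsupervised actionlet extraction against a static anchor, then region-dependent augmentation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: here the static anchor is simply the temporal mean pose, joints are scored by their deviation from it, and the non-actionlet joints receive an illustrative jitter augmentation while actionlet joints are kept intact. The function names, the `top_k` parameter, and the choice of noise augmentation are all assumptions for this sketch.

```python
import numpy as np

def extract_actionlet(skeleton, top_k=10):
    """Score joints by deviation from a static anchor; return the actionlet.

    skeleton: array of shape (T, J, C) -- T frames, J joints, C coordinates.
    Returns the indices of the top_k most "moving" joints.
    """
    # Static anchor: temporal mean pose (an assumption of this sketch; the
    # paper obtains the motionless anchor in an unsupervised way).
    anchor = skeleton.mean(axis=0, keepdims=True)        # (1, J, C)
    # Per-joint motion score: average distance to the anchor over time.
    motion = np.linalg.norm(skeleton - anchor, axis=-1)  # (T, J)
    score = motion.mean(axis=0)                          # (J,)
    return np.argsort(score)[-top_k:]

def motion_adaptive_transform(skeleton, actionlet, noise_std=0.01, seed=None):
    """Apply different augmentations to actionlet vs. non-actionlet joints.

    Illustrative choice: jitter non-actionlet (static) joints with Gaussian
    noise while leaving actionlet joints untouched, so the motion semantics
    that define the action are preserved.
    """
    rng = np.random.default_rng(seed)
    out = skeleton.copy()
    static = np.ones(skeleton.shape[1], dtype=bool)
    static[actionlet] = False                            # non-actionlet joints
    out[:, static] += rng.normal(0.0, noise_std, out[:, static].shape)
    return out
```

In a contrastive pipeline, two such transformed views of the same sequence would form a positive pair, with the actionlet also guiding the region-wise feature pooling described above.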

Lilang Lin, Jiahang Zhang, Jiaying Liu • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy | 84.6 | 661 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy | 93.9 | 575 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy | 88.2 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy | 74.3 | 377 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | Accuracy | 84.3 | 220 |
| Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy | 75.7 | 184 |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy | 74.3 | 183 |
| Action Recognition | NTU RGB+D X-View 60 | Accuracy | 88.8 | 172 |
| Skeleton-based Action Recognition | NTU 120 (X-sub) | -- | -- | 139 |
| Skeleton-based Action Recognition | NTU RGB+D 60 (X-View) | Top-1 Accuracy | 88.8 | 126 |

Showing 10 of 20 rows

Other info

Code
