
3D Human Action Representation Learning via Cross-View Consistency Pursuit

About

In this work, we propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action Representation (CrosSCLR), which leverages multi-view complementary supervision signals. CrosSCLR consists of two modules, single-view contrastive learning (SkeletonCLR) and cross-view consistent knowledge mining (CVC-KM), integrated in a collaborative learning manner. CVC-KM exchanges high-confidence positive/negative samples and their distributions among views according to their embedding similarity, which enforces cross-view consistency in terms of contrastive context, i.e., similar distributions. Extensive experiments show that CrosSCLR achieves remarkable action recognition results on the NTU-60 and NTU-120 datasets under unsupervised settings and learns higher-quality action representations. Our code is available at https://github.com/LinguoLi/CrosSCLR.

Linguo Li, Minsi Wang, Bingbing Ni, Hang Wang, Jiancheng Yang, Wenjun Zhang • 2021
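
The cross-view mining idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation (see their repository for that). It assumes an InfoNCE-style loss with a memory bank, a temperature of 0.07, and top-k selection by cosine similarity as the "high-confidence" criterion. The helper names (`cosine`, `mine_top_k`, `contrastive_loss`), the view names "joint" and "motion", and the toy embeddings are all hypothetical. Only the exchange of mined positives is shown; the exchange of similarity distributions is left out.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mine_top_k(query, bank, k):
    """Indices of the k bank entries most similar to the query.

    Assumption: top-k similarity stands in for "high confidence".
    """
    sims = [cosine(query, z) for z in bank]
    return sorted(range(len(bank)), key=lambda i: sims[i], reverse=True)[:k]

def contrastive_loss(query, key, bank, extra_positives=(), temperature=0.07):
    """InfoNCE-style loss over one query.

    The augmented key is always a positive. Bank entries whose index is in
    extra_positives are also treated as positives; the rest are negatives.
    """
    pos = [math.exp(cosine(query, key) / temperature)]
    neg = []
    for i, z in enumerate(bank):
        term = math.exp(cosine(query, z) / temperature)
        if i in extra_positives:
            pos.append(term)
        else:
            neg.append(term)
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))

# Toy data: the same four bank samples embedded in two views.
bank_joint = [[0.2, 1.0], [0.1, 0.9], [1.0, 0.0], [0.0, -1.0]]
bank_motion = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

query_joint, key_joint = [0.5, 0.5], [0.6, 0.4]
query_motion = [1.0, 0.05]

# Single-view loss: every bank entry is a negative.
loss_plain = contrastive_loss(query_joint, key_joint, bank_joint)

# Cross-view step: neighbours mined in the motion view become
# positives for the joint view.
mined = mine_top_k(query_motion, bank_motion, k=2)
loss_cross = contrastive_loss(query_joint, key_joint, bank_joint,
                              extra_positives=mined)

print(mined, loss_plain, loss_cross)
```

In this toy setup the joint view alone would treat bank samples 0 and 1 as ordinary negatives, while the motion view ranks them as the nearest neighbours of the query. Passing the mined indices across views moves those terms into the numerator of the loss, which is one way to read "exchanging high-confidence positives among views".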

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 80.4 | 717 |
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy 92.5 | 652 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 92.5 | 588 |
| Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 86.2 | 500 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 86.2 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy 80.5 | 430 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 86.2 | 336 |
| Action Recognition | NTU-60 (xsub) | Accuracy 86.2 | 223 |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy 80.5 | 222 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | Accuracy 86.2 | 220 |
Showing 10 of 34 rows
