Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition

About

In this paper, we focus on unsupervised representation learning for skeleton-based action recognition. Existing approaches usually learn action representations by sequential prediction, but they fail to fully capture semantic information. To address this limitation, we propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP), which not only uses reverse sequential prediction to learn low-level information (e.g., body posture at every frame) and high-level patterns (e.g., motion order), but also devises action prototypes to implicitly encode the semantic similarity shared among sequences. In general, we regard action prototypes as latent variables and formulate PCRP as an expectation-maximization task. Specifically, PCRP iteratively runs (1) an E-step that determines the distribution of prototypes by clustering the action encodings from the encoder, and (2) an M-step that optimizes the encoder by minimizing the proposed ProtoMAE loss, which simultaneously pulls each action encoding closer to its assigned prototype and performs the reverse prediction task. Extensive experiments on the N-UCLA, NTU 60, and NTU 120 datasets show that PCRP outperforms state-of-the-art unsupervised methods and even achieves superior performance over some supervised methods. Code is available at https://github.com/Mikexu007/PCRP.
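The E-step/M-step alternation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `kmeans` and `proto_mae_loss`, the plain k-means clustering, the softmax form of the prototype term, the `temperature` parameter, and the equal weighting of the two terms are all assumptions made for illustration. The sketch only evaluates a loss value; in a real M-step the encoder would be updated by gradient descent on it.

```python
import numpy as np


def kmeans(z, k, iters=10, seed=0):
    """E-step stand-in (assumed): cluster encodings z of shape (n, d) into k prototypes."""
    rng = np.random.default_rng(seed)
    # Initialise prototypes from k distinct encodings (fancy indexing copies the rows).
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k) squared distances
        assign = d2.argmin(1)
        for j in range(k):
            members = z[assign == j]
            if len(members) > 0:  # keep the old prototype if a cluster is empty
                centers[j] = members.mean(0)
    # Final assignment against the final prototypes.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, d2.argmin(1)


def proto_mae_loss(z, centers, assign, pred_rev, x, temperature=0.5):
    """M-step objective stand-in (assumed form): prototype term plus reverse-prediction MAE."""
    # Prototype term: cross-entropy of each encoding against its assigned prototype.
    logits = z @ centers.T / temperature
    logits = logits - logits.max(1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    contrast = -log_prob[np.arange(len(z)), assign].mean()
    # Reverse prediction term: the target is the input sequence reversed in time.
    mae = np.abs(pred_rev - x[:, ::-1]).mean()
    return contrast + mae


# Toy usage: 12 sequences, 5 frames, 4 features per frame (hypothetical sizes).
rng = np.random.default_rng(1)
x = rng.normal(size=(12, 5, 4))
z = x.mean(axis=1)                    # hypothetical "encoder": average over time
centers, assign = kmeans(z, k=3)      # E-step
loss = proto_mae_loss(z, centers, assign, x[:, ::-1], x)  # M-step loss for a perfect decoder
```

With a perfect reverse prediction the MAE term is zero, so the remaining loss comes only from the prototype term.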

Shihao Xu, Haocong Rao, Xiping Hu, Bin Hu • 2020

Related benchmarks

Task                 Dataset                        Result (Accuracy)   Rank
Action Recognition   NTU RGB+D 120 (X-set)          44.6                717
Action Recognition   NTU RGB+D (Cross-View)         63.4                652
Action Recognition   NTU RGB+D 60 (Cross-View)      63.5                588
Action Recognition   NTU RGB+D (Cross-subject)      54.9                500
Action Recognition   NTU RGB+D X-sub 120            41.7                430
Action Recognition   NTU RGB-D Cross-Subject 60     53.9                336
Action Recognition   NTU RGB+D 120 Cross-Subject    43                  222
Action Recognition   NTU 120 (Cross-Setup)          45.1                203
Action Recognition   N-UCLA                         87                  36
