Skeleton Cloud Colorization for Unsupervised 3D Action Representation Learning

About

Skeleton-based human action recognition has attracted increasing attention in recent years. However, most existing work focuses on supervised learning, which requires a large number of annotated action sequences that are often expensive to collect. We investigate unsupervised representation learning for skeleton action recognition and design a novel skeleton cloud colorization technique that learns skeleton representations from unlabeled skeleton sequence data. Specifically, we represent a skeleton action sequence as a 3D skeleton cloud and colorize each point in the cloud according to its temporal and spatial order in the original (unannotated) skeleton sequence. Leveraging the colorized skeleton point cloud, we design an auto-encoder framework that effectively learns spatial-temporal features from the artificial color labels of skeleton joints. We evaluate our skeleton cloud colorization approach with action classifiers trained under different configurations, including unsupervised, semi-supervised, and fully supervised settings. Extensive experiments on the NTU RGB+D and NW-UCLA datasets show that the proposed method outperforms existing unsupervised and semi-supervised 3D action recognition methods by large margins, and that it achieves competitive performance in supervised 3D action recognition as well.
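
The colorization step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the color mapping, so everything specific here is an assumption made for illustration: the red-to-blue hue ramp, the helper names order_to_rgb and colorize_skeleton_cloud, and the toy input are hypothetical and are not taken from the paper. The sketch only shows the general idea of stacking all joints from all frames into one point cloud and attaching one color derived from the frame index (temporal order) and one derived from the joint index (spatial order).

```python
import colorsys


def order_to_rgb(index, count):
    """Map an order index in [0, count) to an RGB triple.

    Assumed mapping (not from the paper): hue sweeps linearly
    from red (first index) towards blue (last index).
    """
    hue = (2.0 / 3.0) * index / max(count - 1, 1)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


def colorize_skeleton_cloud(sequence):
    """Turn a skeleton sequence into a colorized point cloud.

    sequence: list of T frames, each a list of J (x, y, z) joints.
    Returns a list of T*J points, each a tuple
    (xyz, temporal_rgb, spatial_rgb).
    """
    num_frames = len(sequence)
    num_joints = len(sequence[0]) if sequence else 0
    cloud = []
    for t, frame in enumerate(sequence):
        # Every joint in the same frame shares one temporal color.
        temporal_rgb = order_to_rgb(t, num_frames)
        for j, xyz in enumerate(frame):
            # The same joint keeps one spatial color across frames.
            spatial_rgb = order_to_rgb(j, num_joints)
            cloud.append((tuple(xyz), temporal_rgb, spatial_rgb))
    return cloud


# Hypothetical toy input: 2 frames, 3 joints each.
toy_sequence = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(0.1, 0.0, 0.0), (0.1, 1.0, 0.0), (0.1, 2.0, 0.0)],
]
cloud = colorize_skeleton_cloud(toy_sequence)
```

In a setup like the one the abstract outlines, the uncolored points would be the input to an auto-encoder and the colors would serve as the self-supervised reconstruction targets. That training loop is outside the scope of this sketch.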

Siyuan Yang, Jun Liu, Shijian Lu, Meng Hwa Er, Alex C. Kot • 2021

Related benchmarks

Task                               | Dataset                        | Result        | Rank
Action Recognition                 | NTU RGB+D (Cross-View)         | Accuracy 94.9 | 609
Action Recognition                 | NTU RGB+D 60 (Cross-View)      | Accuracy 94.9 | 575
Action Recognition                 | NTU RGB+D (Cross-subject)      | Accuracy 88   | 474
Action Recognition                 | NTU RGB+D 60 (X-sub)           | Accuracy 88   | 467
Action Recognition                 | NTU RGB-D Cross-Subject 60     | Accuracy 79.8 | 305
Skeleton-based Action Recognition  | NTU 60 (X-sub)                 | Accuracy 88   | 220
Skeleton-based Action Recognition  | NTU RGB+D (Cross-View)         | --            | 213
Action Recognition                 | NTU RGB+D X-View 60            | Accuracy 94.9 | 172
Skeleton-based Action Recognition  | NTU RGB+D (Cross-subject)      | Accuracy 88   | 123
Skeleton-based Action Recognition  | NTU 60 (X-view)                | Accuracy 79.9 | 119
Showing 10 of 17 rows
