
Skeleton-Contrastive 3D Action Representation Learning

About

This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our proposal is built upon learning invariances to input skeleton representations and various skeleton augmentations via noise contrastive estimation. In particular, we propose inter-skeleton contrastive learning, which learns from multiple different input skeleton representations in a cross-contrastive manner. In addition, we contribute several skeleton-specific spatial and temporal augmentations which further encourage the model to learn the spatio-temporal dynamics of skeleton data. By learning similarities between different skeleton representations as well as augmented views of the same sequence, the network is encouraged to learn higher-level semantics of the skeleton data than when only using the augmented views. Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets with multiple downstream tasks, including action recognition, action retrieval, and semi-supervised learning. Code is available at https://github.com/fmthoker/skeleton-contrast.
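The cross-contrastive idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes an InfoNCE-style loss with in-batch negatives, a temperature of 0.07, and embeddings that are already computed and non-zero. The function names `info_nce` and `cross_contrastive_loss` are hypothetical, and the abstract does not specify the encoders, negative sampling, or hyperparameters.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Mean InfoNCE loss over a batch.

    queries[i] and keys[i] are treated as the positive pair; every other
    key in the batch acts as a negative. The temperature is an assumed
    value, not one taken from the abstract.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def cross_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric loss between two representations of the same sequences.

    emb_a[i] and emb_b[i] are hypothetical embeddings of sequence i under
    two different input skeleton representations.
    """
    return 0.5 * (info_nce(emb_a, emb_b, temperature)
                  + info_nce(emb_b, emb_a, temperature))


# Toy embeddings: row i of each list describes the same sequence.
emb_a = [[1.0, 0.0], [0.0, 1.0]]
emb_b = [[0.9, 0.1], [0.1, 0.9]]
aligned = cross_contrastive_loss(emb_a, emb_b)
shuffled = cross_contrastive_loss(emb_a, emb_b[::-1])
print(aligned < shuffled)
```

In this toy setup the loss is lower when matching rows come from the same sequence than when the pairing is shuffled, which is the signal a contrastive objective of this kind optimizes.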

Fida Mohammad Thoker, Hazel Doughty, Cees G.M. Snoek • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 67.1 | 661 |
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy 90.4 | 609 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 85.2 | 575 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 76.3 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy 67.9 | 377 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 70.8 | 305 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | Accuracy 65.9 | 220 |
| Action Recognition | NTU 120 (Cross-Setup) | Accuracy 67.9 | 112 |
| Action Recognition | NTU-120 (cross-subject (xsub)) | Accuracy 67.9 | 82 |
| Action Recognition | PKU-MMD Part I | Accuracy 80.9 | 53 |
Showing 10 of 36 rows
