
Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning

About

This paper targets unsupervised skeleton-based action representation learning and proposes a new Hierarchical Contrast (HiCo) framework. Unlike existing contrastive solutions, which typically represent an input skeleton sequence as instance-level features and perform contrast holistically, HiCo represents the input as multi-level features and performs contrast hierarchically. Specifically, given a human skeleton sequence, we represent it as multiple feature vectors of different granularities from both the temporal and spatial domains via sequence-to-sequence (S2S) encoders and unified downsampling modules. The hierarchical contrast is conducted at four levels: instance level, domain level, clip level, and part level. Moreover, HiCo is orthogonal to the S2S encoder, which allows us to flexibly adopt state-of-the-art S2S encoders. Extensive experiments on four datasets, i.e., NTU-60, NTU-120, PKU-MMD I and II, show that HiCo achieves a new state of the art for unsupervised skeleton-based action representation learning on two downstream tasks, action recognition and retrieval, and that its learned action representation transfers well. We also show that our framework is effective for semi-supervised skeleton-based action recognition. Our code is available at https://github.com/HuiGuanLab/HiCo.
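The idea of contrasting at several granularities can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that each level yields one feature vector per sequence, that two augmented views of the same sequence form the positive pair at every level, and that the per-level InfoNCE losses are simply summed. The function names `info_nce` and `hierarchical_contrast`, the temperature value, and the feature sizes are hypothetical, and the domain level (which relates temporal and spatial features) is omitted for brevity.

```python
import numpy as np


def info_nce(q, k, temperature=0.1):
    """InfoNCE loss for two feature batches of shape (batch, dim).

    Row i of q and row i of k form a positive pair; every other row
    of k serves as a negative for row i of q.
    """
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    # Subtract the row-wise maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def hierarchical_contrast(view_a, view_b, temperature=0.1):
    """Sum InfoNCE over levels (assumption: equal weight per level).

    view_a and view_b map a level name to a (batch, dim) feature array.
    """
    return sum(
        info_nce(view_a[level], view_b[level], temperature)
        for level in view_a
    )


# Hypothetical features for a batch of 8 sequences at three levels.
rng = np.random.default_rng(0)
levels = ("instance", "clip", "part")
view_a = {name: rng.normal(size=(8, 16)) for name in levels}
# A second view that stays close to the first (small perturbation).
view_b = {name: feats + 0.01 * rng.normal(size=feats.shape)
          for name, feats in view_a.items()}
loss = hierarchical_contrast(view_a, view_b)
```

In this sketch the loss is small when matching rows of the two views agree and grows when the pairing is broken, which is the behaviour a contrastive objective is meant to have at each level.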

Jianfeng Dong, Shengkai Sun, Zhonglin Liu, Shujie Chen, Baolong Liu, Xun Wang • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy | 74.1 | 717 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy | 85.5 | 588 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy | 80.4 | 336 |
| Action Recognition | NTU-60 (xsub) | Accuracy | 81.1 | 223 |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy | 70 | 222 |
| Action Recognition | NTU-120 (cross-subject (xsub)) | Accuracy | 72.8 | 211 |
| Action Recognition | NTU RGB+D X-View 60 | Accuracy | 88.6 | 190 |
| Action Recognition | PKU-MMD II (xsub) | Accuracy | 56.3 | 42 |
| Action Recognition | NTU 60 (X-sub) | Accuracy (10% data) | 73 | 35 |
| Action Retrieval | NTU 60 (X-view) | Accuracy | 84.8 | 28 |
Showing 10 of 23 rows
