USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation

About

Contrastive learning has recently achieved great success in skeleton-based representation learning. However, the prevailing methods are predominantly negative-based: they require an additional momentum encoder and a memory bank to supply negative samples, which makes the model harder to train. Furthermore, these methods concentrate on learning a global representation for recognition and retrieval tasks, while overlooking the rich, detailed local representations that are crucial for dense prediction tasks. To alleviate these issues, we introduce a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation, called USDRL. It applies feature decorrelation across the temporal, spatial, and instance domains in a multi-grained manner, reducing redundancy among the dimensions of the representations so that the features carry as much information as possible. Additionally, we design a Dense Spatio-Temporal Encoder (DSTE) to capture fine-grained action representations effectively, thereby enhancing performance on dense prediction tasks. Comprehensive experiments on the NTU-60, NTU-120, PKU-MMD I, and PKU-MMD II benchmarks, across diverse downstream tasks including action recognition, action retrieval, and action detection, demonstrate that our approach significantly outperforms the current state-of-the-art (SOTA) approaches. Our code and models are available at https://github.com/wengwanjiang/USDRL.

Wanjiang Weng, Hongsong Wang, Junbo Wang, Lei He, Guosen Xie • 2024
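
To make the redundancy-reduction idea in the abstract concrete, the following is a minimal NumPy sketch of a feature decorrelation loss applied at three granularities. It is not the authors' implementation (the repository linked in the abstract is the reference for that). The loss form is the generic cross-correlation objective popularized by Barlow Twins, and the function names, the (N, T, J, D) feature layout, the pooling choices, and the `off_diag_weight` value are all assumptions made for illustration.

```python
import numpy as np


def decorrelation_loss(z1, z2, off_diag_weight=5e-3, eps=1e-6):
    """Generic Barlow Twins-style loss on two (n_samples, dim) arrays.

    Assumption: this is a stand-in for "feature decorrelation", not the
    exact objective used by USDRL.
    """
    # Standardize every feature dimension over the sample axis.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z1.T @ z2 / z1.shape[0]
    off_diag_mask = ~np.eye(c.shape[0], dtype=bool)
    # Diagonal -> 1 keeps the two views consistent;
    # off-diagonal -> 0 removes redundancy between dimensions.
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    off_diag = np.sum(c[off_diag_mask] ** 2)
    return float(on_diag + off_diag_weight * off_diag)


def multi_grained_loss(f1, f2):
    """Sum the loss at instance, temporal, and spatial granularity.

    f1, f2: hypothetical dense features of two augmented views with shape
    (N, T, J, D) = (batch, frames, joints, channels).
    """
    n, t, j, d = f1.shape
    # Instance level: pool over frames and joints -> (N, D).
    instance = decorrelation_loss(f1.mean(axis=(1, 2)), f2.mean(axis=(1, 2)))
    # Temporal level: pool over joints, one sample per frame -> (N*T, D).
    temporal = decorrelation_loss(
        f1.mean(axis=2).reshape(n * t, d), f2.mean(axis=2).reshape(n * t, d)
    )
    # Spatial level: pool over frames, one sample per joint -> (N*J, D).
    spatial = decorrelation_loss(
        f1.mean(axis=1).reshape(n * j, d), f2.mean(axis=1).reshape(n * j, d)
    )
    return instance + temporal + spatial
```

Note that no negative samples, momentum encoder, or memory bank appear anywhere in this objective: the only inputs are two views of the same batch, which is the property the abstract contrasts with negative-based contrastive methods.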

Related benchmarks

Task                              | Dataset                        | Result              | Rank
----------------------------------|--------------------------------|---------------------|-----
Action Recognition                | NTU RGB+D 120 (X-set)          | Accuracy 80.6       | 661
Skeleton-based Action Recognition | NTU 60 (X-sub)                 | Accuracy 87.1       | 220
Skeleton-based Action Recognition | NTU RGB+D 120 (X-set)          | Top-1 Accuracy 80.6 | 184
Action Recognition                | NTU RGB+D X-View 60            | Accuracy 93.2       | 172
Skeleton-based Action Recognition | NTU 120 (X-sub)                | --                  | 139
Skeleton-based Action Recognition | NTU RGB+D 60 (X-View)          | Top-1 Accuracy 93.2 | 126
Action Recognition                | NTU-120 (cross-subject (xsub)) | Accuracy 79.3       | 82
Action Recognition                | PKU-MMD II (xsub)              | Accuracy 59.7       | 42
Action Recognition                | NTU-60 (xsub)                  | Accuracy 87.1       | 40
Skeleton-based Action Recognition | PKU-MMD II (x-sub)             | Top-1 Acc 59.7      | 21
Showing 10 of 18 rows
