Variational Contrastive Learning for Skeleton-based Action Recognition

About

In recent years, self-supervised representation learning for skeleton-based action recognition has advanced with the development of contrastive learning methods. However, most contrastive paradigms are inherently discriminative and often struggle to capture the variability and uncertainty intrinsic to human motion. To address this issue, we propose a variational contrastive learning framework that integrates probabilistic latent modeling with contrastive self-supervised learning. This formulation enables the learning of structured and semantically meaningful representations that generalize across different datasets and supervision levels. Extensive experiments on three widely used skeleton-based action recognition benchmarks show that the proposed method consistently outperforms existing approaches, particularly in low-label regimes. Moreover, qualitative analyses show that, compared with other methods, the features learned by our method are more relevant to the motion and sample characteristics and focus more on the important skeleton joints.

Dang Dinh Nguyen, Decky Aspandi Latif, Titus Zaharia • 2026
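
The abstract does not spell out the training objective, so the following is only a minimal sketch of what combining probabilistic latent modeling with a contrastive loss can look like in general. It assumes a diagonal Gaussian posterior per sample, a standard normal prior, an InfoNCE term between two augmented views, and a weighting factor `beta`. All function names, the choice of InfoNCE, and the parameter values are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(queries, keys, temperature=0.1):
    """InfoNCE: queries[i] should match keys[i]; all other keys act as negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[i]
    return total / len(queries)


def variational_contrastive_loss(stats_a, stats_b, beta=1e-3,
                                 temperature=0.1, seed=0):
    """Hypothetical combined objective: InfoNCE on sampled latents + beta * KL.

    stats_a and stats_b are lists of (mu, log_var) pairs, one per sample,
    for two augmented views of the same batch (an assumed encoder output).
    """
    rng = random.Random(seed)
    z_a = [reparameterize(mu, lv, rng) for mu, lv in stats_a]
    z_b = [reparameterize(mu, lv, rng) for mu, lv in stats_b]
    contrastive = info_nce(z_a, z_b, temperature)
    all_stats = stats_a + stats_b
    kl = sum(kl_to_standard_normal(mu, lv) for mu, lv in all_stats) / len(all_stats)
    return contrastive + beta * kl
```

In this sketch the KL term keeps each posterior close to the prior, while the contrastive term pulls the sampled latents of the two views together. The balance between the two is set by `beta`, whose value here is an arbitrary placeholder.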

Related benchmarks

Task                               Dataset                         Result         Rank
Skeleton-based Action Recognition  NTU 60 (X-sub)                  Accuracy 75.2  220
Action Recognition                 NTU RGB+D X-View 60             Accuracy 92.9  172
Skeleton-based Action Recognition  NTU 60 (X-view)                 Accuracy 80.2  119
Action Recognition                 NTU 120 (Cross-Setup)           Accuracy 81.4  112
Action Recognition                 NTU-120 (cross-subject (xsub))  Accuracy 79.8  82
Skeleton-based Action Recognition  NTU RGB+D 60 (Cross-Subject)    Accuracy 75.2  59
Action Recognition                 PKU-MMD Part I                  Accuracy 86.1  53
Action Recognition                 PKU-MMD (Part II)               Accuracy 39.2  52
Action Recognition                 NTU-60 (xsub)                   Accuracy 86.6  40
Action Recognition                 NTU-60 (xview)                  Accuracy 80.2  12