
Variational Contrastive Learning for Skeleton-based Action Recognition

About

In recent years, self-supervised representation learning for skeleton-based action recognition has advanced through the development of contrastive learning methods. However, most contrastive paradigms are inherently discriminative and often struggle to capture the variability and uncertainty intrinsic to human motion. To address this issue, we propose a variational contrastive learning framework that integrates probabilistic latent modeling with contrastive self-supervised learning. This formulation enables the learning of structured and semantically meaningful representations that generalize across different datasets and supervision levels. Extensive experiments on three widely used skeleton-based action recognition benchmarks show that our proposed method consistently outperforms existing approaches, particularly in low-label regimes. Moreover, qualitative analyses show that, compared with other methods, the features learned by our method better reflect the characteristics of each motion and sample, placing more focus on the important skeleton joints.
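The abstract does not state the training objective. Purely as an illustration, one generic way to combine probabilistic latent modeling with contrastive learning is to let the encoder predict a Gaussian (mean and log-variance) per sample, draw embeddings with the reparameterization trick, apply an InfoNCE loss between two augmented views of the same clip, and add a KL regularizer toward a standard normal prior. The NumPy sketch below assumes exactly that setup; the function names, the `beta` and `temperature` values, and the way the two terms are combined are hypothetical and are not taken from the paper.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Batch mean of KL(N(mu, diag(sigma^2)) || N(0, I))."""
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return kl.mean()


def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss where row i of z1 and row i of z2 form a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                       # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                    # positives lie on the diagonal


def variational_contrastive_loss(mu1, log_var1, mu2, log_var2, rng,
                                 beta=1e-3, temperature=0.1):
    """Hypothetical objective: InfoNCE on sampled embeddings plus a beta-weighted KL term."""
    z1 = reparameterize(mu1, log_var1, rng)
    z2 = reparameterize(mu2, log_var2, rng)
    contrastive = info_nce(z1, z2, temperature)
    kl = 0.5 * (kl_to_standard_normal(mu1, log_var1) + kl_to_standard_normal(mu2, log_var2))
    return contrastive + beta * kl


# Toy usage: hypothetical encoder outputs for a batch of 8 clips and two augmented views.
rng = np.random.default_rng(0)
mu_a = rng.standard_normal((8, 16))
mu_b = mu_a + 0.05 * rng.standard_normal((8, 16))  # second view: a slightly perturbed copy
log_var_a = np.full((8, 16), -2.0)
log_var_b = np.full((8, 16), -2.0)
loss = variational_contrastive_loss(mu_a, log_var_a, mu_b, log_var_b, rng)
print(f"loss = {loss:.4f}")
```

The KL weight `beta` trades off how closely the per-sample Gaussians are pulled toward the prior against how sharply the contrastive term separates samples; in a real skeleton pipeline `mu` and `log_var` would come from a learned encoder (e.g. a graph convolutional network over joints), which is outside the scope of this sketch.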

Dang Dinh Nguyen, Decky Aspandi Latif, Titus Zaharia • 2026

Related benchmarks

| Task | Dataset | Accuracy (%) | Rank |
|---|---|---|---|
| Action Recognition | NTU-60 (xsub) | 86.6 | 223 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | 75.2 | 220 |
| Action Recognition | NTU-120 (cross-subject (xsub)) | 79.8 | 211 |
| Action Recognition | NTU 120 (Cross-Setup) | 81.4 | 203 |
| Action Recognition | NTU RGB+D X-View 60 | 92.9 | 190 |
| Skeleton-based Action Recognition | NTU 60 (X-view) | 80.2 | 119 |
| Action Recognition | NTU-60 (xview) | 80.2 | 117 |
| Action Recognition | PKU-MMD Part I | 86.1 | 74 |
| Action Recognition | PKU-MMD (Part II) | 39.2 | 71 |
| Skeleton-based Action Recognition | NTU RGB+D 60 (Cross-Subject) | 75.2 | 59 |
