Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition

About

Traditional approaches in unsupervised or self-supervised learning for skeleton-based action classification have concentrated predominantly on the dynamic aspects of skeletal sequences. Yet the intricate interaction between the moving and static elements of the skeleton offers a rarely tapped discriminative potential for action classification. This paper introduces a novel measurement, referred to as spatial-temporal joint density (STJD), to quantify such interaction. Tracking the evolution of this density throughout an action can effectively identify a subset of discriminative moving and/or static joints, termed "prime joints", to steer self-supervised learning. A new contrastive learning strategy named STJD-CL is proposed to align the representation of a skeleton sequence with that of its prime joints while simultaneously contrasting the representations of prime and non-prime joints. In addition, a method called STJD-MP is developed by integrating STJD with a reconstruction-based framework for more effective learning. Experimental evaluations on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets in various downstream tasks demonstrate that the proposed STJD-CL and STJD-MP improve performance, in particular by 3.5 and 3.6 percentage points over state-of-the-art contrastive methods on the NTU RGB+D 120 dataset under the X-sub and X-set evaluations, respectively.
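The contrastive idea in the abstract (pull a sequence's representation toward that of its prime joints, push prime and non-prime representations apart) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the cosine similarity, the temperature value, and the InfoNCE-style two-way softmax are not taken from the paper, which may define its STJD-CL objective differently. The embeddings are hypothetical plain lists of floats for a single sample.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def prime_joint_contrastive_loss(z_seq, z_prime, z_nonprime, temperature=0.1):
    """Illustrative InfoNCE-style loss for one sample (assumed form, not the paper's).

    z_seq      -- hypothetical embedding of the full skeleton sequence
    z_prime    -- hypothetical embedding of the prime joints
    z_nonprime -- hypothetical embedding of the non-prime joints
    """
    # Positive pair: prime joints vs. the full sequence (to be aligned).
    pos = cosine(z_prime, z_seq) / temperature
    # Negative pair: prime joints vs. non-prime joints (to be contrasted).
    neg = cosine(z_prime, z_nonprime) / temperature
    # Negative log-softmax of the positive logit, computed stably.
    m = max(pos, neg)
    log_denominator = m + math.log(math.exp(pos - m) + math.exp(neg - m))
    return log_denominator - pos


if __name__ == "__main__":
    z = [1.0, 2.0, -0.5]
    opposite = [-a for a in z]
    # Prime joints match the sequence and oppose the non-prime joints.
    print(prime_joint_contrastive_loss(z, z, opposite))
    # Prime joints cannot be told apart from the non-prime joints.
    print(prime_joint_contrastive_loss(z, z, z))
```

Under these assumptions the loss approaches 0 when the prime-joint embedding coincides with the sequence embedding and opposes the non-prime embedding, and it equals ln 2 when the positive and negative similarities are identical.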

Shanaka Ramesh Gunasekara, Wanqing Li, Philip Ogunbona, Jack Yang • 2025

Related benchmarks

| Task | Dataset | Accuracy (%) | Rank |
| --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | 86.8 | 717 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | 94.8 | 588 |
| Action Recognition | NTU RGB+D 60 (X-sub) | 89.3 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | 83.5 | 430 |
| Action Recognition | NTU-60 (xsub) | 85.9 | 223 |
| Action Recognition | NTU-120 (cross-subject (xsub)) | 77.1 | 211 |
| Action Recognition | NTU 120 (Cross-Setup) | 79.3 | 203 |
| Action Recognition | NTU-60 (xview) | 90 | 117 |
| Action Recognition | PKU-MMD Part I | 93.2 | 74 |
| Action Recognition | PKU-MMD (Part II) | 55.3 | 71 |