
SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

About

Existing zero-shot skeleton-based action recognition methods use projection networks to learn a shared latent space of skeleton features and semantic embeddings. Action recognition datasets are inherently imbalanced: skeleton sequences vary widely while class labels stay constant, which makes alignment difficult. To address this imbalance, we propose SA-DVAE (Semantic Alignment via Disentangled Variational Autoencoders), a method that first applies feature disentanglement to separate skeleton features into two independent parts, one semantic-related and the other semantic-irrelevant, to better align skeleton and semantic features. We implement this idea via a pair of modality-specific variational autoencoders coupled with a total correlation penalty. We conduct experiments on three benchmark datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD), and the results show that SA-DVAE improves performance over existing methods. The code is available at https://github.com/pha123661/SA-DVAE.
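The disentanglement idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the latent sizes, the example class names, and the nearest-latent matching step are all assumptions made for illustration, and the total correlation penalty is omitted entirely. Only the closed-form KL term is the standard VAE regulariser.

```python
import math


def split_latent(z, sem_dim):
    """Split a skeleton latent into a semantic-related part and a
    semantic-irrelevant part (hypothetical layout: first sem_dim entries
    are treated as semantic-related)."""
    if not 0 < sem_dim < len(z):
        raise ValueError("sem_dim must leave both parts non-empty")
    return z[:sem_dim], z[sem_dim:]


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), the usual VAE regulariser."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_zero_shot(z_sem, class_latents):
    """Assumed simplification: pick the class whose text latent is most
    similar to the semantic-related part of the skeleton latent."""
    return max(class_latents, key=lambda name: cosine(z_sem, class_latents[name]))


# Toy values, invented for illustration only.
z = [0.9, 0.1, -0.3, 2.0]
z_sem, z_irrelevant = split_latent(z, sem_dim=2)
unseen_classes = {"drink water": [1.0, 0.0], "kick": [0.0, 1.0]}
prediction = classify_zero_shot(z_sem, unseen_classes)
```

Only the semantic-related part takes part in the matching step; the semantic-irrelevant part is kept separate so that variation unrelated to the class label does not disturb alignment with the text embedding.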

Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang, Jane Yung-jen Hsu • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy | 21.9 | 473 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy | 41.4 | 358 |
| Action Recognition | NTU-60 (xsub) | Accuracy | 84.2 | 251 |
| Action Recognition | NTU-120 (cross-subject (xsub)) | Accuracy | 50.7 | 239 |
| Skeleton-based Action Recognition | NTU RGB+D 120 Cross-Subject | Top-1 Accuracy | 68.8 | 143 |
| Action Recognition | NTU-60 48/12 split | Top-1 Acc | 50.2 | 119 |
| Action Recognition | NTU-120 96/24 split | Top-1 Acc | 46.12 | 100 |
| Action Recognition | NTU 60 (55/5 split) | Top-1 Acc | 84.2 | 73 |
| Action Recognition | NTU-120 110/10 split | Top-1 Acc | 68.8 | 72 |
| Action Recognition | NTU RGB+D 120 (110/10 Xsub) | Accuracy | 55.6 | 66 |
Showing 10 of 53 rows
