Syntactically Guided Generative Embeddings for Zero-Shot Skeleton Action Recognition

About

We introduce SynSE, a novel syntactically guided generative approach for Zero-Shot Learning (ZSL). Our end-to-end approach learns progressively refined generative embedding spaces constrained within and across the involved modalities (visual, language). The inter-modal constraints are defined between the action sequence embedding and the embeddings of Part-of-Speech (PoS) tagged words in the corresponding action description. We deploy SynSE for the task of skeleton-based action sequence recognition. Our design choices enable SynSE to generalize compositionally, i.e., to recognize sequences whose action descriptions contain words not encountered during training. We also extend our approach to the more challenging Generalized Zero-Shot Learning (GZSL) problem via a confidence-based gating mechanism. We are the first to present zero-shot skeleton action recognition results on the large-scale NTU-60 and NTU-120 skeleton action datasets with multiple splits. Our results demonstrate SynSE's state-of-the-art performance in both ZSL and GZSL settings compared to strong baselines on the NTU-60 and NTU-120 datasets. The code and pretrained models are available at https://github.com/skelemoa/synse-zsl
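
The abstract does not spell out how the confidence-based gate for GZSL works, so the following is only a minimal sketch of one common way such a gate can be built, not the authors' implementation. It assumes a seen-class classifier that produces logits and a separate zero-shot scorer for unseen classes. A sample is routed to the seen-class prediction when the maximum softmax probability clears a threshold, and to the zero-shot prediction otherwise. The function names, the threshold value, and the class labels are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gated_predict(
    seen_logits: Sequence[float],
    unseen_scores: Sequence[float],
    seen_labels: Sequence[str],
    unseen_labels: Sequence[str],
    threshold: float = 0.7,  # hypothetical value; would be tuned on held-out data
) -> str:
    """Route a sample to the seen-class or unseen-class prediction.

    Assumption: the gate is a simple max-softmax-probability test. The paper's
    gate may be defined differently.
    """
    probs = softmax(seen_logits)
    confidence = max(probs)
    if confidence >= threshold:
        # Confident enough: trust the seen-class classifier.
        return seen_labels[probs.index(confidence)]
    # Otherwise fall back to the zero-shot scorer over unseen classes.
    best = max(range(len(unseen_scores)), key=lambda i: unseen_scores[i])
    return unseen_labels[best]


seen = ["walk", "sit", "wave"]          # hypothetical seen classes
unseen = ["juggle", "salute"]           # hypothetical unseen classes
confident = gated_predict([5.0, 0.0, 0.0], [0.1, 0.9], seen, unseen)
uncertain = gated_predict([0.1, 0.0, 0.05], [0.1, 0.9], seen, unseen)
```

In this sketch, a sharply peaked seen-class distribution yields a seen label, while a nearly flat one defers to the unseen-class scorer. The threshold trades off seen-class against unseen-class accuracy.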

Pranay Gupta, Divyanshu Sharma, Ravi Kiran Sarvadevabhatla • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy | 75.81 | 467
Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy | 59.3 | 184
Skeleton-based Action Recognition | NTU RGB+D 120 Cross-Subject | Top-1 Accuracy | 52.4 | 143
Action Recognition | NTU RGB+D 120 (Cross-View) | Accuracy | 62.69 | 47
Action Recognition | NTU 60 (55/5 split) | Top-1 Acc | 75.81 | 35
Action Recognition | NTU-120 110/10 split | Top-1 Acc | 62.69 | 34
Skeleton Action Recognition | NTU RGB+D Cross-Subject (Xsub) 120 | Accuracy | 41.9 | 29
Action Recognition | NTU-60 48/12 split | Top-1 Acc | 33.3 | 27
Action Recognition | NTU-120 96/24 split | Top-1 Acc | 38.7 | 18
Zero-shot Action Recognition | NTU RGB+D 60 (48/12 Split) | Top-1 Acc | 33.3 | 16
Showing 10 of 50 rows
