
Skeleton based Zero Shot Action Recognition in Joint Pose-Language Semantic Space

About

How does one represent an action? How does one describe an action that has never been seen before? Such questions are addressed by the zero-shot learning paradigm, in which a model is trained on only a subset of classes and is evaluated on its ability to correctly classify examples from classes it has never seen. In this work, we present a body-pose-based zero-shot action recognition network and demonstrate its performance on the NTU RGB-D dataset. Our model learns to jointly encode visual similarities derived from pose features of the action performer and similarities in the natural-language descriptions of the unseen action class names. We demonstrate how this joint pose-language semantic space encodes knowledge that allows our model to correctly predict actions not seen during training.
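The core idea of the abstract, classifying an unseen action by comparing a pose embedding against language embeddings of class names in a shared semantic space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the projection matrices, dimensions, and use of cosine similarity are all assumptions made for the sketch (in practice the projections would be learned jointly during training).

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length so dot products give cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Assumed (placeholder) dimensions for pose features, text features,
# and the shared joint space -- not taken from the paper.
pose_dim, text_dim, joint_dim = 256, 300, 128

# Projections into the joint space. Random here for illustration;
# in a trained model these would be learned so that a pose embedding
# lands near the language embedding of its action class.
W_pose = rng.normal(size=(pose_dim, joint_dim))
W_text = rng.normal(size=(text_dim, joint_dim))

# Language features for 5 unseen class names (e.g. pretrained word vectors).
unseen_class_text = rng.normal(size=(5, text_dim))
class_embs = l2_normalize(unseen_class_text @ W_text)  # (5, joint_dim)

# Pose feature extracted from a test skeleton sequence.
pose_feat = rng.normal(size=(pose_dim,))
query = l2_normalize(pose_feat @ W_pose)  # (joint_dim,)

# Zero-shot prediction: nearest unseen class by cosine similarity.
pred = int(np.argmax(class_embs @ query))
print("predicted unseen class index:", pred)
```

Because classes are represented only by their language embeddings, new classes can be added at test time without retraining the pose encoder, which is what makes the zero-shot setting possible.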

Bhavan Jasani, Afshaan Mazagonwalla • 2019

Related benchmarks

Task                        | Dataset                     | Result          | Rank
Action Recognition          | NTU 60 (55/5 split)         | Top-1 Acc 40.12 | 35
Action Recognition          | NTU-120 (110/10 split)      | Top-1 Acc 52.59 | 34
Action Recognition          | NTU-60 (48/12 split)        | Top-1 Acc 30.06 | 27
Action Recognition          | NTU-120 (96/24 split)       | Top-1 Acc 29.06 | 18
Skeleton Action Recognition | NTU-60 (55/5 random split)  | --              | 15
