Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization

About

Zero-shot skeleton-based action recognition aims to recognize actions of unseen categories after training on data from seen categories. The key is to build a connection between the visual and semantic spaces that carries over from seen to unseen classes. Previous studies have primarily focused on encoding sequences into a single feature vector and then mapping the features to an identical anchor point within the embedding space. Their performance is hindered by 1) neglect of global visual/semantic distribution alignment, which limits their ability to capture the true interdependence between the two spaces; and 2) neglect of temporal information, since the frame-wise features, which are rich in action cues, are directly pooled into a single feature vector. We propose a new zero-shot skeleton-based action recognition method based on mutual information (MI) estimation and maximization. Specifically, 1) we maximize the MI between the visual and semantic spaces for distribution alignment; 2) we leverage temporal information when estimating the MI by encouraging the MI to increase as more frames are observed. Extensive experiments on three large-scale skeleton action datasets confirm the effectiveness of our method. Code: https://github.com/YujieOuO/SMIE.
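
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a Jensen-Shannon-style MI lower bound of the kind used in Deep InfoMax, a plain dot-product critic, mean pooling over frames, and a hinge penalty for the temporal term. The function names `jsd_mi_estimate` and `temporal_mi_penalty` and the parameter `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def jsd_mi_estimate(visual, semantic):
    """Jensen-Shannon-style MI estimate between row-paired features.

    visual, semantic: arrays of shape (N, d); row i of each forms a
    positive pair, every other combination is treated as a negative.
    The dot-product critic is an assumption made for this sketch.
    """
    scores = visual @ semantic.T                 # (N, N) critic scores
    pos_mask = np.eye(scores.shape[0], dtype=bool)
    pos_term = -softplus(-scores[pos_mask]).mean()
    neg_term = softplus(scores[~pos_mask]).mean()
    return pos_term - neg_term


def temporal_mi_penalty(frame_feats, semantic, k):
    """Hinge penalty that is positive when the MI estimated from the
    first k frames exceeds the MI estimated from all frames.

    frame_feats: array of shape (N, T, d) with frame-wise features.
    """
    partial = frame_feats[:, :k].mean(axis=1)    # pool the first k frames
    full = frame_feats.mean(axis=1)              # pool all T frames
    mi_partial = jsd_mi_estimate(partial, semantic)
    mi_full = jsd_mi_estimate(full, semantic)
    return max(0.0, mi_partial - mi_full)
```

In a training loop under these assumptions, one would maximize `jsd_mi_estimate` on pooled features while minimizing `temporal_mi_penalty`, so that observing more frames does not lower the estimated MI.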

Yujie Zhou, Wenwen Qiang, Anyi Rao, Ning Lin, Bing Su, Jiaqi Wang • 2023

Related benchmarks

Task | Dataset | Result | Rank
Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 77.98 | 467
Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy 57 | 184
Skeleton-based Action Recognition | NTU RGB+D 120 Cross-Subject | Top-1 Accuracy 61.3 | 143
Action Recognition | NTU RGB+D 120 (Cross-View) | Accuracy 65.74 | 47
Action Recognition | NTU 60 (55/5 split) | Top-1 Acc 77.98 | 35
Action Recognition | NTU-120 110/10 split | Top-1 Acc 65.74 | 34
Skeleton Action Recognition | NTU RGB+D Cross-Subject (Xsub) 120 | Accuracy 42.3 | 29
Action Recognition | NTU-60 48/12 split | Top-1 Acc 40.18 | 27
Action Recognition | NTU-120 96/24 split | Top-1 Acc 45.3 | 18
Zero-shot Action Recognition | NTU-RGB+D 120 (96/24) | Top-1 Acc 45.3 | 16
Showing 10 of 40 rows
