One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching

About

One-shot skeleton action recognition, which aims to learn a skeleton action recognition model from a single training sample, has attracted increasing interest due to the challenge of collecting and annotating large-scale skeleton action data. However, most existing studies match skeleton sequences by comparing their feature vectors directly, which neglects the spatial structures and temporal orders of skeleton data. This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching. We represent skeleton data at multiple spatial and temporal scales and achieve optimal feature matching from two perspectives. The first is multi-scale matching, which captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales simultaneously. The second is cross-scale matching, which handles different motion magnitudes and speeds by capturing sample-wise relevance across multiple scales. Extensive experiments on three large-scale datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior one-shot skeleton action recognition and consistently outperforms the state of the art by large margins.

Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Alex C. Kot • 2023
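
To make the two matching perspectives in the abstract more concrete, here is a toy sketch in Python. It is not the authors' implementation: it covers temporal scales only (no spatial scales), replaces the paper's learned features and optimal matching with hand-built motion descriptors and plain Euclidean distance, and every name in it (`temporal_pool`, `scale_descriptors`, `match_score`, `classify_one_shot`, `make_sequence`) is hypothetical. A skeleton sequence is assumed to be an array of shape (frames, joints, coordinates).

```python
import numpy as np

def temporal_pool(seq, window):
    """Average non-overlapping windows of `window` frames; seq is (T, J, C)."""
    t = (seq.shape[0] // window) * window  # drop frames that do not fill a window
    return seq[:t].reshape(t // window, window, *seq.shape[1:]).mean(axis=1)

def scale_descriptors(seq, windows=(1, 2, 4)):
    """One motion descriptor per temporal scale, each of length J*C (assumed design)."""
    descs = []
    for w in windows:
        pooled = temporal_pool(seq, w)
        # Mean absolute displacement between consecutive pooled frames.
        motion = np.abs(np.diff(pooled, axis=0)).mean(axis=0)
        descs.append(motion.ravel())
    return np.stack(descs)  # shape: (num_scales, J*C)

def match_score(query, exemplar):
    """Higher is better. Adds a same-scale term and a cross-scale term."""
    q, e = scale_descriptors(query), scale_descriptors(exemplar)
    # Pairwise distances between every query scale and every exemplar scale.
    dist = np.linalg.norm(q[:, None, :] - e[None, :, :], axis=-1)
    multi_scale = -np.diag(dist).mean()  # scale s compared with scale s
    cross_scale = -dist.min()            # best pairing across any two scales
    return multi_scale + cross_scale

def classify_one_shot(query, support):
    """`support` maps each label to its single exemplar sequence."""
    return max(support, key=lambda label: match_score(query, support[label]))

def make_sequence(joint, speed, frames, joints=3, coords=2):
    """Synthetic data: one joint moves linearly along the first coordinate."""
    seq = np.zeros((frames, joints, coords))
    seq[:, joint, 0] = speed * np.arange(frames)
    return seq

support = {"wave": make_sequence(0, 1.0, 16), "kick": make_sequence(1, 1.0, 16)}
query = make_sequence(0, 0.5, 32)  # the "wave" motion performed at half speed
print(classify_one_shot(query, support))  # wave
```

In this toy setup the half-speed query never matches the "wave" exemplar exactly at the same scale, but its window-2 descriptor coincides with the exemplar's window-1 descriptor, which is the kind of speed mismatch the abstract attributes to cross-scale matching.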

Related benchmarks

Task | Dataset | Result | Rank
Action Recognition | NTU RGB+D 120 one-shot protocol | -- | 26
Skeleton-based Action Recognition | PKU-MMD (unseen) | Accuracy: 86.9 | 8
Skeleton-based Action Recognition | NTU-60 (50 seen / 10 unseen) | Accuracy: 82.7 | 6
Skeleton-based Action Recognition | NTU-120 (100 seen / 20 unseen) | Accuracy: 0.687 | 6
