
Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples

About

Skeleton-based human action recognition aims to classify human skeletal sequences, which are spatiotemporal representations of actions, into predefined categories. To reduce the reliance on costly annotation of skeletal sequences while maintaining competitive recognition accuracy, the task of 3D action recognition with limited training samples, also known as semi-supervised 3D action recognition, has been proposed. Active learning, which proactively selects the most informative unlabeled samples for annotation, has been explored in this setting for training-sample selection. Specifically, prior work adopts an encoder-decoder framework to embed skeleton sequences into a latent space, where clustering information, combined with a margin-based selection strategy using a multi-head mechanism, is used to identify the most informative sequences in the unlabeled set for annotation. However, the most representative skeleton sequences are not necessarily the most informative for the action recognizer, as the model may already have acquired similar knowledge from previously seen skeleton samples. To address this, we reformulate semi-supervised 3D action recognition via active learning from a novel perspective by casting it as a Markov Decision Process (MDP). Built upon the MDP framework and its training paradigm, we train an informative sample selection model that intelligently guides the selection of skeleton sequences for annotation. To enhance the representational capacity of the factors in the state-action pairs within our method, we project them from Euclidean space to hyperbolic space. Furthermore, we introduce a meta-tuning strategy to accelerate the deployment of our method in real-world scenarios. Extensive experiments on three 3D action recognition benchmarks demonstrate the effectiveness of our method.
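The abstract leaves the MDP and hyperbolic components abstract, so a minimal sketch may help fix ideas. The Python snippet below is an illustration, not the authors' implementation: it assumes a Poincaré-ball model of hyperbolic space (the abstract does not specify which model is used), maps Euclidean state and action embeddings into the ball with the exponential map at the origin, and scores unlabeled skeleton embeddings with a toy selection policy. All names (`expmap0`, `SelectionPolicy`), the embedding dimension, and the curvature value are hypothetical.

```python
# Hypothetical sketch (not the authors' released code): project the factors of
# state-action pairs from Euclidean to hyperbolic space, then score unlabeled
# skeleton embeddings with a learned selection policy.
import torch
import torch.nn as nn

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin of a Poincare ball with curvature -c.

    Maps Euclidean vectors into the open ball, where distances grow rapidly
    toward the boundary, which can help separate fine-grained factors.
    """
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

class SelectionPolicy(nn.Module):
    """Toy MDP-style policy: state = summary of labeled data, action = candidate.

    Scores each unlabeled embedding conditioned on the current state; the
    top-scoring candidates are the 'actions' sent for annotation.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, state: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
        # Project both factors of the state-action pair into hyperbolic space.
        s = expmap0(state).expand(candidates.size(0), -1)
        a = expmap0(candidates)
        return self.scorer(torch.cat([s, a], dim=-1)).squeeze(-1)

# Usage: pick the 10 most informative of 100 unlabeled sequence embeddings.
policy = SelectionPolicy(dim=256)
state = torch.randn(1, 256)          # summary of what the recognizer has seen
candidates = torch.randn(100, 256)   # encoder outputs for unlabeled sequences
selected = policy(state, candidates).topk(k=10).indices
```

In an actual MDP training loop, the policy would presumably be rewarded by the improvement of the action recognizer after the selected sequences are annotated; the meta-tuning strategy mentioned in the abstract is omitted here.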

Zhigang Tu, Zhengbo Zhang, Jia Gong, Junsong Yuan, Bo Du • 2025

Related benchmarks

Task                 Dataset                          Accuracy (%)   Rank
Action Recognition   NTU RGB+D 120 (X-set)            73.2           717
Action Recognition   NTU RGB+D 60 (Cross-View)        86.7           588
Action Recognition   NTU RGB+D 60 (Cross-Subject)     81.1           336
Action Recognition   NTU RGB+D 120 (Cross-Subject)    69.9           222
Action Recognition   PKU-MMD (Part I)                 89.6           74
Action Recognition   PKU-MMD (Part II)                40.3           71
