
Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples

About

Skeleton-based human action recognition aims to classify human skeletal sequences, which are spatiotemporal representations of actions, into predefined categories. To reduce the reliance on costly annotation of skeletal sequences while maintaining competitive recognition accuracy, the task of 3D action recognition with limited training samples, also known as semi-supervised 3D action recognition, has been proposed. Active learning, which proactively selects the most informative unlabeled samples for annotation, has been explored in this setting for training-sample selection. Specifically, prior work adopts an encoder-decoder framework to embed skeleton sequences into a latent space, where clustering information, combined with a margin-based selection strategy using a multi-head mechanism, is used to identify the most informative sequences in the unlabeled set for annotation. However, the most representative skeleton sequences are not necessarily the most informative for the action recognizer, as the model may already have acquired similar knowledge from previously seen skeleton samples. To address this, we reformulate semi-supervised 3D action recognition via active learning from a novel perspective by casting it as a Markov Decision Process (MDP). Built upon the MDP framework and its training paradigm, we train an informative sample selection model that intelligently guides the selection of skeleton sequences for annotation. To enhance the representational capacity of the factors in the state-action pairs within our method, we project them from Euclidean space to hyperbolic space. Furthermore, we introduce a meta-tuning strategy to accelerate the deployment of our method in real-world scenarios. Extensive experiments on three 3D action recognition benchmarks demonstrate the effectiveness of our method.
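The abstract leaves the MDP and hyperbolic components abstract, so a minimal sketch may help fix ideas. The Python snippet below is an illustration, not the authors' implementation: it assumes a Poincaré-ball model of hyperbolic space (the abstract does not specify which model is used), maps Euclidean state and action embeddings into the ball with the exponential map at the origin, and scores unlabeled skeleton embeddings with a toy selection policy. All names (`expmap0`, `SelectionPolicy`), the embedding dimension, and the curvature value are hypothetical.

```python
# Hypothetical sketch (not the authors' released code): project the factors of
# state-action pairs from Euclidean to hyperbolic space, then score unlabeled
# skeleton embeddings with a learned selection policy.
import torch
import torch.nn as nn

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin of a Poincare ball with curvature -c.

    Maps Euclidean vectors into the open ball, where distances grow rapidly
    toward the boundary, which can help separate fine-grained factors.
    """
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

class SelectionPolicy(nn.Module):
    """Toy MDP-style policy: state = summary of labeled data, action = candidate.

    Scores each unlabeled embedding conditioned on the current state; the
    top-scoring candidates are the 'actions' sent for annotation.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, state: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
        # Project both factors of the state-action pair into hyperbolic space.
        s = expmap0(state).expand(candidates.size(0), -1)
        a = expmap0(candidates)
        return self.scorer(torch.cat([s, a], dim=-1)).squeeze(-1)

# Usage: pick the 10 most informative of 100 unlabeled sequence embeddings.
policy = SelectionPolicy(dim=256)
state = torch.randn(1, 256)          # summary of what the recognizer has seen
candidates = torch.randn(100, 256)   # encoder outputs for unlabeled sequences
selected = policy(state, candidates).topk(k=10).indices
```

In an actual MDP training loop, the policy would presumably be rewarded by the improvement of the action recognizer after the selected sequences are annotated; the meta-tuning strategy mentioned in the abstract is omitted here.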

Zhigang Tu, Zhengbo Zhang, Jia Gong, Junsong Yuan, Bo Du • 2025

Related benchmarks

Task                 Dataset                          Accuracy (%)   Rank
Action Recognition   NTU RGB+D 120 (X-set)            73.2           717
Action Recognition   NTU RGB+D 60 (Cross-View)        86.7           588
Action Recognition   NTU RGB+D 60 (Cross-Subject)     81.1           336
Action Recognition   NTU RGB+D 120 (Cross-Subject)    69.9           222
Action Recognition   PKU-MMD (Part I)                 89.6           74
Action Recognition   PKU-MMD (Part II)                40.3           71
