Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition

About

Recognizing unseen skeleton action categories remains highly challenging due to the absence of corresponding skeletal priors. Existing approaches generally follow an "align-then-classify" paradigm but face two fundamental issues, i.e., (i) fragile point-to-point alignment arising from imperfect semantics, and (ii) rigid classifiers restricted by static decision boundaries and coarse-grained anchors. To address these issues, we propose a novel method for zero-shot skeleton action recognition, termed Flora, which builds upon FlexibLe neighbOr-aware semantic attunement and an open-form distRibution-aware flow clAssifier. Specifically, we flexibly attune textual semantics by incorporating neighboring inter-class contextual cues to form direction-aware regional semantics, coupled with a cross-modal geometric consistency objective that ensures stable and robust point-to-region alignment. Furthermore, we employ noise-free flow matching to bridge the modality distribution gap between semantic and skeleton latent embeddings, while a condition-free contrastive regularization enhances discriminability, leading to a distribution-aware classifier with fine-grained decision boundaries achieved through token-level velocity predictions. Extensive experiments on three benchmark datasets validate the effectiveness of our method, showing particularly impressive performance even when trained with only 10% of the seen data. Code is available at https://github.com/cseeyangchen/Flora.
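To make the flow-matching idea in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "noise-free" means the flow starts from a semantic embedding rather than from a Gaussian sample, and that the path to the skeleton embedding is a straight line, so the regression target is a constant velocity. The function names, the toy vectors, and the two stand-in velocity functions are all hypothetical; the abstract does not specify the actual path, direction, or network.

```python
# Illustrative sketch only: straight-line flow matching between two embeddings.
# Assumption: the flow runs from a semantic embedding (t=0) to a skeleton
# embedding (t=1). All names here are hypothetical, not taken from the Flora code.

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path, d/dt x_t = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    v_pred = velocity_fn(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)

# Toy 3-dimensional embeddings standing in for the two modalities.
semantic = [0.0, 1.0, 2.0]
skeleton = [1.0, 1.0, 0.0]

def zero_velocity(x_t, t):
    """Stand-in for an untrained model that predicts no motion."""
    return [0.0] * len(x_t)

def oracle_velocity(x_t, t):
    """Stand-in for a perfectly trained model on this single pair."""
    return target_velocity(semantic, skeleton)

loss_zero = flow_matching_loss(zero_velocity, semantic, skeleton, 0.5)
loss_oracle = flow_matching_loss(oracle_velocity, semantic, skeleton, 0.5)
print(loss_zero, loss_oracle)
```

In a real system the stand-in velocity functions would be replaced by a learned network, and the loss would be averaged over many embedding pairs and sampled values of t.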

Yang Chen, Miaoge Li, Zhijie Rao, Deze Zeng, Song Guo, Jingcai Guo • 2025

Related benchmarks

Task                Dataset                                 Result          Rank
Action Recognition  NTU RGB+D X-sub 120                     Accuracy 65.9   430
Action Recognition  NTU RGB-D Cross-Subject 60              Accuracy 65.3   336
Action Recognition  NTU-60 (xsub)                           Accuracy 88.6   223
Action Recognition  NTU-120 (cross-subject (xsub))          Accuracy 71.2   211
Action Recognition  NTU-60 48/12 split                      Top-1 Acc 56.1  103
Action Recognition  NTU-120 96/24 split                     Top-1 Acc 65.9  84
Action Recognition  NTU RGB+D 120 (110/10 Xsub)             Accuracy 78.9   66
Action Recognition  NTU-RGB+D 60 (48/12)                    Accuracy 56.1   49
Action Recognition  PKU-MMD 46/5 I (Xsub)                   Accuracy 79.1   43
Action Recognition  NTU RGB+D Xsub 60 (Cross-Subject 55/5)  Accuracy 86.3   40
Showing 10 of 20 rows
