
Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation

About

We introduce Skeleton-Cache, the first training-free test-time adaptation framework for skeleton-based zero-shot action recognition (SZAR), aimed at improving model generalization to unseen actions during inference. Skeleton-Cache reformulates inference as a lightweight retrieval process over a non-parametric cache that stores structured skeleton representations, combining both global and fine-grained local descriptors. To guide the fusion of descriptor-wise predictions, we leverage the semantic reasoning capabilities of large language models (LLMs) to assign class-specific importance weights. By integrating these structured descriptors with LLM-guided semantic priors, Skeleton-Cache dynamically adapts to unseen actions without any additional training or access to training data. Extensive experiments on NTU RGB+D 60/120 and PKU-MMD II demonstrate that Skeleton-Cache consistently boosts the performance of various SZAR backbones under both zero-shot and generalized zero-shot settings. The code is publicly available at https://github.com/Alchemist0754/Skeleton-Cache.

Jingmin Zhu, Anqi Zhu, Hossein Rahmani, Jun Liu, Mohammed Bennamoun, Qiuhong Ke • 2025
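
The retrieval-and-fusion idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`cache_scores`, `fuse`, `predict`), the exponential affinity `exp(-beta * (1 - cosine))`, the parameters `alpha` and `beta`, the descriptor names, and the weight values are all assumptions chosen for illustration. In particular, the weights below are hand-written stand-ins for the class-specific importance weights that the paper obtains from an LLM, and the cache is filled by hand because the abstract does not say how entries are added.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cache_scores(query, cache, num_classes, beta=5.0):
    """Per-class retrieval score for one descriptor type.

    cache is a list of (feature_vector, class_index) pairs. Each cached
    entry votes for its class with affinity exp(-beta * (1 - cosine)),
    an assumed kernel, not one taken from the paper.
    """
    scores = [0.0] * num_classes
    for key, label in cache:
        scores[label] += math.exp(-beta * (1.0 - cosine(query, key)))
    return scores

def fuse(zero_shot_logits, queries, caches, weights, alpha=1.0):
    """Add weighted cache scores from every descriptor to the backbone logits.

    queries: descriptor name -> test feature vector
    caches:  descriptor name -> list of (feature_vector, class_index)
    weights: descriptor name -> per-class importance weights
             (hypothetical stand-ins for LLM-assigned weights)
    """
    num_classes = len(zero_shot_logits)
    fused = list(zero_shot_logits)
    for name, query in queries.items():
        scores = cache_scores(query, caches[name], num_classes)
        for c in range(num_classes):
            fused[c] += alpha * weights[name][c] * scores[c]
    return fused

def predict(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=logits.__getitem__)

# Toy example: two unseen classes, one global and one local ("hands") descriptor.
zero_shot_logits = [0.50, 0.45]  # backbone alone slightly prefers class 0
caches = {
    "global": [([1.0, 0.0], 0), ([0.0, 1.0], 1)],
    "hands":  [([1.0, 0.0], 0), ([0.0, 1.0], 1)],
}
queries = {"global": [0.7, 0.7], "hands": [0.0, 1.0]}
weights = {"global": [0.5, 0.5], "hands": [0.5, 0.5]}

fused = fuse(zero_shot_logits, queries, caches, weights)
print(predict(zero_shot_logits), predict(fused))
```

In this toy case the global descriptor is ambiguous between the two classes, while the local descriptor matches a cached class-1 entry closely, so the fused prediction moves from class 0 to class 1 without any parameter update.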

Related benchmarks

Task | Dataset | Metric | Result | Rank
Zero-shot Action Recognition | NTU RGB+D 60 (55/5 Split) | Top-1 Accuracy | 89.41 | 16
Zero-shot Action Recognition | NTU RGB+D 60 (48/12 Split) | Top-1 Acc | 52.03 | 16
Zero-shot Action Recognition | NTU RGB+D 120 (110/10 Split) | Top-1 Accuracy | 77.6 | 16
Zero-shot Action Recognition | NTU-RGB+D 120 (96/24) | Top-1 Acc | 56.83 | 16
Skeleton Action Recognition | NTU RGB+D 120 (Cross-Setup (Xset), 110/10 Split) | S Score | 62.19 | 13
Skeleton-based Action Recognition | NTU RGB+D 60 (55/5 Split) | ZSL Accuracy | 89.41 | 11
Skeleton-based Action Recognition | NTU RGB+D 60 (48/12 Split) | ZSL | 47.83 | 11
Skeleton-based Action Recognition | NTU RGB+D 120 (96/24 Split) | ZSL Accuracy | 56.83 | 11
Skeleton-based Action Recognition | NTU 60 (random-split) | ZSL Accuracy | 89.86 | 9
Skeleton-based Action Recognition | NTU 120 (random-split) | ZSL Accuracy | 56.18 | 9
Showing 10 of 11 rows