Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition

About

Zero-shot skeleton action recognition is a non-trivial task that requires robust generalization to unseen classes using prior knowledge from only seen classes and shared semantics. Existing methods typically build skeleton-semantics interactions through uncontrollable mappings and conspicuous representations, and thus can hardly capture the intricate, fine-grained relationships needed for effective cross-modal transferability. To address these issues, we propose a novel dyNamically Evolving dUal skeleton-semantic syneRgistic framework with the guidance of cOntext-aware side informatioN (dubbed Neuron), which explores more fine-grained cross-modal correspondence from micro to macro perspectives at both the spatial and temporal levels. Concretely, 1) we first construct spatial-temporal evolving micro-prototypes and integrate dynamic context-aware side information to capture the intricate and synergistic skeleton-semantic correlations step by step, progressively refining cross-modal alignment; and 2) we introduce spatial compression and temporal memory mechanisms to guide the growth of the spatial-temporal micro-prototypes, enabling them to absorb structure-related spatial representations and regularity-dependent temporal patterns. Notably, these processes are analogous to the learning and growth of neurons, equipping the framework with the capacity to generalize to novel unseen action categories. Extensive experiments on various benchmark datasets demonstrate the superiority of the proposed method.

Yang Chen, Jingcai Guo, Song Guo, Dacheng Tao • 2024
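
The zero-shot setting described in the abstract can be illustrated with a minimal, generic sketch: a skeleton feature is compared against semantic embeddings of classes never seen in training, and the closest class is predicted. This is not the Neuron framework itself. It assumes the skeleton feature has already been projected into the same space as the class embeddings, and the function names, class names, and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(skeleton_feature, unseen_class_embeddings):
    """Return the unseen class whose semantic embedding is most similar.

    Assumption: skeleton_feature already lives in the semantic space,
    e.g. after a learned projection trained on seen classes only.
    """
    return max(
        unseen_class_embeddings,
        key=lambda name: cosine(skeleton_feature, unseen_class_embeddings[name]),
    )


# Toy semantic embeddings for two hypothetical unseen classes.
unseen = {
    "jump": [0.9, 0.1, 0.0],
    "wave": [0.0, 0.2, 0.9],
}

# Toy projected skeleton feature for one test sample.
feature = [0.8, 0.3, 0.1]
prediction = zero_shot_predict(feature, unseen)
print(prediction)
```

In a split such as 55/5 from the benchmark table, the projection would be trained on the 55 seen classes, and a dictionary like `unseen` would hold the 5 held-out classes.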

Related benchmarks

Task	Dataset	Metric	Result	
Action Recognition	NTU RGB+D X-sub 120	Accuracy	33.5	482
Action Recognition	NTU RGB-D Cross-Subject 60	Accuracy	62.7	358
Action Recognition	NTU-60 (xsub)	Accuracy	81.5	271
Skeleton-based Action Recognition	NTU RGB+D 120 Cross-Subject	Top-1 Accuracy	71.5	143
Action Recognition	NTU-60 48/12 split	Top-1 Acc	62.7	119
Action Recognition	NTU-120 96/24 split	Top-1 Acc	57.1	100
Action Recognition	NTU 60 (55/5 split)	Top-1 Acc	86.9	73
Action Recognition	NTU-120 110/10 split	Top-1 Acc	71.5	72
Action Recognition	NTU RGB+D 120 (110/10 Xsub)	Accuracy	68.6	66
Action Recognition	NTU RGB+D Xsub 60 (Cross-Subject 55/5)	Accuracy	86.9	66

Showing 10 of 40 rows
