Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition

About

Generative models, already powerful tools for generation, are gradually becoming critical tools for recognition tasks as well. In skeleton-based action recognition, however, the features obtained from existing pre-trained generative methods contain redundant information unrelated to recognition. This conflicts with the spatially sparse and temporally consistent nature of skeleton data and leads to unsatisfactory performance. To address this challenge, we bridge the gap in both theory and methodology and propose a novel skeleton-based idempotent generative model (IGM) for unsupervised representation learning. Specifically, we first theoretically demonstrate the equivalence between generative models and maximum entropy coding, which suggests a route to making the features of generative models more compact by introducing contrastive learning. Building on this, we introduce an idempotency constraint that imposes a stronger consistency regularization in the feature space, pushing the features to retain only the motion-semantic information critical to the recognition task. Extensive experiments on the NTU RGB+D and PKU-MMD benchmarks demonstrate the effectiveness of the proposed method: on NTU 60 X-sub, accuracy improves from 84.6% to 86.2%. Furthermore, in zero-shot adaptation scenarios, our model achieves promising results on cases that were previously unrecognizable. Our project is available at https://github.com/LanglandsLin/IGM.
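The abstract does not spell out the exact form of the idempotency constraint. As a minimal sketch, assuming it can be read as a squared penalty on the gap between applying a feature-space map f once and applying it twice (‖f(f(h)) − f(h)‖²), the NumPy snippet below computes that penalty for a batch of feature vectors. The names idempotency_loss, project and mix, and the toy 8-dimensional features, are hypothetical illustrations, not code from the IGM repository.

```python
import numpy as np

def idempotency_loss(f, h):
    """Mean squared gap between f(f(h)) and f(h) over a batch of features.

    Illustrative penalty only (assumed form, not taken from IGM):
    it is zero exactly when f acts idempotently on the batch.
    """
    fh = f(h)        # apply the map once
    ffh = f(fh)      # apply the map twice
    return float(np.mean(np.sum((ffh - fh) ** 2, axis=1)))

rng = np.random.default_rng(0)
d = 8
h = rng.standard_normal((32, d))  # hypothetical batch of 32 feature vectors

# Hypothetical idempotent map: orthogonal projection onto a random
# 3-dimensional subspace, so P @ P == P (up to float error).
q, _ = np.linalg.qr(rng.standard_normal((d, 3)))
P = q @ q.T
project = lambda x: x @ P.T

# Hypothetical non-idempotent map: a generic random linear mixing.
A = rng.standard_normal((d, d)) / np.sqrt(d)
mix = lambda x: x @ A.T

print(idempotency_loss(project, h))  # near zero, only float round-off
print(idempotency_loss(mix, h))      # clearly positive
```

The two toy maps show one way to read idempotency as a compactness pressure: a linear map is idempotent exactly when it is a projection, which keeps one subspace and discards the rest, so project scores near zero while the generic map mix does not. In an actual model, f would be a learned network and such a penalty would be one term among others; how IGM defines, weights and combines its constraint is not stated in this summary.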

Lilang Lin, Lehong Wu, Jiahang Zhang, Jiaying Liu • 2024

Related benchmarks

Task	Dataset	Metric	Result (%)
Action Recognition	NTU RGB+D 120 (X-set)	Accuracy	81.4
Action Recognition	NTU RGB+D 60 (Cross-View)	Accuracy	91.2
Action Recognition	NTU RGB+D 60 (X-sub)	Accuracy	86.2
Action Recognition	NTU RGB+D X-sub 120	Accuracy	80
Skeleton-based Action Recognition	NTU 60 (X-sub)	Accuracy	86.2
Action Recognition	NTU RGB+D X-View 60	Accuracy	91.2
Skeleton-based Action Recognition	NTU RGB+D 120 (X-set)	Top-1 Accuracy	81.4
Skeleton-based Action Recognition	NTU 120 (X-sub)	--	--
Skeleton-based Action Recognition	NTU RGB+D 60 (X-View)	Top-1 Accuracy	91.2
Action Recognition	PKU-MMD (Part II)	--	--
