Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

About

In skeleton-based action recognition, a key challenge is distinguishing between actions with similar trajectories of joints due to the lack of image-level details in skeletal representations. Recognizing that the differentiation of similar actions relies on subtle motion details in specific body parts, we direct our approach to focus on the fine-grained motion of local skeleton components. To this end, we introduce ProtoGCN, a Graph Convolutional Network (GCN)-based model that breaks down the dynamics of entire skeleton sequences into a combination of learnable prototypes representing core motion patterns of action units. By contrasting the reconstruction of prototypes, ProtoGCN can effectively identify and enhance the discriminative representation of similar actions. Without bells and whistles, ProtoGCN achieves state-of-the-art performance on multiple benchmark datasets, including NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM, which demonstrates the effectiveness of the proposed method. The code is available at https://github.com/firework8/ProtoGCN.
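
The idea of expressing a motion feature as a combination of prototypes can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`softmax`, `reconstruct`), the dot-product similarity, the softmax weighting, and the toy prototype values are all assumptions made for illustration. The abstract does not specify how the combination weights are computed, and in the real model the prototypes would be learned rather than fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct(feature, prototypes):
    """Rebuild `feature` as a convex combination of prototype vectors.

    Weights come from dot-product similarity followed by softmax
    (an assumed choice, not taken from the paper).
    """
    scores = [sum(f * p for f, p in zip(feature, proto)) for proto in prototypes]
    weights = softmax(scores)
    rebuilt = [
        sum(w * proto[d] for w, proto in zip(weights, prototypes))
        for d in range(len(feature))
    ]
    return weights, rebuilt


# Hypothetical toy prototypes; in a trained model these would be learned.
prototypes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, rebuilt = reconstruct([2.0, 0.1], prototypes)
```

Here `weights` indicates how strongly each prototype contributes to the input feature, and `rebuilt` is the resulting reconstruction. Two similar actions would be expected to differ in these weights, which is the kind of signal a contrastive objective could exploit.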

Hongda Liu, Yunfan Liu, Min Ren, Hao Wang, Yunlong Wang, Zhenan Sun • 2024

Related benchmarks

Task	Dataset	Result
Action Recognition	NTU RGB+D 120 (X-set)	Accuracy 92.2	779
Action Recognition	NTU RGB+D 60 (X-sub)	Accuracy 93.8	496
Action Recognition	NTU RGB+D X-sub 120	Accuracy 90.9	482
Action Recognition	NTU RGB-D Cross-Subject 60	Accuracy 93.8	358
Action Recognition	NTU-60 (xsub)	Accuracy 93.8	271
Action Recognition	NTU RGB+D 120 Cross-Subject	--	249
Action Recognition	NTU-120 (cross-subject (xsub))	Accuracy 90.9	239
Action Recognition	NTU 120 (Cross-Setup)	Accuracy 92.2	231
Action Recognition	NTU RGB+D X-View 60	Accuracy 97.8	218
Skeleton-based Action Recognition	NTU RGB+D (Cross-View)	Accuracy 97.8	213

Showing 10 of 39 rows
