Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?

About

Spatial-temporal graph convolutional networks (ST-GCNs) achieve impressive performance in skeleton-based human action recognition (HAR). However, despite the development of numerous models, their recognition performance does not differ significantly once the input settings are aligned. Based on this observation, we hypothesize that ST-GCNs are over-parameterized for HAR, a conjecture subsequently confirmed through experiments employing the lottery ticket hypothesis. Additionally, a novel sparse ST-GCN generator is proposed, which trains a sparse architecture from a randomly initialized dense network while maintaining performance comparable to that of the dense counterpart. Moreover, we generate multi-level sparsity ST-GCNs by integrating sparse structures at various sparsity levels and demonstrate that the assembled model yields a significant enhancement in HAR performance. Thorough experiments on four datasets (NTU-RGB+D 60, NTU-RGB+D 120, Kinetics-400, and FineGYM) demonstrate that the proposed sparse ST-GCNs can achieve performance comparable to their dense counterparts. Even with 95% fewer parameters, the sparse ST-GCNs exhibit a degradation of <1% in top-1 accuracy. Meanwhile, the multi-level sparsity ST-GCNs, which require only 66% of the parameters of the dense ST-GCNs, demonstrate an improvement of >1% in top-1 accuracy. The code is available at https://github.com/davelailai/Sparse-ST-GCN.

Jianyang Xie, Yitian Zhao, Yanda Meng, He Zhao, Anh Nguyen, Yalin Zheng • 2025
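
As a rough illustration of what the "95% fewer parameters" figure in the abstract means, the sketch below builds a binary mask that keeps only the largest-magnitude 5% of a weight vector. This is plain one-shot magnitude pruning on a toy list of random numbers, not the sparse ST-GCN generator proposed in the paper; the function names `magnitude_mask` and `apply_mask` and the toy weights are assumptions made for this example only.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of
    smallest-magnitude entries (hypothetical helper, not from the paper)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = round(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights, mask):
    """Zero out the pruned weights while keeping the original layout."""
    return [w * m for w, m in zip(weights, mask)]


# Toy stand-in for a dense layer: 1000 random weights (assumed, not real data).
random.seed(0)
dense = [random.gauss(0.0, 1.0) for _ in range(1000)]

mask = magnitude_mask(dense, sparsity=0.95)
sparse = apply_mask(dense, mask)

kept = sum(mask)
print(f"kept {kept} of {len(dense)} weights")
```

In a lottery-ticket style experiment, a mask like this would typically be found iteratively, with the surviving weights reset to their initial values before retraining; the single pruning step here only shows the bookkeeping.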

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 97.4 | 575 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 92.9 | 305 |
| Action Recognition | Kinetics 400 (test) | -- | 245 |
| Action Recognition | NTU 120 (Cross-Setup) | Accuracy 91.4 | 112 |
| Action Recognition | NTU120 (cross-subject (CS)) | Top-1 Accuracy 90.4 | 36 |
| Action Recognition | FineGYM 1.0 (test) | Accuracy 95.3 | 9 |
| Coarse-grained action recognition | NTU Moderate temporal corruption 60 (xsub) | Top-1 Accuracy 0.896 | 9 |
| Coarse-grained action recognition | NTU Minor temporal corruption 60 (xsub) | Top-1 Accuracy 89.5 | 9 |
| Coarse-grained action recognition | NTU Minor temporal corruption 120 (xsub) | Top-1 Accuracy 81.3 | 9 |
| Fine-grained Action Recognition | Gym288-skeleton minor temporal corruption (Min.) | Top-1 Acc 76.5 | 9 |
Showing 10 of 18 rows
