
DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition

About

Graph convolution networks (GCN) have been widely used in skeleton-based action recognition. We note that existing GCN-based approaches primarily rely on prescribed graphical structures (i.e., a manually defined topology of skeleton joints), which limits their flexibility to capture complicated correlations between joints. To move beyond this limitation, we propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN). It consists of two modules, DG-GCN and DG-TCN, for spatial and temporal modeling, respectively. In particular, DG-GCN uses learned affinity matrices to capture dynamic graphical structures instead of relying on a prescribed one, while DG-TCN performs group-wise temporal convolutions with varying receptive fields and incorporates a dynamic joint-skeleton fusion module for adaptive multi-level temporal modeling. On a wide range of benchmarks, including NTU RGB+D, Kinetics-Skeleton, BABEL, and Toyota SmartHome, DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
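
To make the idea of a learned affinity matrix concrete, here is a minimal sketch of spatial aggregation over joints in plain Python. It is not the authors' implementation: the function names (`init_affinity`, `spatial_aggregate`), the random initialization, and the row-wise softmax normalization are all assumptions chosen for illustration. The point it shows is only that the joint-to-joint weights come from a free V x V parameter rather than from a hand-defined skeleton adjacency.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of affinity logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_affinity(num_joints, seed=0):
    """Hypothetical learnable parameter: a V x V matrix of logits.

    It is initialized randomly, so no skeleton topology is built in.
    In a real model these values would be updated by gradient descent.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_joints)]
            for _ in range(num_joints)]


def spatial_aggregate(features, affinity_logits):
    """Mix joint features with a learned affinity matrix.

    features: V joint feature vectors, each of length C (one frame).
    affinity_logits: V x V matrix; row v holds the weights joint v
    assigns to every joint u (softmax normalization is an assumption).
    Returns a V x C list where out[v] = sum_u A[v][u] * features[u].
    """
    weights = [softmax(row) for row in affinity_logits]
    num_joints = len(features)
    channels = len(features[0])
    return [
        [sum(weights[v][u] * features[u][c] for u in range(num_joints))
         for c in range(channels)]
        for v in range(num_joints)
    ]


if __name__ == "__main__":
    # Three joints with two channels each (toy values).
    frame = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
    logits = init_affinity(num_joints=3)
    for joint_out in spatial_aggregate(frame, logits):
        print(joint_out)
```

With all-zero logits every joint attends uniformly, so each output row is simply the mean of the joint features. Training would move the logits away from that uniform starting point toward whatever joint correlations help the task.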

Haodong Duan, Jiaqi Wang, Kai Chen, Dahua Lin • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 91.4 | 770 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 98.6 | 601 |
| Action Recognition | Kinetics-400 | Top-1 Acc 40.3 | 498 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 94.1 | 358 |
| Action Recognition | Kinetics 400 (test) | -- | 245 |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy 89.6 | 241 |
| Action Recognition | NTU 120 (Cross-Setup) | Accuracy 91.3 | 231 |
| Action Recognition | Toyota SmartHome (TSH) (CV1) | Accuracy 41.8 | 68 |
| Action Recognition | NTU RGB+D Xsub 60 (Cross-Subject 55/5) | Accuracy 93.2 | 66 |
| Action Recognition | Toyota Smarthome CS | Accuracy 65.1 | 58 |
Showing 10 of 18 rows
