Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

About

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
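The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that "disentangling" means each scale k connects only joints whose shortest-path distance is exactly k (plus self-loops), and that the "dense cross-spacetime edges" come from tiling the spatial adjacency across a short temporal window. All function names, the toy 4-joint chain, and the choice to concatenate per-scale features without learned weights are hypothetical.

```python
import numpy as np


def k_hop_adjacency(adj, k):
    """Binary matrix linking nodes at shortest-path distance exactly k, plus self-loops."""
    n = adj.shape[0]
    eye = np.eye(n, dtype=np.int64)
    if k == 0:
        return eye
    hop = adj.astype(np.int64) + eye
    # (A + I)^k is nonzero exactly where the distance is at most k.
    within_k = np.minimum(np.linalg.matrix_power(hop, k), 1)
    within_k_minus_1 = np.minimum(np.linalg.matrix_power(hop, k - 1), 1)
    return within_k - within_k_minus_1 + eye


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; degrees are >= 1 due to self-loops."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_scale_aggregate(adj, feats, num_scales):
    """Aggregate features separately per scale and concatenate (no learned weights here)."""
    per_scale = [normalize(k_hop_adjacency(adj, k)) @ feats for k in range(num_scales)]
    return np.concatenate(per_scale, axis=1)


def spacetime_window_adjacency(adj, window):
    """Tile (A + I) over a window of frames so every joint also links across frames."""
    block = adj + np.eye(adj.shape[0], dtype=adj.dtype)
    return np.tile(block, (window, window))


# Toy skeleton: a chain of four joints 0-1-2-3 with two features per joint.
chain = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])
feats = np.arange(8, dtype=float).reshape(4, 2)

print(k_hop_adjacency(chain, 2))                      # only distance-2 pairs and self-loops
print(multi_scale_aggregate(chain, feats, 3).shape)   # (4, 6): 3 scales x 2 features
print(spacetime_window_adjacency(chain, 3).shape)     # (12, 12): 3 frames x 4 joints
```

In this sketch, subtracting the "within k-1 hops" mask from the "within k hops" mask is what keeps closer neighbors from being counted again at larger scales, which is one plausible reading of the bias the abstract refers to.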

Ziyu Liu, Hongwen Zhang, Zhenghao Chen, Zhiyong Wang, Wanli Ouyang • 2020

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 89 | 661 |
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy 96.2 | 609 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 96.6 | 575 |
| Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 91.5 | 474 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 92.2 | 467 |
| Action Recognition | Kinetics-400 | Top-1 Acc 45.1 | 413 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy 87.2 | 377 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 92.2 | 305 |
| Action Recognition | Kinetics 400 (test) | -- | 245 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | Accuracy 91.5 | 220 |
Showing 10 of 62 rows

