
Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatial-Temporal Graph Convolutional Network for Action Recognition

About

This paper extends the Spatial-Temporal Graph Convolutional Network (ST-GCN) for skeleton-based action recognition by introducing two novel modules: the Graph Vertex Feature Encoder (GVFE) and the Dilated Hierarchical Temporal Convolutional Network (DH-TCN). The GVFE module learns vertex features suited to action recognition by encoding raw skeleton data into a new feature space. The DH-TCN module captures both short-term and long-term temporal dependencies using a hierarchical dilated convolutional network. Experiments were conducted on the challenging NTU RGB+D 60 and NTU RGB+D 120 datasets. The results show that our method is competitive with state-of-the-art approaches while using fewer layers and parameters, thereby reducing the required training time and memory.

Konstantinos Papadopoulos, Enjie Ghorbel, Djamila Aouada, Björn Ottersten • 2019
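
To make concrete why stacked dilated temporal convolutions can cover both short-term and long-term dependencies with few layers, here is a minimal sketch of the receptive-field arithmetic. It is not the paper's DH-TCN implementation: the kernel size of 3, the dilation rates 1, 2, 4, and the helper names `receptive_field` and `dilated_conv1d` are all illustrative assumptions, since the abstract does not state these details.

```python
# Illustrative sketch only: kernel size and dilation rates are assumptions,
# not values taken from the paper.

def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output frame after stacking
    stride-1 temporal convolutions with the given dilation rates."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal, weights, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation) over a list of floats."""
    span = (len(weights) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[t + i * dilation] for i, w in enumerate(weights))
        for t in range(len(signal) - span)
    ]


kernel_size = 3
plain = receptive_field(kernel_size, [1, 1, 1])    # three undilated layers
dilated = receptive_field(kernel_size, [1, 2, 4])  # three layers, growing dilation

# Run three dilated layers over a sequence exactly as long as the receptive field:
# a single output frame remains, and it depends on every input frame.
frames = [1.0] * dilated
for d in [1, 2, 4]:
    frames = dilated_conv1d(frames, [1.0] * kernel_size, d)

print(plain, dilated, len(frames))
```

Under these assumptions, the same number of layers reaches roughly twice as many frames when the dilation grows from layer to layer, which is consistent with the abstract's claim of modeling long-term dependencies with a smaller number of layers.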

Related benchmarks

Task                               Dataset                       Result               Rank
Action Recognition                 NTU RGB+D 120 (X-set)         Accuracy 79.8        661
Action Recognition                 NTU RGB+D 60 (Cross-View)     Accuracy 92.8        575
Action Recognition                 NTU RGB+D 60 (X-sub)          Accuracy 85.3        467
Action Recognition                 NTU RGB+D X-sub 120           Accuracy 78.3        377
Skeleton-based Action Recognition  NTU RGB+D (Cross-View)        Accuracy 92.8        213
Action Recognition                 NTU RGB+D 120 Cross-Subject   Accuracy 78.3        183
Skeleton-based Action Recognition  NTU RGB+D 120 Cross-Subject   Top-1 Accuracy 78.3  143
Skeleton-based Action Recognition  NTU 120 (X-sub)               Accuracy 78.3        139
Skeleton-based Action Recognition  NTU-RGB+D 120 (Cross-setup)   Accuracy 79.8        136
Skeleton-based Action Recognition  NTU RGB+D (Cross-subject)     Accuracy 85.3        123
Showing 10 of 14 rows
