Topology-aware Convolutional Neural Network for Efficient Skeleton-based Action Recognition

About

In skeleton-based action recognition, graph convolutional networks (GCNs) have developed rapidly, whereas convolutional neural networks (CNNs) have received less attention. One reason is that CNNs are considered poor at modeling the irregular skeleton topology. To alleviate this limitation, we propose a pure CNN architecture named Topology-aware CNN (Ta-CNN). In particular, we develop a novel cross-channel feature augmentation module, which is a combination of map-attend-group-map operations. By applying the module first at the coordinate level and then at the joint level, the topology feature is effectively enhanced. Notably, we theoretically prove that graph convolution is a special case of normal convolution when the joint dimension is treated as channels. This confirms that the topology modeling power of GCNs can also be implemented with a CNN. Moreover, we design a SkeletonMix strategy, which mixes two persons in a unique manner and further boosts performance. Extensive experiments are conducted on four widely used datasets, namely N-UCLA, SBU, NTU RGB+D and NTU RGB+D 120, to verify the effectiveness of Ta-CNN. We surpass existing CNN-based methods significantly. Compared with leading GCN-based methods, we achieve comparable performance with much lower complexity in terms of the required GFLOPs and parameters.
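The claim that graph convolution is a special case of normal convolution can be illustrated with a small sketch. This is not the paper's implementation: it assumes a single feature channel, a hypothetical 3-joint chain skeleton, and an unnormalized adjacency matrix with self-loops. The helper names `graph_aggregate` and `conv1d` are illustrative. With joints laid out as input channels, a convolution of kernel size 1 whose weights equal the adjacency matrix reproduces the graph aggregation exactly.

```python
def graph_aggregate(x, adj):
    """Graph aggregation over joints: y[v][t] = sum_u adj[v][u] * x[u][t].

    x is V x T (joints by frames), adj is V x V.
    """
    num_joints, num_frames = len(adj), len(x[0])
    return [
        [sum(adj[v][u] * x[u][t] for u in range(num_joints))
         for t in range(num_frames)]
        for v in range(num_joints)
    ]


def conv1d(x, weight):
    """Plain 1D convolution (cross-correlation), no padding, no bias.

    x is C_in x T, weight is C_out x C_in x K; returns C_out x (T - K + 1).
    """
    c_in, num_frames = len(x), len(x[0])
    k = len(weight[0][0])
    return [
        [sum(w_out[c][j] * x[c][t + j] for c in range(c_in) for j in range(k))
         for t in range(num_frames - k + 1)]
        for w_out in weight
    ]


# Hypothetical 3-joint chain (0 - 1 - 2) with self-loops; illustration only.
adj = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

# One feature channel, V = 3 joints, T = 2 frames.
x = [
    [1, 2],
    [3, 4],
    [5, 6],
]

# Treat joints as channels: a kernel-size-1 convolution whose weight
# matrix is the adjacency matrix.
weight = [[[a] for a in row] for row in adj]

graph_out = graph_aggregate(x, adj)
conv_out = conv1d(x, weight)
same = graph_out == conv_out
```

In this toy setting a general convolution is free to learn non-zero weights between joints that share no edge, so the graph operation corresponds to one particular choice of convolution weights.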

Kailin Xu, Fanfan Ye, Qiaoyong Zhong, Di Xie • 2021

Related benchmarks

Task | Dataset | Result | Rank
Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 87.3 | 661
Action Recognition | NTU RGB+D (Cross-View) | Accuracy 94.8 | 609
Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 95.1 | 575
Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 90.4 | 474
Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 90.7 | 467
Action Recognition | NTU RGB+D X-sub 120 | Accuracy 86.7 | 377
Skeleton-based Action Recognition | NTU RGB+D (Cross-View) | Accuracy 95.1 | 213
Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy 86.8 | 184
Skeleton-based Action Recognition | NTU RGB+D 120 Cross-Subject | Top-1 Accuracy 85.7 | 143
Skeleton-based Action Recognition | NTU-RGB+D 120 (Cross-setup) | Accuracy 87.3 | 136
Showing 10 of 18 rows
