
Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN

About

This paper presents an image-classification-based approach to skeleton-based video action recognition. First, a dataset-independent, translation-scale invariant image mapping method is proposed, which transforms skeleton videos into colour images, named skeleton-images. Second, a multi-scale deep convolutional neural network (CNN) architecture is proposed, which can be built on and fine-tuned from powerful pre-trained CNNs, e.g., AlexNet, VGGNet, and ResNet. Even though skeleton-images are very different from natural images, the fine-tuning strategy still works well. Finally, we show that our method also works well on 2D skeleton video data. We achieve state-of-the-art results on popular benchmark datasets, e.g., NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the large and challenging NTU RGB+D, UTD-MHAD, and MSRC-12 datasets, our method outperforms other methods by a large margin, which demonstrates the efficacy of the proposed method.

Bo Li, Mingyi He, Xuelian Cheng, Yucheng Chen, Yuchao Dai • 2017
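
The abstract does not spell out the mapping itself, so the following is only a minimal sketch of how a translation-scale invariant skeleton-to-image mapping could look, not the authors' implementation. It assumes a skeleton video is given as an array of shape (frames, joints, 3), that joints become image rows, frames become image columns, and the x/y/z coordinates become the three colour channels. The function name `skeleton_to_image` and the choice of per-sequence min-max normalization with one shared scale are assumptions made for illustration.

```python
import numpy as np


def skeleton_to_image(seq):
    """Map one skeleton sequence to a colour image (hypothetical helper).

    seq: array-like of shape (frames, joints, 3) holding x/y/z coordinates.
    Returns a uint8 array of shape (joints, frames, 3).
    """
    seq = np.asarray(seq, dtype=np.float64)

    # Statistics come from this sequence only, so no dataset-wide
    # constants are needed (the "dataset independent" assumption).
    mins = seq.min(axis=(0, 1))              # per-axis minimum
    spans = seq.max(axis=(0, 1)) - mins      # per-axis extent

    # Subtracting the per-axis minimum removes translation; dividing by a
    # single shared extent removes scale while keeping the x/y/z proportions.
    scale = spans.max()
    if scale == 0:
        scale = 1.0                          # degenerate case: all points equal

    normalized = (seq - mins) / scale        # values in [0, 1]
    image = np.rint(255.0 * normalized).astype(np.uint8)

    # Joints as rows, frames as columns, coordinates as colour channels.
    return image.transpose(1, 0, 2)
```

Under these assumptions, shifting a sequence or scaling it uniformly leaves the resulting image unchanged up to rounding, which is the property the abstract's term "translation-scale invariant" refers to. The resulting image could then be resized and passed to a pre-trained CNN for fine-tuning.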

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy 92.3 | 609 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 92.3 | 575 |
| Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 85.02 | 474 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 85 | 305 |
| Skeleton-based Action Recognition | NTU RGB+D (Cross-View) | Accuracy 92.3 | 213 |
| Skeleton-based Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 85 | 123 |
| Action Recognition | UTD-MHAD (cross-subject) | Accuracy 96.27 | 36 |
| Action Recognition | G3D (test) | Accuracy 93.9 | 11 |
| Gesture Recognition | MSRC-12 Kinect Gesture Dataset (cross-subject) | Accuracy 99.41 | 7 |
