Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks
About
Recently, Convolutional Neural Networks (ConvNets) have shown promising performance in many computer vision tasks, especially image-based recognition. How to effectively use ConvNets for video-based recognition remains an open problem. In this paper, we propose a compact, effective yet simple method to encode the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTMs); ConvNets are then adopted to exploit discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, i.e., the MSRC-12 Kinect Gesture dataset (MSRC-12), the G3D dataset, and the UTD Multimodal Human Action dataset (UTD-MHAD), and achieves state-of-the-art results.
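The core idea above — rasterizing each joint's trajectory into a 2D image so a standard ConvNet can consume it — can be sketched as follows. This is a simplified, single-plane grayscale variant for illustration only, not the paper's exact encoding (the paper produces multiple color-coded maps); the array shapes, image size, and intensity scheme here are assumptions.

```python
import numpy as np

def joint_trajectory_map(skeleton, size=32):
    """Rasterize a 3D skeleton sequence into a 2D map.

    skeleton: array of shape (T, J, 3) -- T frames, J joints, xyz coords.
    Projects joints onto the x-y (front-view) plane and accumulates each
    joint's trajectory; pixel intensity encodes temporal order, so the
    direction of motion is preserved in the image.
    """
    T, J, _ = skeleton.shape
    xy = skeleton[:, :, :2]  # front-view projection (drop depth)
    # Normalize coordinates into the [0, size-1] pixel range.
    mn = xy.min(axis=(0, 1))
    mx = xy.max(axis=(0, 1))
    px = ((xy - mn) / (mx - mn + 1e-8) * (size - 1)).astype(int)
    jtm = np.zeros((size, size))
    for t in range(T):
        for j in range(J):
            x, y = px[t, j]
            # Later frames are drawn brighter; max() keeps the most
            # recent visit when trajectories overlap.
            jtm[y, x] = max(jtm[y, x], (t + 1) / T)
    return jtm
```

In practice, maps from several projection planes would be stacked or fed to separate ConvNet streams, and the resulting scores fused for the final prediction.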
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy | 81.08 | 609 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy | 35.9 | 575 |
| Action Recognition | NTU RGB+D (Cross-Subject) | Accuracy | 76.32 | 474 |
| Action Recognition | NTU RGB+D 60 (Cross-Subject) | Accuracy | 39.1 | 467 |
| Skeleton-based Action Recognition | NTU (Cross-Subject) | Accuracy | 73.4 | 86 |
| Skeleton-based Action Recognition | NTU RGB+D Cross-View (CV) 1.0 | Accuracy | 75.2 | 38 |
| Action Recognition | UTD-MHAD (Cross-Subject) | Accuracy | 87.9 | 36 |
| Action Recognition | NTU RGB+D V2 (Cross-Subject) | Accuracy | 73.4 | 16 |
| Action Recognition | NTU RGB+D V2 (Cross-View) | Accuracy | 75.2 | 16 |
| Action Recognition | G3D (test) | Accuracy | 96.02 | 11 |