2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning

About

Action recognition and human pose estimation are closely related, but the two problems are generally handled as distinct tasks in the literature. In this work, we propose a multitask framework for joint 2D and 3D pose estimation from still images and human action recognition from video sequences. We show that a single architecture can solve both problems efficiently while still achieving state-of-the-art results. Additionally, we demonstrate that end-to-end optimization leads to significantly higher accuracy than separate learning. The proposed architecture can be trained seamlessly with data from different categories simultaneously. The results reported on four datasets (MPII, Human3.6M, Penn Action, and NTU) demonstrate the effectiveness of our method on the targeted tasks.

Diogo C. Luvizon, David Picard, Hedi Tabia • 2018

Related benchmarks

Task	Dataset	Metric	Result	
3D Human Pose Estimation	Human3.6M (test)	MPJPE, average (mm)	53.2	570
Action Recognition	NTU RGB+D (Cross-subject)	Accuracy (%)	85.5	511
Action Recognition	NTU RGB-D Cross-Subject 60	Accuracy (%)	85.5	358
Human Pose Estimation	MPII (test)	--	--	350
3D Human Pose Estimation	Human3.6M	--	--	197
3D Human Pose Estimation	Human3.6M Protocol 1 (test)	Dir. Error (Protocol 1)	49.2	183
3D Human Pose Estimation	Human3.6M (subjects 9 and 11)	Average Error	53.2	180
3D Human Pose Estimation	Human3.6M v1 (test)	Avg Performance	53.2	58
Action Recognition	Penn-Action	Accuracy (%)	98.6	31
Action Recognition	Penn-Action (test)	Accuracy (%)	98.6	27

Showing 10 of 15 rows
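The 3D results above are reported as MPJPE (mean per joint position error): the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and frames, in millimetres on Human3.6M. The following is a minimal pure-Python sketch of that common definition, not the benchmark's official evaluation code. The function names and the optional `root_index` centering (subtracting a root joint, typically the pelvis, from every joint before comparison) are illustrative assumptions; published protocols differ in their alignment details.

```python
import math
from typing import List, Optional, Sequence

Joint = Sequence[float]          # one (x, y, z) position, e.g. in millimetres
Frame = Sequence[Joint]          # all joints of one pose


def _center(frame: Frame, root_index: int) -> List[List[float]]:
    """Hypothetical helper: express every joint relative to the root joint."""
    root = frame[root_index]
    return [[c - r for c, r in zip(joint, root)] for joint in frame]


def mpjpe(pred: Sequence[Frame], gt: Sequence[Frame],
          root_index: Optional[int] = None) -> float:
    """Mean Euclidean distance between predicted and ground-truth joints.

    Assumes pred and gt have the same number of frames and joints and use
    the same units. If root_index is given, both poses are root-centred
    first (an assumption; some protocols align differently).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("pred and gt must have the same number of joints")
        if root_index is not None:
            pred_frame = _center(pred_frame, root_index)
            gt_frame = _center(gt_frame, root_index)
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # per-joint Euclidean error
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Usage: every predicted joint is offset by (3, 4, 0) from the ground truth.
gt = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]]
print(mpjpe(pred, gt))                # 5.0
print(mpjpe(pred, gt, root_index=0))  # 0.0
```

Root-centering removes a global translation offset, which is why the second call reports zero error for a prediction that is only shifted.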
