Multi-view Human Pose and Shape Estimation Using Learnable Volumetric Aggregation

About

Human pose and shape estimation from RGB images is a highly sought-after alternative to marker-based motion capture, which is laborious, requires expensive equipment, and constrains capture to laboratory environments. Monocular vision-based algorithms, however, still suffer from rotational ambiguities and are not ready for translation to healthcare applications, where high accuracy is paramount. While fusion of data from multiple viewpoints could overcome these challenges, current algorithms require further improvement to reach clinically acceptable accuracy. In this paper, we propose a learnable volumetric aggregation approach to reconstruct 3D human body pose and shape from calibrated multi-view images. We use a parametric representation of the human body, which makes our approach directly applicable to medical applications. Compared to previous approaches, our framework shows higher accuracy and greater promise for real-time prediction, given its cost efficiency.
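To make the idea of aggregating calibrated multi-view features into a volume more concrete, here is a minimal NumPy sketch of generic volumetric unprojection. It is not the authors' implementation: the function name `unproject_to_volume`, the array layouts, nearest-pixel sampling, and the plain average across views are all assumptions made for illustration. A learnable approach would replace the average with learned weighting and process the resulting volume with a network.

```python
import numpy as np

def unproject_to_volume(feature_maps, projections, voxel_centers):
    """Average per-view 2D features sampled where each voxel projects.

    feature_maps:  (V, H, W, C) per-view feature maps (hypothetical layout)
    projections:   (V, 3, 4) camera projection matrices from calibration
    voxel_centers: (N, 3) voxel centers in world coordinates
    returns:       (N, C) aggregated per-voxel features
    """
    V, H, W, C = feature_maps.shape
    N = voxel_centers.shape[0]
    # Homogeneous world coordinates, shape (N, 4).
    homog = np.concatenate([voxel_centers, np.ones((N, 1))], axis=1)
    summed = np.zeros((N, C))
    counts = np.zeros((N, 1))
    for v in range(V):
        pix = homog @ projections[v].T            # (N, 3) homogeneous pixels
        depth = pix[:, 2]
        in_front = depth > 1e-8                   # ignore points behind the camera
        safe_depth = np.where(in_front, depth, 1.0)
        col = np.rint(pix[:, 0] / safe_depth).astype(int)
        row = np.rint(pix[:, 1] / safe_depth).astype(int)
        valid = in_front & (col >= 0) & (col < W) & (row >= 0) & (row < H)
        # Nearest-pixel sampling; an assumption, bilinear sampling is also common.
        summed[valid] += feature_maps[v, row[valid], col[valid]]
        counts[valid] += 1
    # Voxels seen by no view keep zero features.
    return summed / np.maximum(counts, 1)
```

Each voxel receives the mean of the features from the views in which it is visible, so views that do not see a voxel do not dilute its features.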

Soyong Shin, Eni Halilaj • 2020

Related benchmarks

Task	Dataset	Result
3D Human Pose Estimation	MPI-INF-3DHP (test)	PCK 97.4	606
3D Human Pose Estimation	Human3.6M (test)	--	570
3D Human Pose Estimation	Human3.6M	MPJPE 46.9	197
Human Mesh Reconstruction	Human3.6M	--	50
Human Mesh Recovery	MPI-INF-3DHP	MPJPE 50.2	43
Human Mesh Recovery	Human3.6M Protocol 1 (test)	PA-MPJPE 35.4	33
3D human shape and pose estimation	MPI-INF-3DHP	MPJPE-PA 50.2	29
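For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE, PA-MPJPE, and PCK are commonly computed for a single pose. It follows the standard definitions rather than any evaluation code from the paper. The function names are illustrative, and the 150 mm default PCK threshold is an assumption based on common practice for MPI-INF-3DHP. Errors come out in the unit of the input joints, usually millimetres.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) joint arrays."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Similarity-align pred to gt (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction."""
    return mpjpe(procrustes_align(pred, gt), gt)

def pck(pred, gt, threshold=150.0):
    """Percentage of joints whose error is below the threshold."""
    errors = np.linalg.norm(pred - gt, axis=-1)
    return float((errors < threshold).mean() * 100.0)
```

MPJPE and PA-MPJPE are errors, so lower is better, while PCK is a percentage of correct joints, so higher is better. PA-MPJPE removes global scale, rotation, and translation before measuring error, which is why it is typically lower than MPJPE on the same predictions.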
