
Leveraging MoCap Data for Human Mesh Recovery

About

Training state-of-the-art models for human body pose and shape recovery from images or videos requires datasets with corresponding annotations that are hard and expensive to obtain. Our goal in this paper is to study whether poses from 3D Motion Capture (MoCap) data can be used to improve image-based and video-based human mesh recovery methods. We find that fine-tuning image-based models with synthetic renderings from MoCap data increases their performance by exposing them to a wider variety of poses, textures and backgrounds. In fact, we show that simply fine-tuning the batch normalization layers of the model is enough to achieve large gains. We further study the use of MoCap data for video and introduce PoseBERT, a transformer module that directly regresses the pose parameters and is trained via masked modeling. It is simple, generic, and can be plugged on top of any state-of-the-art image-based model to turn it into a video-based model that leverages temporal information. Our experimental results show that the proposed approaches reach state-of-the-art performance on various datasets, including 3DPW, MPI-INF-3DHP, MuPoTS-3D, MCB and AIST. Test code and models will be available soon.
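The batch-normalization-only fine-tuning mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual code: the toy backbone and hyperparameters are placeholders for an image-based mesh recovery network, and only the idea of freezing everything except the BatchNorm affine parameters comes from the paper.

```python
import torch
import torch.nn as nn

# Toy stand-in for an image-based mesh recovery backbone (hypothetical;
# the paper uses an existing image-based model, not this architecture).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 72),  # e.g. SMPL pose parameters
)

# Freeze all weights, then unfreeze only the BatchNorm affine
# parameters: per the abstract, fine-tuning BN layers alone on
# synthetic MoCap renderings is enough to achieve large gains.
for p in model.parameters():
    p.requires_grad = False

bn_params = []
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        for p in m.parameters():
            p.requires_grad = True
            bn_params.append(p)

optimizer = torch.optim.Adam(bn_params, lr=1e-4)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 2*8 + 2*16 = 48 trainable BN parameters
```

The optimizer then only ever updates the BN scale and shift parameters, which keeps the fine-tuning cheap and hard to overfit.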

Fabien Baradel, Thibault Groueix, Philippe Weinzaepfel, Romain Brégier, Yannis Kalantidis, Grégory Rogez • 2021
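The masked-modeling training of PoseBERT described above can be sketched as a transformer encoder over per-frame pose embeddings, where randomly masked frames are replaced by a learned token and the model regresses the pose parameters of every frame. All layer sizes and the masking ratio below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

POSE_DIM = 72   # e.g. SMPL pose parameters per frame (assumption)
EMBED_DIM = 64  # illustrative embedding size
SEQ_LEN = 16    # illustrative clip length

class PoseBERTSketch(nn.Module):
    """Minimal PoseBERT-style module: transformer over pose sequences."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(POSE_DIM, EMBED_DIM)
        self.mask_token = nn.Parameter(torch.zeros(EMBED_DIM))
        layer = nn.TransformerEncoderLayer(
            d_model=EMBED_DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(EMBED_DIM, POSE_DIM)

    def forward(self, poses, mask):
        # poses: (B, T, POSE_DIM); mask: (B, T) bool, True = hidden frame
        x = self.embed(poses)
        x[mask] = self.mask_token  # replace masked frames by a learned token
        return self.head(self.encoder(x))

model = PoseBERTSketch()
poses = torch.randn(2, SEQ_LEN, POSE_DIM)   # MoCap pose sequences
mask = torch.rand(2, SEQ_LEN) < 0.3         # hide ~30% of frames
out = model(poses, mask)                    # regressed poses for all frames
loss = nn.functional.mse_loss(out[mask], poses[mask])  # masked-modeling loss
```

At inference time, the same module can smooth and in-fill the per-frame predictions of any image-based model, which is what turns it into a video-based method.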

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| 3D Human Pose Estimation | MPI-INF-3DHP (test) | - | 559 |
| 3D Human Pose Estimation | 3DPW (test) | PA-MPJPE 52.9 | 505 |
| Human 3D Mesh Recovery | 3DPW (test) | PA-MPJPE 52.9 | 11 |
| Human 3D Mesh Recovery | MPI-INF-3DHP (test) | PA-MPJPE 63.3 | 6 |
| Human 3D Mesh Recovery | MuPoTS-3D (test) | PA-MPJPE 79.9 | 5 |
| Human 3D Mesh Recovery | AIST (test) | PA-MPJPE 74.1 | 5 |

Other info

Code
