Monocular, One-stage, Regression of Multiple 3D People

About

This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression.
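The body-center-guided sampling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the released ROMP code: the function name `sample_mesh_params`, the 3x3 local-maximum rule, the `threshold` value, and the toy 4-channel parameter map are all assumptions made for illustration. It only shows the general idea of finding peaks in a Body Center heatmap and reading the parameter vector stored at each peak of the Mesh Parameter map.

```python
import numpy as np

def sample_mesh_params(center_heatmap, param_map, threshold=0.25, max_people=64):
    """Read one parameter vector per detected body center.

    center_heatmap: (H, W) float array of body-center confidences.
    param_map:      (C, H, W) float array of per-pixel mesh parameters.
    Returns centers (N, 2) as (row, col) and params (N, C), sorted by confidence.
    threshold and max_people are hypothetical defaults, not values from the paper.
    """
    H, W = center_heatmap.shape
    # Pad with -inf so border pixels can still be compared with a full 3x3 window.
    padded = np.pad(center_heatmap, 1, mode="constant", constant_values=-np.inf)
    # Maximum over each pixel's 3x3 neighbourhood (including the pixel itself).
    neighbourhood_max = np.max(
        [padded[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    # A pixel is a center if it is a local maximum and confident enough.
    # Note: pixels tied on a flat plateau would all be kept by this simple rule.
    is_peak = (center_heatmap >= neighbourhood_max) & (center_heatmap > threshold)
    ys, xs = np.nonzero(is_peak)
    # Keep the most confident detections first.
    order = np.argsort(-center_heatmap[ys, xs])[:max_people]
    ys, xs = ys[order], xs[order]
    params = param_map[:, ys, xs].T          # (N, C)
    centers = np.stack([ys, xs], axis=1)     # (N, 2)
    return centers, params

# Toy usage: two people, plus a weaker response next to the first center.
heatmap = np.zeros((8, 8), dtype=np.float32)
heatmap[2, 3] = 0.9
heatmap[2, 4] = 0.4   # adjacent to the first peak, so it is suppressed
heatmap[6, 5] = 0.6
param_map = np.arange(4 * 8 * 8, dtype=np.float32).reshape(4, 8, 8)
centers, params = sample_mesh_params(heatmap, param_map)
print(centers)        # rows and columns of the two detected centers
print(params.shape)   # one 4-dimensional vector per person
```

Because every person is read from the same pair of maps, no bounding-box cropping or per-person forward pass appears in this sketch, which is the property the abstract refers to as bounding box-free.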

Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei • 2020

Related benchmarks

Task	Dataset	Result
3D Human Pose Estimation	3DPW (test)	PA-MPJPE 47.3	514
3D Human Mesh Recovery	3DPW (test)	MPJPE 76.7	341
Pose Estimation	COCO (val)	AP 14.7	319
Multi-person Pose Estimation	CrowdPose (test)	--	202
Human Mesh Recovery	3DPW	PA-MPJPE 47.3	159
3D Human Pose and Shape Estimation	3DPW (test)	MPJPE-PA 47.3	158
3D Human Pose Estimation	3DPW	PA-MPJPE 53.3	137
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE 95.11	122
3D Human Mesh Recovery	3DPW	PA-MPJPE 47.3	80
3D Human Pose and Shape Estimation	3DPW	PA-MPJPE 47.3	74

Showing 10 of 68 rows
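For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how MPJPE and PA-MPJPE are commonly computed. It follows the standard definitions (mean Euclidean joint error, and the same error after a Procrustes similarity alignment), not any specific benchmark's evaluation script. The function names and the random 14-joint test skeleton are assumptions for illustration.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) arrays, in the units of the input."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def procrustes_align(pred, gt):
    """Scale, rotate and translate pred to best fit gt in the least-squares sense."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance gives the optimal rotation (Kabsch/Umeyama).
    U, S, Vt = np.linalg.svd(P.T @ G)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.array([1.0, 1.0, d])
    R = U @ np.diag(D) @ Vt
    scale = (S * D).sum() / (P ** 2).sum()
    return scale * P @ R + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment, which removes global scale, rotation and translation."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Toy check: a prediction that differs from the ground truth only by a
# similarity transform has a non-zero MPJPE but a near-zero PA-MPJPE.
rng = np.random.default_rng(0)
gt = rng.normal(size=(14, 3))                      # hypothetical 14-joint skeleton
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
pred = 1.7 * gt @ Rz.T + np.array([0.1, -0.2, 0.3])
print(mpjpe(pred, gt), pa_mpjpe(pred, gt))
```

PA-MPJPE is therefore never larger than the best similarity-aligned error and mainly reflects articulated pose accuracy, while MPJPE is also sensitive to global orientation and scale.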
