Look Ma, no markers: holistic performance capture without the hassle

About

We tackle the problem of highly-accurate, holistic performance capture for the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body or hand capture independently, involve complex and expensive hardware and a high degree of manual intervention from skilled operators. While machine-learning-based approaches exist to overcome these problems, they usually only support a single camera, often operate on a single part of the body, do not produce precise world-space results, and rarely generalize outside specific contexts. In this work, we introduce the first technique for marker-free, high-quality reconstruction of the complete human body, including eyes and tongue, without requiring any calibration, manual intervention or custom hardware. Our approach produces stable world-space results from arbitrary camera rigs as well as supporting varied capture environments and clothing. We achieve this through a hybrid approach that leverages machine learning models trained exclusively on synthetic data and powerful parametric models of human shape and motion. We evaluate our method on a number of body, face and hand reconstruction benchmarks and demonstrate state-of-the-art results that generalize on diverse datasets.

Charlie Hewitt, Fatemeh Saleh, Sadegh Aliakbarian, Lohit Petikam, Shideh Rezaeifar, Louis Florentin, Zafiirah Hosenie, Thomas J Cashman, Julien Valentin, Darren Cosker, Tadas Baltrusaitis • 2024

Related benchmarks

Task	Dataset	Metric	Result
Human Mesh Recovery	RICH	MPJPE	39.52	19
Human Mesh Recovery	MoYo	MPJPE	60.15	16
3D Human Pose Estimation	Chi3D	MPJPE	46.47	15
Human Pose Estimation	Harmony4D	PVE	45.6	9
Hand Pose Estimation	FreiHAND (test)	PA-MPVPE	8.1	7
3D human mesh fitting	MammaEval-S	MPJPE	25.97	5
3D human mesh fitting	MammaEval-D	MPJPE	27.98	5
3D human reconstruction	Harmony4D + CHI3D + MammaEval-D (test)	Mean Perceptual Depth (mm)	13.73	5
2D Landmark Prediction	Harmony4D IoU > 0.5	Mean 2D Euclidean Distance Error (pixels)	31.45	4
2D Landmark Prediction	RICH	Mean 2D Euclidean Distance Error (pixels)	13.26	4

Showing 10 of 14 rows
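
Several rows above report MPJPE (mean per-joint position error), which is the average Euclidean distance between predicted and ground-truth 3D joints and is usually given in millimetres. The following is a minimal sketch of that definition only. The function name `mpjpe` and the two-joint example data are hypothetical, and the sketch does not attempt to reproduce any benchmark's evaluation protocol, such as joint selection or alignment steps, which may differ per dataset.

```python
import math
from typing import Sequence

Point3D = Sequence[float]


def mpjpe(predicted: Sequence[Point3D], ground_truth: Sequence[Point3D]) -> float:
    """Mean Euclidean distance between corresponding joints.

    The result is in the same units as the input coordinates.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("joint lists must be non-empty and equal in length")
    # Average the per-joint distances over all joints.
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Hypothetical two-joint skeleton, coordinates in millimetres (illustrative only).
pred = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]
error_mm = mpjpe(pred, gt)  # per-joint errors are 5 and 12, so the mean is 8.5
```

Variants prefixed with PA, such as the PA-MPVPE row, first rigidly align the prediction to the ground truth (Procrustes alignment) before averaging, so they measure shape and pose error independently of global position, rotation and scale. That alignment step is omitted here.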
