Towards Metrical Reconstruction of Human Faces
About
Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, and medicine. Most of these applications rely on a metrically correct prediction of the shape, especially when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of perspective projection, they cannot reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since no large-scale 3D dataset exists for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still only medium-scale, with more than 2k identities, and training purely on it would lead to overfitting. To avoid this, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin on both current non-metric benchmarks and our metric benchmarks (15% and 24% lower average error on NoW, respectively).
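The scale ambiguity of perspective projection mentioned above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera at the origin; the `project` helper, the focal length, and the random point cloud standing in for a face are all hypothetical and not part of MICA. A face that is 30% larger and 30% farther away lands on exactly the same pixels, so 2D supervision alone cannot pin down metric size.

```python
import numpy as np

def project(points, focal_length=1000.0):
    """Pinhole projection of Nx3 camera-space points (z > 0) to Nx2 image coordinates."""
    return focal_length * points[:, :2] / points[:, 2:3]

rng = np.random.default_rng(0)

# Hypothetical stand-in for a face: points in a 0.15 m cube, centred 0.6 m from the camera.
face = rng.uniform(-0.075, 0.075, size=(100, 3)) + np.array([0.0, 0.0, 0.6])

# Scale the whole scene about the camera centre: a larger face, proportionally farther away.
scale = 1.3
scaled_face = scale * face

# The 3D shapes differ in size, yet their 2D projections coincide.
same_image = np.allclose(project(face), project(scaled_face))
same_shape = np.allclose(face, scaled_face)
print(same_image, same_shape)
```

Because the scale factor cancels in the division by depth, the two projections agree for any positive `scale`, which is why a supervised signal from 3D scans is needed to recover actual dimensions.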
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Face Reconstruction | NoW face challenge (test) | Median Error (mm) | 0.9 | 38 |
| 3D Face Reconstruction | REALY (frontal-view) | Overall Error | 2.134 | 34 |
| 6DoF head pose estimation | BIWI (test) | Yaw Error | 5.4 | 31 |
| Single-view 3D face reconstruction | REALY-S side-view | NMSE (All, Avg) | 2.125 | 24 |
| Monocular 3D Face Reconstruction | NoW (val) | Full Median Error | 0.913 | 20 |
| Face shape estimation | Stirling Reconstruction Benchmark NoW Protocol (LQ) | Non-Metrical Median Error | 0.96 | 14 |
| Face shape estimation | Stirling Reconstruction Benchmark NoW Protocol HQ | Non-Metrical Median Error | 0.92 | 14 |
| Face shape estimation | NoW Challenge original (test) | Non-Metrical Median Error | 0.9 | 13 |
| Neutral Face Reconstruction | NoW full (val) | Median Error | 0.9 | 12 |
| 3D Metrical Reconstruction | NoW (test) | Median Error (mm) | 1.08 | 10 |
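
The table reports both non-metrical and metrical median errors. The sketch below illustrates how the two can differ, under the assumption that the non-metrical protocol lets the rigid alignment also optimize a global scale while the metrical protocol does not. It is a simplification: it uses known point correspondences and a least-squares similarity alignment (Umeyama's method), whereas the actual benchmark evaluates scan-to-mesh distances. The helper names and the synthetic data are hypothetical.

```python
import numpy as np

def align(src, dst, with_scale):
    """Least-squares alignment of src onto dst (Nx3, known correspondences)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s0, d0 = src - mu_s, dst - mu_d
    cov = d0.T @ s0 / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0  # avoid reflections
    R = U @ D @ Vt
    # Optimal scale if allowed, otherwise keep the prediction's own size.
    c = (S * np.diag(D)).sum() / (s0 ** 2).sum() * len(src) if with_scale else 1.0
    return c * s0 @ R.T + mu_d

def median_error(pred, truth, with_scale):
    aligned = align(pred, truth, with_scale)
    return np.median(np.linalg.norm(aligned - truth, axis=1))

rng = np.random.default_rng(0)
truth = rng.uniform(-80.0, 80.0, size=(500, 3))  # synthetic "scan" in mm

# Prediction with the right shape but 10% too large, rotated and translated.
angle = 0.4
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
pred = 1.1 * truth @ Rz.T + np.array([5.0, -3.0, 12.0])

non_metric_err = median_error(pred, truth, with_scale=True)
metric_err = median_error(pred, truth, with_scale=False)
print(non_metric_err, metric_err)
```

In this toy setting the scale-optimizing alignment hides the size error entirely, while the scale-free alignment exposes it, which is consistent with the same method scoring a higher error on the metrical benchmark than on the non-metrical one.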