MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
About
In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as the decoder. The core innovation is our new differentiable parametric decoder that encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance and scene illumination. Due to this new way of combining CNN-based and model-based face reconstruction, the CNN encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which makes training on very large (unlabeled) real-world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation.
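The pipeline described above can be sketched in a few lines. The snippet below is an illustrative toy, not the authors' implementation: the CNN encoder is stood in for by a fixed linear map, the parametric decoder is a linear morphable-model-style generator (`mean + basis @ code`), and a toy linear "renderer" maps geometry back to pixel space so an unsupervised photometric loss can be formed. All names, dimensions, and maps here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PIX, N_CODE, N_VERTS = 64, 8, 30  # toy sizes (assumptions)

# "Encoder": in MoFA this is a CNN; here a random linear projection (assumption).
W_enc = rng.normal(size=(N_CODE, N_PIX))

def encode(image):
    # Produces the semantic code vector (pose/shape/expression/reflectance/
    # illumination in the paper; here just an abstract vector).
    return W_enc @ image

# Differentiable parametric decoder: linear generative model of face geometry.
mean_geom = rng.normal(size=N_VERTS)
basis = rng.normal(size=(N_VERTS, N_CODE))

def decode(code):
    return mean_geom + basis @ code

# Toy differentiable "image formation" so the loop closes in pixel space.
W_render = rng.normal(size=(N_PIX, N_VERTS)) / np.sqrt(N_VERTS)

def render(geometry):
    return W_render @ geometry

def photometric_loss(image):
    # Unsupervised training signal: compare the re-rendered image to the input.
    reconstruction = render(decode(encode(image)))
    return float(np.mean((reconstruction - image) ** 2))

image = rng.normal(size=N_PIX)
loss = photometric_loss(image)
```

Because every stage is differentiable with respect to its inputs, the encoder could be trained end-to-end by backpropagating this loss, which is the key property the paper exploits to avoid labeled 3D supervision.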
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Face Reconstruction | 3DFAW frontal faces | Depth Correlation | 0.1597 | 8 |
| 3D Face Reconstruction | FaceWarehouse Region-S (180 meshes of 9 subjects) | Mean Reconstruction Error (mm) | 2.19 | 6 |
| 3D Face Reconstruction | BU-3DFE v1 (test) | Mean Point-to-Point RMSE (mm) | 3.22 | 6 |
| Occlusion Segmentation | CelebA-HQ Unoccluded (test) | RMSE | 8.77 | 6 |
| Occlusion Segmentation | CelebA-HQ Occluded (test) | RMSE | 9.2 | 6 |
| Occlusion Segmentation | CelebA-HQ (test) | RMSE | 8.99 | 6 |
| Face Identity Clustering | MoFA 84 images, 78 identities (test) | Top-1 Recall | 0.19 | 4 |
| Face Recognition | MoFA-T (test) | Earth Mover's Distance (Same) | 0.3 | 3 |