Realistic One-shot Mesh-based Head Avatars
About
We present a system for realistic one-shot creation of mesh-based human head avatars, ROME for short. From a single photograph, our model estimates a person-specific head mesh and an associated neural texture that encodes both local photometric and geometric details. The resulting avatars are rigged and can be rendered with a neural network, which is trained alongside the mesh and texture estimators on a dataset of in-the-wild videos. In our experiments, the system performs competitively both in head geometry recovery and in render quality, especially for cross-person reenactment. See results at https://samsunglabs.github.io/rome/
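The data flow described above (one photo in, per-vertex mesh refinement plus a neural texture out, followed by a neural renderer) can be sketched schematically. This is a hypothetical toy illustration, not the paper's actual architecture: the dimensions, the `encode`/`render` helpers, and the linear-head "renderer" are all stand-ins chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): a FLAME-like template with
# 5023 vertices, an 8-channel neural texture, a 64x64 texture resolution.
N_VERTS, TEX_CH, TEX_RES = 5023, 8, 64

def encode(image):
    """Toy stand-in for ROME's estimators: from one photo, predict
    per-vertex offsets refining the template mesh and a neural texture."""
    feat = image.mean()  # placeholder "image feature"
    offsets = np.full((N_VERTS, 3), feat * 0.01)               # mesh refinement
    texture = rng.standard_normal((TEX_CH, TEX_RES, TEX_RES))  # neural texture
    return offsets, texture

def render(texture, uv):
    """Toy stand-in for the neural renderer: sample texture features at
    rasterized UV coordinates and map them to RGB with a linear head."""
    u = (uv[..., 0] * (TEX_RES - 1)).astype(int)
    v = (uv[..., 1] * (TEX_RES - 1)).astype(int)
    sampled = texture[:, v, u]                  # (TEX_CH, H, W) feature image
    head = rng.standard_normal((3, TEX_CH)) * 0.1
    rgb = np.tensordot(head, sampled, axes=1)   # (3, H, W)
    return 1.0 / (1.0 + np.exp(-rgb))           # squash to [0, 1]

image = rng.random((3, 128, 128))               # the single input photograph
offsets, texture = encode(image)
uv = rng.random((32, 32, 2))                    # stand-in for rasterized UVs
frame = render(texture, uv)                     # (3, 32, 32) rendered avatar
```

In the actual system the offsets deform a parametric head template and the renderer is a trained network; here both are reduced to fixed arithmetic so the tensor shapes and the overall photo → mesh + texture → render flow are easy to follow.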
Taras Khakhulin, Vanessa Sklyarova, Victor Lempitsky, Egor Zakharov • 2022
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Self-Reenactment | HDTF | PSNR | 20.51 | 29 |
| Self-Reenactment | VFHQ (test) | PSNR | 19.96 | 23 |
| Cross-identity reenactment | VFHQ (test) | CSIM | 0.53 | 23 |
| Cross-Reenactment | HDTF | CSIM | 72.6 | 15 |
| Video-driven Talking Head Generation (Self-Reenactment) | HDTF | FID | 76.44 | 12 |
| 3D Portrait Animation (Cross Reenactment) | VFHQ 1.0 (test) | CSIM | 49.5 | 11 |
| Self-Reenactment | CelebV-HQ 69 (inference) | PSNR | 30.74 | 7 |
| Video-driven Talking Head Generation (Self-Reenactment) | NeRSemble Mono | PSNR | 31.07 | 7 |
| Cross-Reenactment | CelebV-HQ 69 (inference) | FID | 78.02 | 7 |
| Video-driven Talking Head Generation (Cross-Reenactment) | HDTF | FID | 79.31 | 7 |

Values are quoted on each leaderboard's native scale; in particular, CSIM appears both as a fraction (0.53) and as a percentage (72.6, 49.5).
(Showing 10 of 20 rows.)