Realistic One-shot Mesh-based Head Avatars
About
We present ROME, a system for realistic one-shot mesh-based creation of human head avatars. From a single photograph, our model estimates a person-specific head mesh and an associated neural texture that encodes both local photometric and geometric details. The resulting avatars are rigged and can be rendered by a neural network, which is trained jointly with the mesh and texture estimators on a dataset of in-the-wild videos. In our experiments, the system performs competitively in both head geometry recovery and render quality, especially for cross-person reenactment. See results at https://samsunglabs.github.io/rome/
Taras Khakhulin, Vanessa Sklyarova, Victor Lempitsky, Egor Zakharov • 2022
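The pipeline described above (single photo → person-specific mesh + neural texture → neural rendering) can be sketched at the level of data flow. This is a hypothetical, shape-only illustration, not the authors' code: the encoder, predictors, and renderer are toy linear stand-ins, and all sizes (a FLAME-sized template of 5023 vertices, a 256×256×8 neural texture, a 512-d latent) are illustrative assumptions.

```python
import numpy as np

# Illustrative constants (assumed, not from the paper): a FLAME-sized template
# mesh, a 256x256 neural texture with 8 feature channels, a 512-d latent.
N_VERTS, LATENT = 5023, 512
TEX_RES, TEX_CH = 256, 8

rng = np.random.default_rng(0)
W_off = rng.standard_normal((LATENT, N_VERTS * 3)) * 0.01  # latent -> vertex offsets
W_tex = rng.standard_normal((LATENT, TEX_CH)) * 0.01       # latent -> texture features
W_rgb = rng.standard_normal((TEX_CH, 3)) * 0.1             # renderer head: features -> RGB

def encode(photo):
    """Toy stand-in for a CNN encoder: pool the image into a latent vector."""
    feat = photo.mean(axis=(0, 1))   # (3,) global colour statistics
    return np.resize(feat, LATENT)   # tiled to the latent size

def estimate_avatar(latent, template):
    """Predict person-specific vertex offsets and a neural texture from the latent."""
    offsets = np.tanh(latent @ W_off).reshape(N_VERTS, 3)
    texture = np.broadcast_to(latent @ W_tex, (TEX_RES, TEX_RES, TEX_CH)).copy()
    return template + offsets, texture  # deformed head mesh + neural texture

def neural_render(texture):
    """Stand-in for rasterization + the rendering network: features -> RGB."""
    return np.clip(texture @ W_rgb, 0.0, 1.0)

photo = np.ones((256, 256, 3))     # the single input photograph
template = np.zeros((N_VERTS, 3))  # template head mesh vertices
mesh, texture = estimate_avatar(encode(photo), template)
rgb = neural_render(texture)       # (256, 256, 3) rendered avatar
```

In the actual system the texture features would be rasterized onto the posed mesh before the rendering network runs; the sketch only traces the tensor shapes through the three stages.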
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video-driven Talking Head Generation (Self-Reenactment) | HDTF | FID | 76.44 | 12 |
| Self-Reenactment | CelebV-HQ 69 (inference) | PSNR | 30.74 | 7 |
| Video-driven Talking Head Generation (Self-Reenactment) | NeRSemble Mono | PSNR | 31.07 | 7 |
| Cross-Reenactment | CelebV-HQ 69 (inference) | FID | 78.02 | 7 |
| Video-driven Talking Head Generation (Cross-Reenactment) | HDTF | FID | 79.31 | 7 |
| Video-driven Talking Head Generation (Cross-Reenactment) | NeRSemble Mono | FID | 119.1 | 7 |
| Avatar Synthesis | NeRSemble Single Image (test) | PSNR | 15.78 | 5 |
| Cross-identity Face Reenactment | CelebA-HQ | CSIM | 0.519 | 5 |
| Face Cross-reenactment | HDTF 1.0 (test) | CSIM | 0.507 | 5 |
| Face Self-reenactment | HDTF 1.0 (test) | PSNR | 18.46 | 5 |