Realistic One-shot Mesh-based Head Avatars
About
We present a system for realistic one-shot creation of mesh-based human head avatars, ROME for short. From a single photograph, our model estimates a person-specific head mesh and an associated neural texture that encodes both local photometric and geometric details. The resulting avatars are rigged and can be rendered with a neural network, which is trained alongside the mesh and texture estimators on a dataset of in-the-wild videos. In our experiments, the system performs competitively both in head geometry recovery and in render quality, especially for cross-person reenactment. See results at https://samsunglabs.github.io/rome/
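The data flow described above (one photo in, per-vertex mesh refinement plus a neural texture out, followed by a neural renderer) can be sketched schematically. This is a hypothetical toy illustration, not the paper's actual architecture: the dimensions, the `encode`/`render` helpers, and the linear-head "renderer" are all stand-ins chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): a FLAME-like template with
# 5023 vertices, an 8-channel neural texture, a 64x64 texture resolution.
N_VERTS, TEX_CH, TEX_RES = 5023, 8, 64

def encode(image):
    """Toy stand-in for ROME's estimators: from one photo, predict
    per-vertex offsets refining the template mesh and a neural texture."""
    feat = image.mean()  # placeholder "image feature"
    offsets = np.full((N_VERTS, 3), feat * 0.01)               # mesh refinement
    texture = rng.standard_normal((TEX_CH, TEX_RES, TEX_RES))  # neural texture
    return offsets, texture

def render(texture, uv):
    """Toy stand-in for the neural renderer: sample texture features at
    rasterized UV coordinates and map them to RGB with a linear head."""
    u = (uv[..., 0] * (TEX_RES - 1)).astype(int)
    v = (uv[..., 1] * (TEX_RES - 1)).astype(int)
    sampled = texture[:, v, u]                  # (TEX_CH, H, W) feature image
    head = rng.standard_normal((3, TEX_CH)) * 0.1
    rgb = np.tensordot(head, sampled, axes=1)   # (3, H, W)
    return 1.0 / (1.0 + np.exp(-rgb))           # squash to [0, 1]

image = rng.random((3, 128, 128))               # the single input photograph
offsets, texture = encode(image)
uv = rng.random((32, 32, 2))                    # stand-in for rasterized UVs
frame = render(texture, uv)                     # (3, 32, 32) rendered avatar
```

In the actual system the offsets deform a parametric head template and the renderer is a trained network; here both are reduced to fixed arithmetic so the tensor shapes and the overall photo → mesh + texture → render flow are easy to follow.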
Taras Khakhulin, Vanessa Sklyarova, Victor Lempitsky, Egor Zakharov • 2022
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Self-Reenactment | HDTF | PSNR | 20.51 | 29 |
| Self-Reenactment | VFHQ (test) | PSNR | 19.96 | 23 |
| Cross-identity reenactment | VFHQ (test) | CSIM | 0.53 | 23 |
| Cross-Reenactment | HDTF | CSIM | 72.6 | 15 |
| Video-driven Talking Head Generation (Self-Reenactment) | HDTF | FID | 76.44 | 12 |
| 3D Portrait Animation (Cross Reenactment) | VFHQ 1.0 (test) | CSIM | 49.5 | 11 |
| Self-Reenactment | CelebV-HQ 69 (inference) | PSNR | 30.74 | 7 |
| Video-driven Talking Head Generation (Self-Reenactment) | NeRSemble Mono | PSNR | 31.07 | 7 |
| Cross-Reenactment | CelebV-HQ 69 (inference) | FID | 78.02 | 7 |
| Video-driven Talking Head Generation (Cross-Reenactment) | HDTF | FID | 79.31 | 7 |

Values are quoted on each leaderboard's native scale; in particular, CSIM appears both as a fraction (0.53) and as a percentage (72.6, 49.5).
(Showing 10 of 20 rows.)