Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

About

In this paper, we propose a novel learning approach for feed-forward one-shot 4D head avatar synthesis. Different from existing methods that often learn from reconstructing monocular videos guided by 3DMM, we employ pseudo multi-view videos to learn a 4D head synthesizer in a data-driven manner, avoiding reliance on inaccurate 3DMM reconstruction that could be detrimental to the synthesis performance. The key idea is to first learn a 3D head synthesizer using synthetic multi-view images to convert monocular real videos into multi-view ones, and then utilize the pseudo multi-view videos to learn a 4D head synthesizer via cross-view self-reenactment. By leveraging a simple vision transformer backbone with motion-aware cross-attentions, our method exhibits superior performance compared to previous methods in terms of reconstruction fidelity, geometry consistency, and motion control accuracy. We hope our method offers novel insights into integrating 3D priors with 2D supervisions for improved 4D head avatar creation.

Yu Deng, Duomin Wang, Baoyuan Wang• 2024

Related benchmarks

Task	Dataset	Result
Self-Reenactment	HDTF	PSNR22.87	35
Cross-identity reenactment	VFHQ (test)	CSIM0.6731	23
Self-Reenactment	VFHQ (test)	PSNR21.34	23
Portrait Animation (Self-reenactment)	VFHQ (test)	FVD506.1	23
Cross-Reenactment	HDTF	CSIM85.7	21
3D Head Avatar Reconstruction	Ava 256	PSNR11.9	21
Portrait Animation (Cross-reenactment)	FFHQ source + VFHQ driving (test)	CSIM0.6702	18
Self-reenactment portrait animation	MEAD 59 (test)	CSIM0.8793	18
Talking head video generation	HDTF	FID25.74	14
Video-driven Talking Head Generation (Self-Reenactment)	HDTF	FID27.83	12

Showing 10 of 34 rows

Other info

Follow for update

@wizwand_team Discord