CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer

About

Reconstructing personalized animatable head avatars has significant implications for AR/VR. Existing methods for achieving explicit face control with 3D Morphable Models (3DMM) typically rely on multi-view images or videos of a single subject, making the reconstruction process complex. Additionally, the traditional rendering pipeline is time-consuming, limiting real-time animation possibilities. In this paper, we introduce CVTHead, a novel approach that generates controllable neural head avatars from a single reference image using point-based neural rendering. CVTHead treats the sparse vertices of the mesh as the point set and employs the proposed Vertex-feature Transformer to learn local feature descriptors for each vertex. This enables the modeling of long-range dependencies among all the vertices. Experimental results on the VoxCeleb dataset demonstrate that CVTHead achieves comparable performance to state-of-the-art graphics-based methods. Moreover, it enables efficient rendering of novel human heads with various expressions, head poses, and camera views. These attributes can be explicitly controlled using the coefficients of 3DMMs, facilitating versatile and realistic animation in real-time scenarios.
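
As a rough illustration of why attention lets each vertex descriptor draw on every other vertex, the dependency-free sketch below applies single-head scaled dot-product self-attention to a handful of vertex tokens. It is not the CVTHead architecture: the function names, the token count and dimension, and the random projection matrices are all assumptions chosen for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a list of tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    # scores[i][j]: how strongly vertex i attends to vertex j
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    # Each output row is a weighted mix of the value vectors of all vertices.
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned parameters or input features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_vertices, dim = 6, 4  # hypothetical sizes, far smaller than a real mesh
vertex_tokens = random_matrix(num_vertices, dim, rng)
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

descriptors, weights = self_attention(vertex_tokens, w_q, w_k, w_v)
```

Because every softmax weight is strictly positive, each row of `descriptors` depends on all vertex tokens rather than only on mesh neighbours, which is the sense in which attention captures long-range dependencies.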

Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, Xiaohui Xie • 2023

Related benchmarks

Task	Dataset	Result
Self-Reenactment	HDTF	PSNR 20.08	35
Cross-Reenactment	HDTF	CSIM 59.1	32
Self-Reenactment	VFHQ (test)	PSNR 18.43	23
Cross-identity reenactment	VFHQ (test)	CSIM 0.374	23
Neural Rendering Reenactment	VFHQ	FPS 18.09	11
Face Reenactment	VFHQ Self-reenactment one-shot	PSNR 18.43	11
Face Reenactment	VFHQ Cross-reenactment zero-shot	CSIM 0.374	11
One-shot Self-reenactment	HDTF	PSNR 20.08	11
Cross-Reenactment	VFHQ	CSIM 0.0861	6
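
For readers unfamiliar with the metrics in the table, the sketch below shows the standard definitions of PSNR and of the cosine similarity that underlies CSIM. It assumes images are flattened to equal-length lists of pixel values and that identity embeddings come from some face-recognition network, which is not specified here. The function names are hypothetical, and the benchmarks above may differ in preprocessing and scaling (the CSIM entries appear on different scales, e.g. 59.1 versus 0.374).

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if not reference or len(reference) != len(generated):
        raise ValueError("images must be non-empty and the same length")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (the basis of CSIM)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

Higher is better for both: PSNR measures pixel-level fidelity to the ground-truth frame, while CSIM measures how well the identity of the source face is preserved.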

