
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

About

This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF) based architecture for talking portrait synthesis that simultaneously achieves fast convergence, real-time rendering, and state-of-the-art performance with a small model size. Our idea is to explicitly exploit the unequal contribution of spatial regions to guide talking portrait modeling. Specifically, to improve the accuracy of dynamic head reconstruction, a compact and expressive NeRF-based Tri-Plane Hash Representation is introduced by pruning empty spatial regions with three planar hash encoders. For speech audio, we propose a Region Attention Module to generate a region-aware condition feature via an attention mechanism. Unlike existing methods that use an MLP-based encoder to learn the cross-modal relation implicitly, the attention mechanism builds an explicit connection between audio features and spatial regions to capture the priors of local motions. Moreover, a direct and fast Adaptive Pose Encoding is introduced to optimize the head-torso separation problem by mapping the complex transformation of the head pose into spatial coordinates. Extensive experiments demonstrate that our method renders higher-fidelity and better audio-lip synchronized talking portrait videos, with realistic details and high efficiency, compared to previous methods.
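The idea of encoding a 3D point with three planar hash encoders can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-resolution grid per plane (hash-grid encoders are typically multi-resolution), borrows the spatial-hash primes from Instant-NGP style hashing, and uses hypothetical names (`PlaneHashEncoder`, `triplane_encode`) and arbitrary sizes. It only shows how a point is projected onto the XY, YZ and XZ planes, looked up in a per-plane hash table with bilinear interpolation, and concatenated into one feature.

```python
import random


class PlaneHashEncoder:
    """Illustrative single-resolution 2D hash grid (hypothetical, not ER-NeRF's code)."""

    # Primes commonly used for spatial hashing in Instant-NGP style encoders.
    PRIMES = (1, 2654435761)

    def __init__(self, resolution=16, table_size=2 ** 10, feat_dim=2, seed=0):
        if resolution < 2:
            raise ValueError("resolution must be at least 2")
        rng = random.Random(seed)
        self.resolution = resolution
        self.table_size = table_size
        self.feat_dim = feat_dim
        # Feature table; these entries would be learned parameters in a real model.
        self.table = [
            [rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
            for _ in range(table_size)
        ]

    def _lookup(self, ix, iy):
        # Hash the integer grid vertex into the fixed-size feature table.
        idx = ((ix * self.PRIMES[0]) ^ (iy * self.PRIMES[1])) % self.table_size
        return self.table[idx]

    def encode(self, u, v):
        """Bilinearly interpolate vertex features at (u, v), both in [0, 1]."""
        x = u * (self.resolution - 1)
        y = v * (self.resolution - 1)
        # Clamp the cell origin so that x0 + 1 and y0 + 1 stay inside the grid.
        x0 = min(int(x), self.resolution - 2)
        y0 = min(int(y), self.resolution - 2)
        tx, ty = x - x0, y - y0
        out = [0.0] * self.feat_dim
        for dx, wx in ((0, 1.0 - tx), (1, tx)):
            for dy, wy in ((0, 1.0 - ty), (1, ty)):
                feat = self._lookup(x0 + dx, y0 + dy)
                for k in range(self.feat_dim):
                    out[k] += wx * wy * feat[k]
        return out


def triplane_encode(point, encoders):
    """Concatenate the features of the three axis-aligned projections of a point."""
    x, y, z = point
    return (
        encoders["xy"].encode(x, y)
        + encoders["yz"].encode(y, z)
        + encoders["xz"].encode(x, z)
    )


# One independent encoder per plane; the seeds only make the sketch reproducible.
encoders = {
    name: PlaneHashEncoder(seed=i) for i, name in enumerate(("xy", "yz", "xz"))
}
feature = triplane_encode((0.25, 0.5, 0.75), encoders)
print(len(feature))  # 3 planes x feat_dim = 6
```

Because each lookup is two-dimensional, each table only has to cover a plane rather than the full 3D volume. In a full model the concatenated feature would be passed, together with the audio condition, to the downstream network.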

Jiahe Li, Jiawei Zhang, Xiao Bai, Jun Zhou, Lin Gu • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Talking head synthesis | User Study | Lip Sync Quality | 3.189 | 18 |
| 3D Talking Face Generation | HDTF | NIQE | 24.053 | 12 |
| Personalized 3D Talking Face Generation | HDTF | PSNR | 25.17 | 12 |
| Head reconstruction | Video sequences (test) | PSNR | 32.5216 | 11 |
| Talking Head Generation | User Study | Lip Sync | 65.1 | 11 |
| Talking head synthesis | May avatar Shaheen audio | Sync-D | 9.775 | 10 |
| Talking head synthesis | May avatar Lieu audio | Sync-D | 10.017 | 10 |
| Talking Head Reconstruction | Talking Head Reconstruction (test) | PSNR | 32.52 | 9 |
| Lip synchronization | Cross-subject Lip Synchronization (Audio B) | LSE-D | 10.734 | 8 |
| Talking Face Generation | User Study (test) | Lip-sync Accuracy | 5.57 | 8 |
Showing 10 of 25 rows
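
Several of the rows above report PSNR. As a reminder of what that number measures, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), applied to flat lists of pixel values. The function name `psnr` and the 8-bit default `max_value=255.0` are assumptions for illustration. The benchmarks listed here may compute PSNR over specific image regions or with different preprocessing, so this sketch is not expected to reproduce the reported scores.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel example: every value is off by 2, so the MSE is 4.
reference = [0, 128, 255, 64]
rendered = [2, 126, 253, 66]
score = psnr(reference, rendered)
print(round(score, 2))
```

Higher is better for PSNR, while for distance-style metrics in the table, such as Sync-D and LSE-D, lower values indicate better lip synchronization.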
