DreamTalk: When Emotional Talking Head Generation Meets Diffusion Probabilistic Models

About

Emotional talking head generation has attracted growing attention. Previous methods, which are mainly GAN-based, still struggle to consistently produce satisfactory results across diverse emotions and cannot conveniently specify personalized emotions. In this work, we leverage powerful diffusion models to address the issue and propose DreamTalk, a framework that employs meticulous design to unlock the potential of diffusion models in generating emotional talking heads. Specifically, DreamTalk consists of three crucial components: a denoising network, a style-aware lip expert, and a style predictor. The diffusion-based denoising network can consistently synthesize high-quality audio-driven face motions across diverse emotions. To enhance lip-motion accuracy and emotional fullness, we introduce a style-aware lip expert that can guide lip-sync while preserving emotion intensity. To more conveniently specify personalized emotions, a diffusion-based style predictor is utilized to predict the personalized emotion directly from the audio, eliminating the need for extra emotion reference. By this means, DreamTalk can consistently generate vivid talking faces across diverse emotions and conveniently specify personalized emotions. Extensive experiments validate DreamTalk's effectiveness and superiority. The code is available at https://github.com/ali-vilab/dreamtalk.

Yifeng Ma, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yingya Zhang, Zhidong Deng • 2023

Related benchmarks

Task                            Dataset                    Result                        Rank
Audio-driven facial animation   MEAD 41 (test)             PSNR 27.801                   26
Audio-driven facial animation   RAVDESS 42 (test)          PSNR 26.193                   24
Talking Head Generation         HDTF                       FID 78.147                    23
Talking Head Reenactment        General Inference (test)   FPS 7.832                     13
Talking Head Reenactment        General Inference          Inference Speed (FPS) 7.832   13
Talking Head Generation         CelebV-HQ                  AHD 4.06                      9
Talking Head Generation         Celeb-V                    Sync-C 5.709                  9
Talking Head Generation         Proposed Wild Dataset      Sync-C 4.498                  5
