A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

About

In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people seen during the training phase. However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio. We identify key reasons pertaining to this and hence resolve them by learning from a powerful lip-sync discriminator. Next, we propose new, rigorous evaluation benchmarks and metrics to accurately measure lip synchronization in unconstrained videos. Extensive quantitative evaluations on our challenging benchmarks show that the lip-sync accuracy of the videos generated by our Wav2Lip model is almost as good as real synced videos. We provide a demo video clearly showing the substantial impact of our Wav2Lip model and evaluation benchmarks on our website: cvit.iiit.ac.in/research/projects/cvit-projects/a-lip-sync-expert-is-all-you-need-for-speech-to-lip-generation-in-the-wild. The code and models are released at this GitHub repository: github.com/Rudrabha/Wav2Lip. You can also try out the interactive demo at this link: bhaasha.iiit.ac.in/lipsync.

K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar • 2020

Related benchmarks

Task                                  Dataset                      Result                   Rank
Talking Face Generation               LRW (test)                   SSIM 0.874               28
Audio-driven facial animation         MEAD 41 (test)               PSNR 27.819              26
Audio-driven facial animation         RAVDESS 42 (test)            PSNR 27.931              24
Talking Face Generation               LRS2 (test)                  SSIM 0.8962              18
Talking head synthesis                User Study                   Lip Sync Quality 3.839   18
Visual Dubbing                        ContextDubBench 1.0 (test)   FID 19.33                18
Audio Driven Talking Head Generation  Mead                         Sync 8.7778              14
Audio Driven Talking Head Generation  CREMA                        Sync 6.7109              14
Talking Face Generation               VoxCeleb2 (test)             SSIM 0.846               14
Talking Head Reenactment              General Inference (test)     FPS 15.243               13
Showing 10 of 44 rows
