
Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

About

Talking face generation, a task of great practical significance, has attracted increasing attention in recent audio-visual studies. How to achieve accurate lip synchronization is a long-standing challenge that remains to be further investigated. Motivated by xxx, in this paper an AttnWav2Lip model is proposed by incorporating a spatial attention module and a channel attention module into the lip-syncing strategy. Rather than focusing on unimportant regions of the face image, the proposed AttnWav2Lip model is able to pay more attention to the reconstruction of the lip region. To our limited knowledge, this is the first attempt to introduce an attention mechanism into the scheme of talking face generation. Extensive experiments have been conducted to evaluate the effectiveness of the proposed model. Compared to the baseline, measured by the LSE-D and LSE-C metrics, superior performance has been demonstrated on the benchmark lip synthesis datasets, including LRW, LRS2 and LRS3.
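The abstract names two generic building blocks, a channel attention module and a spatial attention module, applied to feature maps so that reconstruction weight shifts toward the lip region. As a rough illustration only (the paper's actual AttnWav2Lip layers, pooling choices, and learned weights are not given here), a minimal NumPy sketch of CBAM-style channel-then-spatial gating looks like this:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """feat: (C, H, W). Squeeze spatial dims with average and max
    pooling, then gate each channel. (The shared MLP used in real
    channel-attention modules is omitted for brevity.)"""
    avg = feat.mean(axis=(1, 2))            # (C,)
    mx = feat.max(axis=(1, 2))              # (C,)
    gate = sigmoid(avg + mx)                # per-channel weight in (0, 1)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """feat: (C, H, W). Pool across channels to a 2-D map and gate
    each spatial location -- this is the part that can emphasise a
    region such as the lips over the rest of the face."""
    avg = feat.mean(axis=0)                 # (H, W)
    mx = feat.max(axis=0)                   # (H, W)
    gate = sigmoid(avg + mx)                # per-pixel weight in (0, 1)
    return feat * gate[None, :, :]

# Toy feature map standing in for a face-encoder output.
feat = np.random.rand(8, 16, 16)
out = spatial_attention(channel_attention(feat))
assert out.shape == feat.shape
```

Because both gates lie in (0, 1), the modules only re-weight the feature map; its shape is preserved, which is what lets such modules be dropped into an existing generator like Wav2Lip.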

Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha • 2022

Related benchmarks

| Task                    | Dataset      | Result       | Rank |
|-------------------------|--------------|--------------|------|
| Lip-syncing             | LRS2 (test)  | LSE-D: 7.339 | 12   |
| Talking Face Generation | LRS2         | --           | 8    |
| Talking Head Generation | LRS2         | LSE-C: 6.834 | 6    |
| Talking Head Generation | LRS3         | LSE-C: 7.086 | 6    |
| Talking Head Generation | LRW          | LSE-C: 6.581 | 6    |
