EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model

About

Although significant progress has been made to audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects. In this paper, we propose the Emotion-Aware Motion Model (EAMM) to generate one-shot emotional talking faces by involving an emotion source video. Specifically, we first propose an Audio2Facial-Dynamics module, which renders talking faces from audio-driven unsupervised zero- and first-order key-points motion. Then through exploring the motion model's properties, we further propose an Implicit Emotion Displacement Learner to represent emotion-related facial dynamics as linearly additive displacements to the previously acquired motion representations. Comprehensive experiments demonstrate that by incorporating the results from both modules, our method can generate satisfactory talking face results on arbitrary subjects with realistic emotion patterns.

Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, Xun Cao• 2022

Related benchmarks

Task	Dataset	Result
Audio-driven facial animation	MEAD 41 (test)	PSNR26.807	26
Audio-driven facial animation	RAVDESS 42 (test)	PSNR26.071	24
Talking Face Generation	LRS2 (test)	SSIM0.4623	18
Talking head synthesis	User Study	Lip Sync Quality1.92	18
Talking Face Generation	HDTF (test)	SSIM0.396	16
Talking Head Reenactment	General Inference (test)	FPS8.351	13
Talking Head Reenactment	General Inference	Inference Speed (FPS)8.351	13
Talking Face Emotion Editing	User Study Extended Emotion	Emotional Accuracy6.7	12
Talking Face Emotion Editing	User Study Basic Emotion	Emotional Expression10.4	12
Talking Face Generation	CREMA-D (test)	SSIM0.414	8

Showing 10 of 23 rows

Other info

Follow for update

@wizwand_team Discord