MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation

About

The generation of talking avatars has achieved significant advances in precise audio synchronization. However, crafting lifelike talking-head videos requires capturing a broad spectrum of emotions and subtle facial expressions. Current methods face two fundamental challenges: a) the absence of frameworks that model individual basic emotional expressions, which restricts the generation of complex emotions such as compound emotions; and b) the lack of comprehensive datasets rich in human emotional expressions, which limits the potential of models. To address these challenges, we propose the following innovations: 1) the Mixture of Emotion Experts (MoEE) model, which decouples six fundamental emotions to enable the precise synthesis of both singular and compound emotional states; and 2) the DH-FaceEmoVid-150 dataset, specifically curated to include six prevalent human emotional expressions as well as four types of compound emotions, thereby expanding the training potential of emotion-driven models. Furthermore, to enhance the flexibility of emotion control, we propose an emotion-to-latents module that leverages multimodal inputs, aligning diverse control signals (such as audio, text, and labels) to support more varied control inputs, including the ability to control emotions using audio alone. Through extensive quantitative and qualitative evaluations, we demonstrate that the MoEE framework, in conjunction with the DH-FaceEmoVid-150 dataset, excels in generating complex emotional expressions and nuanced facial details, setting a new benchmark in the field. The DH-FaceEmoVid-150 dataset will be publicly released.
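The abstract does not describe how the emotion experts are combined. As a rough illustration of the general mixture-of-experts idea it names, the sketch below assumes one expert per basic emotion and a sparse (top-k) softmax gate: routing to one expert gives a single emotion, and splitting weight across two experts gives a blended, compound-like state. The emotion names, the label-to-gate mapping `label_logits`, and the toy scaling experts are hypothetical placeholders, not the authors' implementation; audio or text control (the emotion-to-latents module) is not modeled here, only labels.

```python
import math
from typing import Callable, List, Sequence

# Assumption: the abstract mentions "six fundamental emotions" without naming them
# here, so Ekman's six basic emotions are used as hypothetical placeholder labels.
EMOTIONS = ["happy", "sad", "angry", "fearful", "surprised", "disgusted"]

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_softmax(logits: Sequence[float], k: int) -> List[float]:
    """Sparse gating: softmax over the k largest logits, zero weight elsewhere."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def label_logits(active: Sequence[str], strength: float = 1.0) -> List[float]:
    """Hypothetical label-to-gate mapping: active emotions get `strength`, others 0."""
    unknown = set(active) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    return [strength if e in active else 0.0 for e in EMOTIONS]


def mix_experts(features: Vector, experts: Sequence[Expert], weights: Sequence[float]) -> Vector:
    """Weighted sum of expert outputs; assumes every expert preserves the feature size."""
    out = [0.0] * len(features)
    for w, expert in zip(weights, experts):
        if w == 0.0:
            continue  # unselected experts are never evaluated, as in sparse MoE routing
        out = [o + w * y for o, y in zip(out, expert(features))]
    return out


# Toy experts (hypothetical): expert i just scales its input by (i + 1).
# In a real model each expert would be a learned network conditioning the generator.
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(len(EMOTIONS))]

features = [1.0, 2.0]
# Single emotion: route everything to the "happy" expert.
single = mix_experts(features, experts, top_k_softmax(label_logits(["happy"]), k=1))
# Compound emotion: split weight evenly between "happy" and "surprised".
compound = mix_experts(
    features, experts, top_k_softmax(label_logits(["happy", "surprised"]), k=2)
)
print(single)    # [1.0, 2.0]
print(compound)  # [3.0, 6.0]
```

Top-k gating is used here only because it makes the single-versus-compound distinction explicit (k=1 versus k=2); a dense softmax over all six experts would also work, at the cost of evaluating every expert for every input.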

Huaize Liu, Wenzhang Sun, Donglin Di, Shibo Sun, Jiahui Yang, Changqing Zou, Hujun Bao · 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Portrait Image Animation | DH-FaceEmoVid-150 (test) | FID | 39.619 | 12
Portrait Image Animation | MEAD (test) | FID | 39.42 | 12
Avatar Generation | DH-FaceEmoVid-150 | Emotion Score | 4.73 | 6
Emotional Talking Head Generation | MEAD | Emotion Score | 4.65 | 6
Portrait Image Animation | HDTF (test) | FID | 28.834 | 6