Responsive Listening Head Generation: A Benchmark Dataset and Baseline

About

We present a new listening head generation benchmark for synthesizing the responsive feedback of a listener (e.g., nod, smile) during a face-to-face conversation. Although it is an indispensable complement to talking head generation, listening head generation has seldom been studied in the literature. Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents, and social robots. In this work, we propose a novel dataset, "ViCo", which highlights listening head generation during a face-to-face conversation. A total of 92 identities (67 speakers and 76 listeners) are involved in ViCo, featuring 483 clips in a paired "speaking-listening" pattern, where listeners show three listening styles based on their attitudes: positive, neutral, and negative. Unlike traditional speech-to-gesture or talking-head generation, listening head generation takes both the audio and visual signals from the speaker as input and produces non-verbal feedback (e.g., head motions, facial expressions) in real time. Our dataset supports a wide range of applications such as human-to-human interaction, video-to-video translation, and cross-modal understanding and generation. To encourage further research, we also release a listening head generation baseline conditioned on different listening attitudes. Code & ViCo dataset: https://project.mhzhou.com/vico.

Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei • 2021
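
The paired "speaking-listening" pattern described in the abstract can be pictured as a simple record type. The following is only a minimal sketch under stated assumptions: the class names, field names, file paths, and helper functions are hypothetical illustrations, not the actual ViCo file layout or the authors' code.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Attitude(Enum):
    """The three listening attitudes named in the abstract."""
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class PairedClip:
    """One speaking-listening pair. All field names are hypothetical."""
    speaker_id: str
    listener_id: str
    attitude: Attitude
    speaker_video: str    # assumed path to the speaker's video track
    speaker_audio: str    # assumed path to the speaker's audio track
    listener_video: str   # assumed path to the listener's response video


def count_by_attitude(clips):
    """Count clips per listening attitude."""
    return Counter(clip.attitude for clip in clips)


def unique_identities(clips):
    """Collect every identity that appears as a speaker or a listener."""
    return {c.speaker_id for c in clips} | {c.listener_id for c in clips}


# Toy records for illustration only; these are not real ViCo entries.
clips = [
    PairedClip("id_a", "id_b", Attitude.POSITIVE, "a1.mp4", "a1.wav", "b1.mp4"),
    PairedClip("id_b", "id_c", Attitude.NEGATIVE, "b2.mp4", "b2.wav", "c2.mp4"),
    PairedClip("id_a", "id_c", Attitude.POSITIVE, "a3.mp4", "a3.wav", "c3.mp4"),
]
print(count_by_attitude(clips))
print(sorted(unique_identities(clips)))
```

In this toy data, `id_b` appears both as a speaker and as a listener. The same kind of overlap would explain how the reported 67 speakers and 76 listeners can add up to more than the 92 identities.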

Related benchmarks

Task | Dataset | Result | Rank
Listener Facial Motion Generation | ViCo (test) | FD Expression 39.02 | 7
Listening Head Generation | ViCo (test) | FD (Exp) 39.02 | 6
Listener Head Generation | ViCo (D_test) | FD (Expression) 15.03 | 5
Listener Head Generation | ViCo (test) | SSIM 0.56 | 5
Listener Head Generation | ViCo out-of-domain (D_ood) | FD (exp) 22.81 | 5
Listening Head Generation | RealTalk (Dtest) | FD-exp 20.11 | 4
Image Quality Assessment | RealTalk (test) | SSIM 0.51 | 4
Listener Head Generation | ViCo and RealTalk (test) | Similarity to GT 1.28 | 4