Responsive Listening Head Generation: A Benchmark Dataset and Baseline
About
We present a new listening head generation benchmark for synthesizing the responsive feedback of a listener (e.g., nod, smile) during a face-to-face conversation. Although it is the indispensable complement to talking head generation, listening head generation has seldom been studied in the literature. Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents and social robots. In this work, we propose a novel dataset, "ViCo", which focuses on listening head generation during a face-to-face conversation. A total of 92 identities (67 speakers and 76 listeners) are involved in ViCo, featuring 483 clips in a paired "speaking-listening" pattern, where listeners show three listening styles based on their attitudes: positive, neutral, negative. Unlike traditional speech-to-gesture or talking-head generation, listening head generation takes both the audio and visual signals from the speaker as input and gives non-verbal feedback (e.g., head motions, facial expressions) in real time. Our dataset supports a wide range of applications such as human-to-human interaction, video-to-video translation, cross-modal understanding and generation. To encourage further research, we also release a listening head generation baseline conditioned on different listening attitudes. Code & ViCo dataset: https://project.mhzhou.com/vico.
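The task interface described above can be sketched roughly as follows. This is only an illustration of the input/output contract (speaker audio and visual signals in, listener motion out, conditioned on an attitude, using only past and current speaker frames), not the released baseline. All names here (`Attitude`, `SpeakerFrame`, `generate_listener`, `mean_visual_predictor`) and the feature layout are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence


class Attitude(Enum):
    """The three listening styles named in the abstract."""
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass
class SpeakerFrame:
    """One time step of speaker input (hypothetical feature layout)."""
    audio: Sequence[float]   # assumed per-frame audio features
    visual: Sequence[float]  # assumed per-frame head/expression features


# A predictor maps (speaker frames seen so far, attitude) to one listener frame.
Predictor = Callable[[Sequence[SpeakerFrame], Attitude], List[float]]


def generate_listener(
    frames: Sequence[SpeakerFrame],
    attitude: Attitude,
    predict: Predictor,
) -> List[List[float]]:
    """Produce one listener frame per speaker frame, causally.

    The frame at step t only sees speaker frames 0..t, which is what
    responding in real time requires.
    """
    outputs: List[List[float]] = []
    for t in range(len(frames)):
        outputs.append(predict(frames[: t + 1], attitude))
    return outputs


def mean_visual_predictor(
    history: Sequence[SpeakerFrame], attitude: Attitude
) -> List[float]:
    """Placeholder predictor: running mean of the speaker's visual features.

    It ignores the attitude; a real model would condition on it.
    """
    dim = len(history[0].visual)
    return [sum(f.visual[i] for f in history) / len(history) for i in range(dim)]
```

A learned model would replace `mean_visual_predictor`; the loop in `generate_listener` only makes the causal, frame-by-frame constraint explicit.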
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Listener Facial Motion Generation | ViCo (test) | FD Expression 39.02 | 7 |
| Listening Head Generation | ViCo (test) | FD (Exp) 39.02 | 6 |
| Listener Head Generation | ViCo (D_test) | FD (Expression) 15.03 | 5 |
| Listener Head Generation | ViCo (test) | SSIM 0.56 | 5 |
| Listener Head Generation | ViCo out-of-domain (D_ood) | FD (exp) 22.81 | 5 |
| Listening Head Generation | RealTalk (Dtest) | FD-exp 20.11 | 4 |
| Image Quality Assessment | RealTalk (test) | SSIM 0.51 | 4 |
| Listener Head Generation | ViCo and RealTalk (test) | Similarity to GT 1.28 | 4 |