Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer
About
This paper reports our solution for the ACM Multimedia ViCo 2022 Conversational Head Generation Challenge, which aims to generate vivid face-to-face conversation videos from audio and reference images. Our solution focuses on training a generalized audio-to-head driver with regularization and assembling a renderer with high visual quality. We carefully tweak the audio-to-behavior model and post-process the generated video with our foreground-background fusion module. We placed first in the listening head generation track and second in the talking head generation track on the official leaderboard. Our code is available at https://github.com/megvii-research/MM2022-ViCoPerceptualHeadGeneration.
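The abstract does not describe how the foreground-background fusion module combines the rendered head with the background. One generic way to fuse two image layers is mask-based alpha compositing. The sketch below assumes a hypothetical per-pixel soft mask `alpha` (for example, from a head segmenter) and a hypothetical function name `fuse_foreground_background`. It illustrates the general technique only and is not the authors' module.

```python
import numpy as np


def fuse_foreground_background(foreground, background, alpha):
    """Alpha-composite a rendered foreground onto a background frame.

    foreground, background: float arrays of shape (H, W, 3) with values in [0, 1].
    alpha: float array of shape (H, W) with values in [0, 1]; 1 keeps the
           foreground pixel, 0 keeps the background pixel (hypothetical mask).
    """
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")
    # Clamp the mask and add a channel axis so it broadcasts over RGB.
    a = np.clip(alpha, 0.0, 1.0)[..., None]
    # Per-pixel linear blend of the two layers.
    return a * foreground + (1.0 - a) * background
```

A soft mask (values between 0 and 1 near the head boundary) blends the two layers gradually, which generally hides seams better than a hard binary mask.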
Ailin Huang, Zhewei Huang, Shuchang Zhou• 2022
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Listener Head Generation | ViCo (test) | SSIM | 0.58 | 5 |
| Listener Head Generation | ViCo out-of-domain (D_ood) | FD (exp) | 18.63 | 5 |
| Listener Head Generation | ViCo (D_test) | FD (Expression) | 19.02 | 5 |
| Image Quality Assessment | RealTalk (test) | SSIM | 0.56 | 4 |
| Listener Head Generation | ViCo and RealTalk (test) | Similarity to GT | 2.08 | 4 |
| Listening Head Generation | RealTalk (Dtest) | FD-exp | 23.07 | 4 |
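Several rows report FD (Fréchet distance) on expression features. The page does not define the metric. Assuming it follows the usual Fréchet distance between Gaussians fitted to two sets of feature vectors (as in FID), applied to per-frame expression coefficients, a NumPy-only sketch follows. The inputs `x` and `y` stand for hypothetical generated and ground-truth feature matrices. This is not the challenge's official scoring code.

```python
import numpy as np


def _sqrtm_psd(m):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # clamp tiny negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to x (N, D) and y (M, D).

    FD = ||mu_x - mu_y||^2 + Tr(C_x) + Tr(C_y) - 2 Tr((C_x C_y)^(1/2))
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    # atleast_2d keeps the D == 1 case as a 1x1 matrix (np.cov squeezes it).
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr((C_x C_y)^(1/2)) equals Tr((C_x^(1/2) C_y C_x^(1/2))^(1/2)), and the
    # inner matrix is symmetric PSD, so eigh can be used instead of a general sqrtm.
    sqrt_cx = _sqrtm_psd(cov_x)
    cross = _sqrtm_psd(sqrt_cx @ cov_y @ sqrt_cx)
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * np.trace(cross))
```

Lower is better. Identical feature distributions give 0, and shifting every vector by a constant offset `c` adds exactly `||c||^2`, because the covariance terms cancel.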