One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

About

We propose a neural talking-head video synthesis model and demonstrate its application to video conferencing. Our model learns to synthesize a talking-head video using a source image containing the target person's appearance and a driving video that dictates the motion in the output. The motion is encoded with a novel keypoint representation, in which identity-specific and motion-related information is decomposed in an unsupervised manner. Extensive experimental validation shows that our model outperforms competing methods on benchmark datasets. Moreover, our compact keypoint representation enables a video conferencing system that matches the visual quality of the commercial H.264 standard while using only one-tenth of the bandwidth. In addition, the keypoint representation lets the user rotate the head during synthesis, which is useful for simulating a face-to-face video conferencing experience.
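The one-tenth-bandwidth claim follows from transmitting the compact keypoints rather than encoded pixels. Below is a minimal sketch of that sender/receiver loop; `extract_keypoints`, `synthesize_frame`, and the keypoint count are hypothetical stand-ins for the paper's networks and hyperparameters, not the authors' implementation.

```python
import numpy as np

NUM_KEYPOINTS = 20  # illustrative; the paper uses a small, fixed set of 3D keypoints

def extract_keypoints(frame: np.ndarray) -> np.ndarray:
    """Hypothetical keypoint detector: driving frame -> (K, 3) 3D keypoints."""
    raise NotImplementedError

def synthesize_frame(source_image: np.ndarray, keypoints: np.ndarray) -> np.ndarray:
    """Hypothetical generator: source appearance + driving keypoints -> output frame."""
    raise NotImplementedError

def sender(driving_frames):
    """Sender side: transmit only compact keypoints, never pixels."""
    for frame in driving_frames:
        kp = extract_keypoints(frame).astype(np.float32)  # (K, 3)
        yield kp.tobytes()                                # K * 3 * 4 bytes per frame

def receiver(source_image, packets):
    """Receiver side: re-synthesize the talking head from one source image."""
    for packet in packets:
        kp = np.frombuffer(packet, dtype=np.float32).reshape(NUM_KEYPOINTS, 3)
        yield synthesize_frame(source_image, kp)

# Back-of-the-envelope payload: 20 keypoints * 3 coords * 4 bytes = 240 B/frame,
# i.e. about 58 kbit/s at 30 fps before any entropy coding -- roughly an order of
# magnitude below typical H.264 conferencing bitrates, consistent with the claim above.
print(NUM_KEYPOINTS * 3 * 4 * 30 * 8 / 1000, "kbit/s at 30 fps")
```

In the paper the per-frame payload also carries head pose and expression deformations, but those add only a handful of floats per frame, so the arithmetic above is qualitatively unchanged.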

Ting-Chun Wang, Arun Mallya, Ming-Yu Liu • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Face Reenactment | VoxCeleb1 (test) | SSIM | 0.761 | 16 |
| Video-driven Talking Head Generation (Self-Reenactment) | HDTF | FID | 22.27 | 12 |
| Talking head video generation | HDTF | FID | 20.57 | 8 |
| Talking head video generation | TalkingHead-1KH | FID | 30.52 | 8 |
| Self-Reenactment | HDTF (test) | LPIPS | 0.2771 | 8 |
| Cross-identity reenactment | CelebV 30 | CSIM | 79.1 | 7 |
| Same-identity reconstruction | VoxCeleb1 (test) | L1 Loss | 0.0445 | 7 |
| Cross-identity reenactment | HDTF | FVD | 134.9 | 6 |
| Talking head synthesis | Self-Collected Dataset (50 identities) | FID | 47.13 | 6 |
| Talking head synthesis | VFHQ (first 100 frames) | FID | 71.58 | 6 |

(Showing 10 of 15 rows.)
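For context on the table: L1, SSIM, and LPIPS are per-frame reconstruction scores averaged over each clip, whereas FID and FVD compare feature distributions of generated and real frames and require a separate pipeline (e.g., the pytorch-fid package). The sketch below shows the per-frame computation under common conventions, using the `lpips` and `scikit-image` packages; exact crop, resolution, and normalization protocols vary across the benchmarks listed.

```python
import numpy as np
import torch
import lpips                                       # pip install lpips
from skimage.metrics import structural_similarity  # pip install scikit-image

lpips_fn = lpips.LPIPS(net="alex")  # perceptual distance with an AlexNet backbone

def frame_metrics(gen: np.ndarray, gt: np.ndarray) -> dict:
    """gen, gt: HxWx3 float arrays in [0, 1]."""
    l1 = float(np.abs(gen - gt).mean())
    ssim = structural_similarity(gen, gt, channel_axis=2, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lp = float(lpips_fn(to_tensor(gen), to_tensor(gt)).item())
    return {"L1": l1, "SSIM": float(ssim), "LPIPS": lp}

def clip_metrics(gen_frames, gt_frames) -> dict:
    """Average the per-frame metrics over a generated/ground-truth clip pair."""
    scores = [frame_metrics(g, t) for g, t in zip(gen_frames, gt_frames)]
    return {k: float(np.mean([s[k] for s in scores])) for k in scores[0]}
```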
