| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Cross-Audio Talking Head Generation | HDTF, CelebV-HQ, and CelebV-Text (100 cross-audio pairs) | FID 5.89 | 8 |
| Talking Head Reconstruction | HDTF, CelebV-HQ, and CelebV-Text (100 randomly sampled reconstruction videos) | FID 4.43 | 8 |
| Lip-Audio Synchronization | HDTF, CelebV-HQ, and CelebV-Text | FPS 109.41 | 8 |