| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-Audio-Video Generation | Verse-Bench | MS 0.58 | 16 |
| Audio-Visual Generation | Verse-Bench (All subsets) | IS (Score) 4.269 | 7 |
| Audio-Visual Generation | Verse-Bench multi-speaker | cpCER 14.9 | 6 |
| Audio-Visual Generation | Verse-Bench (set3) | DNSMOS 3.797 | 6 |
| Talking Head Generation | Verse-Bench | LSE-C 1.62 | 5 |