| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Speech-to-Portrait | AVSpeech (test) | L1 Error: 31.26 | 6 |
| Visual Acoustic Matching | AVSpeech-Rooms unseen environments (test) | RTE (s): 0.071 | 5 |
| Audio-Visual Speech Recognition | AVSpeech (1,000 manually filtered samples) | WER: 25 | 4 |