| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Longitudinal Classification | LRS (80/20) | Macro-F1: 57.8 | 24 |
| Audio-Visual Speech Separation | LRS2 | Parameters (M): 0.85 | 18 |
| Lip-to-speech synthesis | LRS3-TED (test) | UTMOS: 4.2096 | 7 |
| Lip-syncing | LRS 3 (test) | LSE-D: 6.652 | 5 |
| Video-to-Speech | LRS3 and LRS2 | Win Rate A: 57 | 4 |