| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Visual Speech Recognition | LRS3 (test) | WER 0.77 | 209 | |
| Visual Speech Recognition | LRS3 High-Resource, 433h labelled v1 (test) | WER 0.009 | 80 | |
| Audio-Visual Speech Recognition | LRS3 (test) | WER 0.68 | 77 | |
| Audio-Visual Speech Recognition | LRS3 clean (test) | WER 0.72 | 77 | |
| Visual Speech Recognition | LRS3 | WER 0.009 | 63 | |
| Automatic Speech Recognition | LRS3 (test) | WER (%) 0.79 | 58 | |
| Visual Speech Recognition | LRS3 Low-Resource 30h labelled v1 (test) | WER 0.024 | 34 | |
| Audio-Visual Speech Separation | LRS3 (test) | SDRi 18.9 | 29 | |
| Visual Speech Recognition | LRS3 30h labeled low-resource (test) | WER 25.3 | 28 | |
| Automatic Speech Recognition | LRS3 30h labeled low-resource (test) | WER 1.5 | 26 | |
| English transcription | LRS3 Noisy 0-SNR (test) | WER 0.046 | 25 | |
| Speech Recognition | LRS3-TED | WER 7.2 | 25 | |
| Audio-Visual Speech Recognition | LRS3 30h labeled low-resource (test) | WER 1.8 | 22 | |
| Automatic Speech Recognition | LRS3 Clean original (test) | WER 0.68 | 21 | |
| Visual Speech Recognition | LRS3 low-resource (test) | WER 19.3 | 20 | |
| Automatic Speech Recognition | LRS3 433-hour labeled (test) | WER (%) 1.3 | 19 | |
| Lip Reading | LRS3 1.0 (test) | WER 25.51 | 19 | |
| Speech Recognition | LRS3 high-resource | WER (V) 17.6 | 18 | |
| Speech Recognition | LRS3 low-resource | WER (V) 23.7 | 18 | |
| Audio Speech Recognition | LRS3 | WER 0.7 | 18 | |
| Automatic Speech Recognition | LRS3 low-resource (test) | WER 0.016 | 18 | |
| Automatic Lip-Reading | LRS3 v1 (dev) | WER 16.92 | 18 | |
| Speech Enhancement | LRS3 mixed with QUT city-street noises (test) | PESQ 3.21 | 18 | |
| Speech Enhancement | LRS3 mixed with VGGSound noises (test) | PESQ 3.25 | 18 | |
| Automatic Speech Recognition | LRS3 High-Resource 433h labelled v1 (test) | WER 0.012 | 16 | |