| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Speech Recognition | LRS3 (test) | WER 0.77 | 159 |
| Visual Speech Recognition | LRS3 High-Resource, 433h labelled v1 (test) | WER 0.009 | 80 |
| Audio-Visual Speech Recognition | LRS3 clean (test) | WER 0.72 | 70 |
| Visual Speech Recognition | LRS3 | WER 0.009 | 59 |
| Automatic Speech Recognition | LRS3 (test) | WER (%) 0.79 | 46 |
| Visual Speech Recognition | LRS3 Low-Resource 30h labelled v1 (test) | WER 0.024 | 34 |
| English transcription | LRS3 Noisy 0-SNR (test) | WER 0.046 | 25 |
| Speech Recognition | LRS3-TED | WER 7.2 | 25 |
| Automatic Speech Recognition | LRS3 Clean original (test) | WER 0.68 | 21 |
| Visual Speech Recognition | LRS3 low-resource (test) | WER 19.3 | 20 |
| Audio-Visual Speech Separation | LRS3 (test) | SDRi 18.5 | 20 |
| Automatic Speech Recognition | LRS3 433-hour labeled (test) | WER (%) 1.3 | 19 |
| Lip Reading | LRS3 1.0 (test) | WER 25.51 | 19 |
| Speech Recognition | LRS3 high-resource | WER (V) 17.6 | 18 |
| Speech Recognition | LRS3 low-resource | WER (V) 23.7 | 18 |
| Audio-Visual Speech Recognition | LRS3 (test) | WER 0.9 | 18 |
| Automatic Speech Recognition | LRS3 low-resource (test) | WER 0.016 | 18 |
| Automatic Lip-Reading | LRS3 v1 (dev) | WER 16.92 | 18 |
| Speech Enhancement | LRS3 mixed with QUT city-street noises (test) | PESQ 3.21 | 18 |
| Speech Enhancement | LRS3 mixed with VGGSound noises (test) | PESQ 3.25 | 18 |
| Automatic Speech Recognition | LRS3 High-Resource 433h labelled v1 (test) | WER 0.012 | 16 |
| Visual Speech Recognition | LRS3 high-resource (test) | WER 23.1 | 16 |
| Automatic Speech Recognition | LRS3 Low-Resource 30h labelled v1 (test) | WER (%) 2.3 | 15 |
| Audio Speech Recognition | LRS3 | WER 0.7 | 14 |
| Speech Separation | LRS3-2Mix (test) | SDRi 17.5 | 11 |