| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Video Speech Recognition | Multilingual TEDx-French (MTfr) (test) | Mean WER 67 | 4 |
| Visual Speech Recognition | Multilingual TEDx-Spanish (MTes) (test) | Mean WER 56.6 | 4 |
| Visual Speech Recognition | Multilingual TEDx-Portuguese (MTpt) (test) | Mean Accuracy 70.2 | 4 |
| Visual Speech Recognition | Multilingual TEDx-Italian (MTit) (test) | Mean WER 57.9 | 4 |