| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Speech Recognition | CommonVoice | WER 25.49 | 40 |
| ASR error correction | CommonVoice (CV) (test) | WER 5.8 | 18 |
| Multilingual Automatic Speech Recognition | CommonVoice | WER (de) 4.42 | 13 |
| Automatic Speech Recognition | CommonVoice segmented (test) | WER 11.4 | 10 |
| LID | CommonVoice | Accuracy (eng) 99.4 | 8 |
| Automatic Speech Recognition | CommonVoice (CV) v17 | WER 8.8 | 8 |
| Phoneme recognition | CommonVoice (test) | Phoneme Error Rate (es) 2 | 7 |
| Automatic Speech Recognition | CommonVoice | CER (English) 9.9 | 6 |
| Speech Recognition | German CommonVoice 6.1 (test) | WER 3.64 | 6 |
| Automatic Speech Recognition | CommonVoice CN | WER 7.29 | 5 |
| Automatic Speech Recognition | CommonVoice 13 languages (test) | WER 9.18 | 5 |
| Automatic Speech Recognition | CommonVoice (test) | WER 8.36 | 4 |
| Automatic Speech Recognition | CommonVoice (dev) | WER 7.2 | 3 |
| Text-to-Speech | CommonVoice JA | UTMOS 2.65 | 3 |
| Language Identification | CommonVoice (test) | Accuracy 98.7 | 3 |
| In-context Text-to-Speech | CommonVoice English v13.0 (test) | Sim-o 0.674 | 3 |
| Text-to-Speech | CommonVoice English v13.0 (test) | Metric - | 0 |