| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Phoneme Recognition | TIMIT (test) | PER 8.3 | 31 |
| Speech Enhancement | TIMIT Baby-cry noise | PESQ 2.277 | 24 |
| Speech Enhancement | TIMIT Cafeteria noise | PESQ 2.458 | 24 |
| Speech Enhancement | TIMIT Crowd-party noise | PESQ 2.447 | 24 |
| Speech Enhancement | TIMIT Helicopter noise | PESQ 2.677 | 24 |
| Phone recognition | TIMIT (test) | Frame Error Rate 17.3 | 23 |
| Phoneme Recognition | TIMIT (dev) | PER 7.4 | 20 |
| Phoneme Recognition | TIMIT core (test) | PER 10.3 | 20 |
| Audio Classification | TIMIT 3 (test) | Average Top-1 Acc 95.22 | 18 |
| Log-magnitude STFT prediction | TIMIT 8kHz (val) | MSE 14.41 | 15 |
| Spoken Term Detection | TIMIT OOV | MTWV @ -5dB SNR 0.07 | 14 |
| Spoken Term Detection | TIMIT (IV) | MTWV (-5dB) 0.03 | 14 |
| Speech prediction | TIMIT (test) | MSE 2.76 | 13 |
| Speech prediction | TIMIT (val) | MSE 2.86 | 13 |
| Phone recognition | TIMIT (dev) | Frame Error Rate 28.5 | 12 |
| Log-magnitude STFT prediction | TIMIT 8kHz (evaluation) | MSE 14.45 | 11 |
| Automatic Speech Recognition | TIMIT (test) | Accuracy 85.3 | 10 |
| Speech Recognition | TIMIT (test) | PER 0.209 | 7 |
| Voice conversion | TIMIT OOD | F0 Correlation 0.484 | 6 |
| Phoneme Recognition | TIMIT core (dev) | PER 9.1 | 6 |
| Online Speech Recognition | TIMIT (test) | PER 0.196 | 6 |
| Phone boundary detection | TIMIT non-speech removed (test) | Precision 85.31 | 4 |
| Speech Recognition | TIMIT | Accuracy 68.9 | 4 |
| Speaker Identification | TIMIT 462 speakers (test) | CER 85 | 4 |
| Articulatory Feature Detection | TIMIT (test) | Anterior Feature Accuracy 0.94 | 4 |