| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Phoneme Recognition | TIMIT (test) | PER 8.3 | 31 | |
| Speech Enhancement | TIMIT Baby-cry noise | PESQ 2.277 | 24 | |
| Speech Enhancement | TIMIT Cafeteria noise | PESQ 2.458 | 24 | |
| Speech Enhancement | TIMIT Crowd-party noise | PESQ 2.447 | 24 | |
| Speech Enhancement | TIMIT Helicopter noise | PESQ 2.677 | 24 | |
| Phone recognition | TIMIT (test) | Frame Error Rate 17.3 | 23 | |
| Phoneme Recognition | TIMIT (dev) | PER 7.4 | 20 | |
| Phoneme Recognition | TIMIT core (test) | PER 10.3 | 20 | |
| Audio Classification | TIMIT 3 (test) | Average Top-1 Acc 95.22 | 18 | |
| Log-magnitude STFT prediction | TIMIT 8kHz (val) | MSE 14.41 | 15 | |
| Spoken Term Detection | TIMIT OOV | MTWV @ -5dB SNR 0.07 | 14 | |
| Spoken Term Detection | TIMIT (IV) | MTWV (-5dB) 0.03 | 14 | |
| Speech prediction | TIMIT (test) | MSE 2.76 | 13 | |
| Speech prediction | TIMIT (val) | MSE 2.86 | 13 | |
| Phone recognition | TIMIT (dev) | Frame Error Rate 28.5 | 12 | |
| Log-magnitude STFT prediction | TIMIT 8kHz (evaluation) | MSE 14.45 | 11 | |
| Bandwidth Extension | TIMIT 8 kHz to 16 kHz (test) | VISQOL 4.51 | 10 | |
| Automatic Speech Recognition | TIMIT (test) | Accuracy 85.3 | 10 | |
| Speech Recognition | TIMIT (test) | PER 0.209 | 7 | |
| Voice conversion | TIMIT OOD | F0 Correlation 0.484 | 6 | |
| Phoneme Recognition | TIMIT core (dev) | PER 9.1 | 6 | |
| Online Speech Recognition | TIMIT (test) | PER 0.196 | 6 | |
| Speaker Verification | TIMIT Official (test) | EER (Speaker) 3.25 | 5 | |
| Distribution Fitting | TIMIT | FAKS0 283.57 | 4 | |
| Phone boundary detection | TIMIT non-speech removed (test) | Precision 85.31 | 4 | |