| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Super-Resolution | VCTK In-domain | LSD 0.6 | 34 |
| Speech Decompression | VCTK (test) | Log Spectral Distance 1.01 | 28 |
| Phonetic Transcription | VCTK++ (test) | F1 Score 93 | 25 |
| Voice Conversion | VCTK | WER 0 | 21 |
| Text-to-Speech | VCTK | WER 1.7 | 19 |
| Speech Enhancement | VCTK Accelerometer 12-bit, 4-16 kHz upsampling (test) | LSD 0.87 | 18 |
| Speech Enhancement | VCTK Vibration sensor 12-bit, 4-16 kHz upsampling (test) | LSD (Log-Spectral Distance) 0.84 | 18 |
| Speech Super-resolution | VCTK 0.92 (test) | LSD 0.7 | 16 |
| Speech Bandwidth Extension | VCTK English | NISQA-MOS 4.53 | 15 |
| Automatic Speech Recognition | VCTK (test) | WER 3.47 | 15 |
| Audio Super-resolution | VCTK Multi-speaker (test) | SNR 20 | 15 |
| Audio Super-resolution | VCTK Single-speaker (test) | SNR 19.5 | 15 |
| Audio-to-Text Retrieval | VCTK A→T | Recall@1 96.1 | 15 |
| Pitch Shift | VCTK (10% unseen utterances) | MOS 4.05 | 15 |
| Time-scale modification | VCTK (10% unseen utterances) | MOS 3.98 | 15 |
| Speech Quality Evaluation | VCTK 48 kHz (test) | STOI 0.895 | 12 |
| Speech Coding | VCTK 48 kHz (test) | RTF (CPU) 0.142 | 12 |
| Speech Bandwidth Extension | VCTK noisy (test) | NISQA-MOS 3.89 | 12 |
| Audio Super-Resolution | VCTK 24 kHz (test) | LSD 0.74 | 11 |
| Speech Super-resolution | VCTK 16 kHz target sampling rate 0.92 (test) | LSD 0.78 | 11 |
| Bandwidth Extension (BWE) | VCTK Google Pixel7 | LSD 0.84 | 10 |
| Bandwidth Extension (BWE) | VCTK Desktop | LSD 0.82 | 10 |
| Bandwidth Extension | VCTK 8 kHz to 44.1 kHz (test) | VISQOL 4.73 | 10 |
| Neural Vocoding | VCTK 100 audio clips (unseen) | MAE 0.0925 | 10 |
| Speaker-ID | VCTK (test) | Accuracy 99.3 | 10 |