| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Super-Resolution | VCTK In-domain | LSD 0.6 | 34 |
| Speech Decompression | VCTK (test) | Log Spectral Distance 1.01 | 28 |
| Phonetic Transcription | VCTK++ (test) | F1 Score 93 | 25 |
| Voice Conversion | VCTK | WER 0 | 21 |
| Text-to-Speech | VCTK | WER 1.7 | 19 |
| Speech Enhancement | VCTK Accelerometer 12-bit, 4-16 kHz upsampling (test) | LSD 0.87 | 18 |
| Speech Enhancement | VCTK Vibration sensor 12-bit, 4-16 kHz upsampling (test) | LSD (Log-Spectral Distance) 0.84 | 18 |
| Speech Super-resolution | VCTK 0.92 (test) | LSD 0.7 | 16 |
| Automatic Speech Recognition | VCTK (test) | WER 3.47 | 15 |
| Audio Super-resolution | VCTK Multi-speaker (test) | SNR 20 | 15 |
| Audio Super-resolution | VCTK Single-speaker (test) | SNR 19.5 | 15 |
| Audio-to-Text Retrieval | VCTK A→T | Recall@1 96.1 | 15 |
| Pitch Shift | VCTK (10% unseen utterances) | MOS 4.05 | 15 |
| Time-scale modification | VCTK (10% unseen utterances) | MOS 3.98 | 15 |
| Speech Super-resolution | VCTK 16 kHz target sampling rate 0.92 (test) | LSD 0.78 | 11 |
| Bandwidth Extension (BWE) | VCTK Google Pixel7 | LSD 0.84 | 10 |
| Bandwidth Extension (BWE) | VCTK Desktop | LSD 0.82 | 10 |
| Bandwidth Extension | VCTK 8 kHz to 44.1 kHz (test) | VISQOL 4.73 | 10 |
| Neural Vocoding | VCTK 100 audio clips (unseen) | MAE 0.0925 | 10 |
| Speaker-ID | VCTK (test) | Accuracy 99.3 | 10 |
| Neural Vocoding | VCTK English Corpus with Unseen Speakers (out-of-domain) | UTMOS 4.117 | 9 |
| Voice Conversion | VCTK (test) | nMOS 4.26 | 9 |
| Speech Synthesis | VCTK (OD) | PESQ 4.5 | 9 |
| Text-to-Speech | VCTK (test) | MOS 4.4 | 8 |
| Neural Vocoding | VCTK (unseen speakers) | MOS 4.37 | 8 |