| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Watermarking | LJSpeech | PESQ 4.5486 | 88 |
| Audio Reconstruction | LJSpeech | UTMOS 4.3794 | 26 |
| Text-to-Speech | LJSpeech (test) | CMOS 0.934 | 20 |
| Speech Watermarking | LJSpeech 2017 | STOI 0.9996 | 17 |
| Speech Watermarking | LJSpeech (In-Distribution) | MP3 (16 kbps) Acc 0.9984 | 13 |
| Speech Watermarking | LJSpeech (in-distribution) | Gaussian Noise (5 dB) Score 0.9986 | 13 |
| Waveform Generation | LJSpeech | UTMOS 4.3894 | 12 |
| Speech Synthesis | LJSpeech | MOS 4.45 | 12 |
| Audio Synthesis | LJSpeech (unseen) | MAE 0.1102 | 10 |
| Neural Vocoding | LJSpeech | MOS 4.49 | 9 |
| Waveform Generation | LJSpeech (test) | M-STFT 0.9369 | 8 |
| Generative Speech Watermarking | LJSpeech (test) | Inference Time (ms) 13.48 | 7 |
| Voice Conversion | LJSpeech target speaker | WER 3.22 | 7 |
| Speech reconstruction | LJSpeech ID | MCD 4.42 | 6 |
| Audio Synthesis | LJSpeech (test) | GPU Execution Time 4.82 | 6 |
| Speech Synthesis | LJSpeech (test) | RTF 0.011 | 6 |
| Waveform Synthesis | LJSpeech | Training Time (h) 17.02 | 4 |
| Speech Synthesis | LJSpeech 26 (test) | PESQ 3.807 | 3 |
| Text-to-Speech | LJSpeech | MAE 0.131 | 3 |
| Text-to-Speech | LJSpeech low-resource setting | Intelligibility Rate 97 | 3 |
| Audio Synthesis | LJSpeech 44.1kHz (test) | GPU xRT 152.58 | 2 |
| Speech Synthesis | LJSpeech | CMOS-N Score 1.07 | 2 |
| Mel-spectrogram generation | LJSpeech (test) | Speedup 269.4 | 1 |