| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Watermarking | LJSpeech | PESQ 4.5486 | 88 |
| Audio Reconstruction | LJSpeech | UTMOS 4.3794 | 26 |
| Text-to-Speech | LJSpeech (test) | CMOS 0.934 | 20 |
| Speech Watermarking | LJSpeech 2017 | STOI 0.9996 | 17 |
| Speech Watermarking | LJSpeech (In-Distribution) | MP3 (16 kbps) Acc 0.9984 | 13 |
| Speech Watermarking | LJSpeech (in-distribution) | Gaussian Noise (5 dB) Score 0.9986 | 13 |
| Neural Vocoding | LJSpeech 1.1 (test) | M-STFT 0.9 | 12 |
| Neural Vocoding | LJSpeech 88 (test) | M-STFT 0.9 | 12 |
| Waveform Generation | LJSpeech | UTMOS 4.3894 | 12 |
| Speech Synthesis | LJSpeech | MOS 4.45 | 12 |
| Lossless Data Compression | LJSpeech | Compression Ratio 1.88 | 11 |
| Audio Synthesis | LJSpeech (unseen) | MAE 0.1102 | 10 |
| Audio Generation | LJSpeech Short-Term (test) | FAD 0.911 | 9 |
| Neural Vocoding | LJSpeech | MOS 4.49 | 9 |
| Neural Vocoding | LJSpeech (Long Audio) | MOS 4.73 | 8 |
| Neural Vocoding | LJSpeech Short Audio | MOS 3.67 | 8 |
| Waveform Generation | LJSpeech (test) | M-STFT 0.9369 | 8 |
| Generative Speech Watermarking | LJSpeech (test) | Inference Time (ms) 13.48 | 7 |
| Voice Conversion | LJSpeech target speaker | WER 3.22 | 7 |
| Text-to-Speech | LJSpeech | WER 3.37 | 6 |
| Speech reconstruction | LJSpeech ID | MCD 4.42 | 6 |
| Audio Synthesis | LJSpeech (test) | GPU Execution Time 4.82 | 6 |
| Speech Synthesis | LJSpeech (test) | RTF 0.011 | 6 |
| Lossless Audio Compression | LJSpeech 16-bit | Compression Rate 2.08 | 5 |
| Speech Synthesis | LJSpeech | PESQ 4.235 | 5 |