| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Speech Reconstruction | LibriTTS clean (test) | PESQ 4.644 | 63 |
| Speech Reconstruction | LibriTTS (test-other) | UTMOS 3.91 | 44 |
| Text-to-Speech | LibriTTS clean (test) | WER 0.018 | 30 |
| Speech Synthesis | LibriTTS (ID) | PESQ 4.5 | 20 |
| Neural Vocoding | LibriTTS (test) | PESQ 4.269 | 18 |
| Audio Generation | LibriTTS (dev) | M-STFT 1.3647 | 18 |
| Speech Synthesis | LibriTTS (test) | MOS 4.9134 | 17 |
| Text-to-Speech | LibriTTS (test) | MOS 4.54 | 16 |
| Text-to-Speech | LibriTTS zero-shot | UTMOS 4.3026 | 14 |
| Waveform Generation | LibriTTS 24,000 Hz (test) | UTMOS 3.7229 | 13 |
| Zero-shot Text-to-Speech | LibriTTS (test) | SECS 0.765 | 12 |
| Waveform Generation | LibriTTS (dev) | M-STFT 1.2129 | 12 |
| Neural Vocoding | LibriTTS | UTMOS 4.058 | 12 |
| Speech Synthesis | LibriTTS (dev) | M-STFT 1.086 | 11 |
| Voice Conversion | LibriTTS (test-clean) | WER 2.04 | 11 |
| Speech Synthesis | LibriTTS 24,000 Hz (test) | MOS 4.28 | 11 |
| Waveform Generation | LibriTTS-R clean (test) | Speech BERT Score 100 | 10 |
| Audio Reconstruction | LibriTTS clean (test) | Mel Distance 0.3442 | 10 |
| Vocoding | LibriTTS (dev-other) | MAE 0.0986 | 10 |
| Neural Vocoding | LibriTTS clean (dev) | MAE 0.0931 | 10 |
| Audio Watermarking | LibriTTS | PESQ 4.3289 | 8 |
| Generative Speech Watermarking | LibriTTS OOD (test) | STOI 0.9789 | 8 |
| Speaker Erasure | LibriTTS 1-speaker setting (forget test) | WER 2.57 | 7 |
| Speaker Erasure | LibriTTS 1-speaker setting (retain test) | WER 2.47 | 7 |
| Accented Speech Synthesis | LibriTTS-R (train-clean-100) | US Probability 73.8 | 7 |