| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| LibriTTS (test) | BigVGAN | PESQ 4.269 | 18 | 1mo ago |
| LJSpeech 1.1 (test) | BigVGAN | M-STFT 0.9 | 12 | 1mo ago |
| LJSpeech 88 (test) | BigVGAN | M-STFT 0.9 | 12 | 1mo ago |
| LibriTTS | | UTMOS 4.058 | 12 | 1mo ago |
| VCTK 100 audio clips (unseen) | BigVGAN | MAE 0.0925 | 10 | 1mo ago |
| LibriTTS clean (dev) | BigVGAN | MAE 0.0931 | 10 | 1mo ago |
| VCTK English Corpus with Unseen Speakers (out-of-domain) | | UTMOS 4.117 | 9 | 1mo ago |
| EARS (out-of-domain) | | UTMOS 3.3 | 9 | 1mo ago |
| LJSpeech | DiffWave | MOS 4.49 | 9 | 1mo ago |
| LJSpeech (Long Audio) | | MOS 4.73 | 8 | 1mo ago |
| LJSpeech Short Audio | | MOS 3.67 | 8 | 1mo ago |
| VCTK (unseen speakers) | | MOS 4.37 | 8 | 1mo ago |
| LJSpeech and VCTK | | MOS 4.6 | 6 | 1mo ago |
| Inference Speed Benchmark batch size 16, 1s samples | BigVGAN | xRT (GPU) 98.61 | 5 | 1mo ago |
| MUSDB18 (out-of-distribution) | Vocos | Mixture Score 4.61 | 4 | 1mo ago |
| Deeply Korean | | SMOS 4.847 | 3 | 1mo ago |
| LJSpeech (test) | RNDVoC-Lite | PESQ 3.769 | 3 | 1mo ago |