Zero-shot Text-to-Speech on LibriSpeech LS960 (test-clean)
Chart: WER (Whisper-large) by method (data point dated May 28, 2026; Ground truth at 1.6). The chart can also show WER (Conformer-Transducer) and Speech Similarity (SIM).
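WER on this benchmark comes from transcribing each synthesized utterance with an ASR model (Whisper-large or a Conformer-Transducer, as the metric names indicate) and comparing the transcript with the input text. Below is a minimal sketch of that comparison step. It assumes whitespace tokenization and no text normalization, whereas real evaluation pipelines usually lowercase and strip punctuation first. `word_error_rate` is a hypothetical helper name, not part of any library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: plain whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds the previous row of the word-level Levenshtein table:
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` gives 2/6 ≈ 0.333 (one substitution, one deletion). The leaderboard's WER values appear to be reported as percentages, so that result would read as 33.3.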
Evaluation Results
Method | Configuration | Date | WER (Whisper-large) | WER (Conformer-Transducer) | Speech Similarity (SIM)
Ground truth | Text=-, Speech=-, Freq=- | 2026.05 | 1.6 | 2.2 | 0.925
DAC | Text=-, Speech=-, Freq... | 2026.05 | 1.6 | 2.2 | 0.922
HiFi-Gan | Text=-, Speech=-, Freq... | 2026.05 | 1.6 | 2.2 | 0.903
MELD | Text=BPE, Speech=Mel,... | 2026.05 | 1.9 | 2.4 | 0.872
MELD | Text=BPE, Speech=Mel,... | 2026.05 | 1.9 | 2.5 | 0.855
Codec-LM | Text=Phn, Speech=DAC,... | 2026.05 | 4.7 | 5.7 | 0.872
Codec-LM | Text=BPE, Speech=DAC,... | 2026.05 | 4.8 | 5.3 | 0.864
VALL-E | Text=Phn, Speech=Encod... | 2026.05 | 5 | - | 0.868
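In zero-shot TTS evaluations, SIM is typically the cosine similarity between speaker embeddings of the speech prompt and the synthesized speech, averaged over the test set. This page does not name the speaker-embedding model, so the sketch below assumes the embeddings are already available as plain float vectors. `cosine_similarity` and `mean_similarity` are hypothetical helper names.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def mean_similarity(pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average cosine similarity over (prompt_embedding, generated_embedding) pairs."""
    scores = [cosine_similarity(prompt, generated) for prompt, generated in pairs]
    if not scores:
        raise ValueError("need at least one pair")
    return sum(scores) / len(scores)
```

For example, `cosine_similarity([1.0, 0.0], [0.0, 1.0])` returns 0.0, and two identical vectors return 1.0.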