| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-Speech | Seed-TTS-Eval zh (test) | CER 0.93 | 16 |
| Text-to-Speech | Seed-TTS-Eval Chinese | WER 0.87 | 10 |
| Speech Synthesis | Seed-TTS-Eval en (test) | WER 1.85 | 8 |
| Voice-cloning intelligibility | Seed-TTS-Eval zh (test) | WER 0.54 | 8 |
| Speech Synthesis | Seed-TTS-Eval zh-hard (test) | CER 6.71 | 7 |
| Audio Tokenization | Seed-TTS-Eval EN | PESQ (NB) 3.02 | 7 |
| Audio Tokenization | Seed-TTS-Eval ZH | PESQ (NB) 3.3 | 7 |
| Text-to-Speech | Seed-TTS-Eval hard (test) | WER 6.83 | 6 |
| Text-to-Speech | Seed-TTS-Eval EN | UTMOS 3.71 | 3 |
| Zero-shot Voice Conversion | Seed-TTS-Eval en (test) | SMOS 3.98 | 2 |