| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Text-to-Speech | Seed-TTS-Eval (test) | WER | 1.33 | 32 |
| Text-to-Speech | Seed-TTS-Eval zh (test) | CER | 0.93 | 21 |
| Text-to-Speech | Seed-TTS-Eval Chinese | WER | 0.87 | 10 |
| Zero-shot Speech Generation | Seed-TTS Eval zh (test) | CER | 0.83 | 9 |
| Speech Synthesis | Seed-TTS-Eval en (test) | WER | 1.85 | 8 |
| Voice-cloning intelligibility | Seed-TTS-Eval zh (test) | WER | 0.54 | 8 |
| Speech Synthesis | Seed-TTS-Eval zh-hard (test) | CER | 6.71 | 7 |
| Audio Tokenization | Seed-TTS-Eval EN | PESQ (NB) | 3.02 | 7 |
| Audio Tokenization | Seed-TTS-Eval ZH | PESQ (NB) | 3.3 | 7 |
| Text-to-Speech | Seed-TTS-Eval hard (test) | WER | 6.83 | 6 |
| Text-to-Speech | Seed-TTS-Eval EN | UTMOS | 3.71 | 3 |
| Zero-shot Voice Conversion | Seed-TTS-Eval en (test) | SMOS | 3.98 | 2 |
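Most of the intelligibility scores in the table are WER (word error rate) or CER (character error rate). Both are edit-distance rates. The minimal sketch below shows how such a score can be computed. It assumes the table reports percentages and that a reference transcript and a hypothesis transcript are already available. In Seed-TTS-Eval, the hypothesis comes from an ASR transcription of the synthesized audio. The helper names `edit_distance`, `wer`, and `cer` are hypothetical and are not part of any official evaluation script. The sketch also omits the text normalization that real scoring pipelines typically apply.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent: word-level edits / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def cer(reference, hypothesis):
    """Character error rate in percent, ignoring spaces (suited to Chinese text)."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words -> about 16.67 % WER.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    # One substituted character out of four -> 25.0 % CER.
    print(cer("你好世界", "你好视界"))
```

CER is usually preferred for Chinese because the text has no whitespace word boundaries. This is consistent with most of the zh rows above reporting CER.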