| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Text-to-Speech | Seed-TTS en (test) | WER 0.8 | 121 | |
| Text-to-Speech | Seed-TTS zh (test) | WER 0.0084 | 87 | |
| Text-to-Speech | Seed-TTS (eval) | WER 1.85 | 39 | |
| Text-to-Speech | Seed-TTS Seed-EN (test) | WER 0.0147 | 32 | |
| Text-to-Speech | Seed-TTS EN | WER 1.39 | 32 | |
| Zero-shot Text-to-Speech | Seed-TTS en (test) | WER 1.08 | 25 | |
| Speech Reconstruction | Seed-TTS English | PESQ 4.125 | 17 | |
| Voice Cloning | SEED-TTS EN (test) | WER 0.99 | 16 | |
| Text-to-Speech | Seed-TTS Seed-ZH (Evaluation) | CER 0.89 | 16 | |
| Text-to-Speech | Seed-TTS English (test) | WER 1.47 | 14 | |
| Text-to-Speech | Seed-TTS Hard | CER 6.83 | 12 | |
| Text-to-Speech | Seed-TTS ZH | WER 1.07 | 12 | |
| Text-to-Speech | Seed-TTS Seed-ZH (test) | WER 1.02 | 11 | |
| Text-to-Speech | Seed-TTS 24 kHz (test-zh) | SIM-o 0.762 | 11 | |
| Text-to-Speech | Seed-TTS en 24 kHz (test) | SIM-o 0.734 | 11 | |
| Text-to-Speech | Seed-TTS-Eval English | WER 1.39 | 10 | |
| Zero-shot Speech Generation | Seed-TTS Eval en (test) | WER (%) 1.5 | 9 | |
| Voice Conversion | Seed-TTS zh (test) | WER 1.33 | 9 | |
| Zero-shot Text-to-Speech | Seed-TTS zh (test) | WER 0.84 | 8 | |
| Speaker Disentanglement | seed-tts-eval | WER 2.03 | 8 | |
| Voice Cloning | SEED-TTS-Eval ZH (test) | CER 1.03 | 8 | |
| Text-to-Speech | Seed-TTS Chinese (test) | ZS CER 0.89 | 7 | |
| Text-to-Speech | Seed-TTS hard (test) | WER 7.53 | 7 | |
| Voice Conversion | Seed-TTS en (test) | WER 1.96 | 7 | |
| Text-to-Speech | SEED-TTS | WER 1.2 | 7 | |