| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Speech Recognition | AISHELL-1 (dev) | WER 1.2 | 28 |
| Automatic Speech Recognition | AISHELL-1 | WER 0.6 | 22 |
| Speaker Diarization | AISHELL-4 | DER (%) 1.6 | 20 |
| Automatic Speech Recognition | AISHELL (test) | CER 1.95 | 20 |
| Speaker-Attributed Automatic Speech Recognition | AISHELL-4 (test) | CER 0.1543 | 18 |
| Speech Synthesis | AISHELL3 Mandarin | UTMOS 2.7 | 14 |
| Automatic Speech Recognition | SLR93 (AISHELL-3) (test) | CER 4.55 | 10 |
| Automatic Speech Recognition | AISHELL-2 (ios) | CER 2.33 | 10 |
| Automatic Speech Recognition | AISHELL Mandarin 3 | CER 1.86 | 9 |
| Automatic Speech Recognition | AISHELL D 2021 (test) | CER 1.66 | 7 |
| Automatic Speech Recognition | AISHELL C 2021 (Eval) | CER 1.71 | 7 |
| Automatic Speech Recognition | AISHELL Eval A 2021 | CER 3.45 | 7 |
| Automatic Speech Recognition | AISHELL-2 ios (dev) | CER 2.08 | 7 |
| Automatic Speech Recognition | AISHELL-2 | Word Error Rate (WER) 2.16 | 7 |
| Automatic Speech Recognition | AISHELL-1 1.0 (test) | CER (Offline, Rescoring) 5.25 | 7 |
| ASR Error Correction | AISHELL-1 (dev) | WER 3.8 | 6 |
| Target Speaker Extraction | AISHELL Noisy zero-shot | SI-SDR 10.2 | 5 |
| Target Speaker Extraction | AISHELL zero-shot Clean | SI-SDR 13.4 | 5 |
| Text-to-Speech | AISHELL-1 | Error Rate 1.9 | 4 |
| Automatic Speech Recognition | AISHELL-1 | Error Rate 2.5 | 4 |
| Contextual Automatic Speech Recognition | AISHELL-1-NE (test) | CER 0.92 | 4 |
| Speech Watermarking | AiShell3 (OOD) | GN+Ec 99.33 | 4 |
| Speech Watermarking | AiShell3 (out-of-distribution) | Robustness (Gaussian Noise 5 dB) 98.68 | 4 |
| Automatic Speech Recognition | AISHELL | CER 0.54 | 4 |
| Automatic Speech Recognition | AISHELL-3 | Error Rate 9.2 | 3 |