| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Automatic Speech Recognition | Eval2000 Fisher-Switchboard 2300-h (test) | WER 10.9 | 9 |
| Spoken Dialogue System (SDS) Semantic Quality Evaluation | Eval2000 (test) | ROUGE-L 12.1 | 6 |
| Audio Quality Evaluation | Eval2000 | UTMOS 3.34 | 6 |
| Speaking Style Consistency | Eval2000 (test) | Emotion Rank 4.92 | 5 |
| Intelligibility Evaluation | Eval2000 | WER 1 | 4 |
| Automatic Speech Recognition | Eval2000 Switchboard (SW) 300-hour (test) | WER 12.5 | 4 |
| 3D Human Pose Estimation | EVAL cross-view | Head 0.939 | 2 |