| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Speech Emotion Recognition | ESD In-Domain v1 (test) | ACC 93.86 | 13 | |
| Object Detection | ESD | AP 46.5 | 13 | |
| Text-to-Speech | ESD (test) | MOS 4.47 | 11 | |
| Emotional Text-to-Speech | ESD (English) | WER 1.411 | 9 | |
| Empathetic Response Generation | ESD | Emotional Reaction 1.851 | 8 | |
| Emotion Style Transfer | ESD (test) | UTMOS 3.93 | 7 | |
| Emotional Speech Synthesis | ESD English (test) | Score (Neutral) 78.39 | 5 | |
| Text-to-Speech | ESD English (test) | WER 6.8 | 5 | |
| Speech Emotion Recognition | ESD | UA 98.9 | 5 | |
| Instance Segmentation | ESD-1 (test) | Accuracy (2 Objects) 95 | 5 | |
| Voice Conversion | ESD | WER 0.149 | 4 | |
| Chain Generation | ESD-CoT (test) | B-1 44.87 | 3 | |