| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Understanding | MMSU | Perception Score 55.7 | 37 |
| Audio Question-Answering | MMSU | Score 77.7 | 23 |
| Multi-task Language Understanding | MMSU | Accuracy 71.6 | 23 |
| Multimodal Speech Understanding | MMSU All (test) | Accuracy 75.2 | 20 |
| Multimodal Speech Understanding | MMSU Paralinguistic subset (test) | Accuracy 65.18 | 20 |
| Speech Understanding | MMSU | Accuracy 81.3 | 16 |
| General Audio Understanding | MMSU 1.0 (test) | Perception Semantics 72.13 | 16 |
| Audio Understanding | MMSU (test) | Overall Score 66.64 | 15 |
| Multimodal Understanding | MMSU | MMSU Score 79.36 | 14 |
| Paralinguistic Perception | MMSU Paralinguistic | Para. Score 54.51 | 12 |
| Multi-task Knowledge | MMSU | Accuracy 67.1 | 11 |
| Knowledge | MMSU (test) | Performance 77 | 11 |
| Audio Understanding & Reasoning | MMSU | Score 83.7 | 9 |
| Speech Reasoning | MMSU S→T only | Accuracy 43.2 | 9 |
| Audio-conditioned reasoning | MMSU | Acc 57.63 | 8 |
| Audio Reasoning | MMSU | Accuracy (Audio Reasoning) 70.7 | 7 |