| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General Audio Understanding | VoiceBench | AlpacaEval Score | 4.78 | 19 |
| Speech-to-Text | VoiceBench | AlpacaEval Score | 4.78 | 15 |
| Speech-to-text reasoning and semantic understanding | VoiceBench (test) | AlpacaEval | 4.8 | 13 |
| Reasoning | VoiceBench | MMSU Accuracy (Audio) | 72.9 | 13 |
| Voice Evaluation | VoiceBench | Overall Score (VoiceBench) | 89.6 | 10 |
| Audio Instruction Following | VoiceBench | AlpacaEval Score | 4.78 | 10 |
| Spoken Question Answering | VoiceBench | Accuracy | 76.79 | 9 |
| General capability evaluation | VoiceBench | HS Score | 76.91 | 8 |
| Empathetic Speech Generation | VoiceBench CommonEval | Empathy Score | 4.22 | 7 |
| Speech-to-Text Spoken Question Answering | VoiceBench S2T (test) | AlpacaEval | 4.8 | 7 |
| Voice Chatting | VoiceBench | AlpacaEval | 4.94 | 7 |
| Conversational Intelligence | VoiceBench | AlpacaEval | 4.57 | 6 |
| General Conversation | VoiceBench | AlpacaEval Score | 4.19 | 5 |
| Spoken Question Answering | VoiceBench S2T | AlpacaEval | 4.94 | 4 |
| Speech Question Answering | VoiceBench AlpacaEval | AlpacaEval Score | 4.81 | 3 |
| Voice Interaction | VoiceBench | VoiceBench Average Score | 89.4 | 3 |
| Dialogue | VoiceBench | Score | 93.1 | 3 |