| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Utility | AirBench 1-10 | Speech Score: 8.1 | 12 |
| Toxicity and Harmful Content Detection | Airbench | Score: 98.59 | 5 |
| Multiple-choice Question Answering | AirBench Foundational | Total Average Score: 44 | 4 |
| Jailbreak Attack Evaluation | AirBench | Attack Success Count: 1,390 | 2 |