| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Holistic Evaluation | Combined Suite General Reasoning Perception Text | Text Average: 76.3 | 13 |
| LLM Alignment | Combined Suite Setup 3 | Average Percentage Score: 54.38 | 9 |
| Overall Performance Evaluation | Combined Suite (MME, MMStar, SQA, RealWorldQA, MMMU, MMMU-P, VisuLogic, LogicVista, CRPE, POPE, HallBench) | Average Score: 43.94 | 4 |
| General Language Modeling | Combined Suite (HS, PIQA, SIQA, Wino, MMLU, NQ, TQA, ARC-C, ARC-E, OBQA, BoolQ, DROP, BBH-LB, GSM8K) | Accuracy: 57.8 | 4 |
| Knowledge-Preserved Adaptation | Combined Suite TriviaQA, NQ open, WebQS, HumanEval, MBPP | Average Score: 21.91 | 4 |