| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | MMBench | Accuracy 88.8 | 367 |
| Multimodal Model Evaluation | MMBench | Accuracy 87.8 | 180 |
| Multimodal Understanding | MMBench CN | Accuracy 88.5 | 162 |
| Multimodal Model Evaluation | MMBench Chinese | Accuracy 82.6 | 121 |
| Multimodal Evaluation | MMBench | MMB Score 79.7 | 118 |
| Vision Understanding | MMBench | Accuracy 85 | 104 |
| Multimodal Benchmarking | MMBench-CN | Score 92.39 | 73 |
| Multimodal Benchmark | MMBench (MMB) | Accuracy 81.8 | 70 |
| Multimodal Understanding | MMBench Chinese | MMB Benchmark (CN) 89.5 | 70 |
| Multimodal Understanding | MMBench (MMB) | Accuracy 86.3 | 69 |
| Multimodal Understanding | MMBench (test) | Overall Score 81.2 | 65 |
| Multimodal Benchmarking | MMBench | Score 83.4 | 62 |
| Multimodal Benchmarking | MMBench English | Accuracy 70.4 | 61 |
| Multimodal Understanding | MMBench (dev) | Accuracy 80.41 | 58 |
| Multimodal Evaluation | MMBench CN | Accuracy 74.3 | 57 |
| Multimodal Understanding | MMBench English | MMB Score 90.8 | 55 |
| Multimodal Reasoning | MMBench | Accuracy 87 | 50 |
| Multimodal Understanding (Chinese) | MMBench Chinese | Accuracy 91 | 47 |
| Multimodal Reasoning | MMBench (dev) | Accuracy 87.6 | 47 |
| GUI Grounding | MMBench-GUI L2 (test) | Error (Windows, Basic) 1.5 | 46 |
| Multi-modal Benchmark | MMBench | Accuracy 83.3 | 40 |
| Visual Question Answering | MMBench-CN | Accuracy 93.13 | 40 |
| Multi-modal Understanding | MMBench (dev) | Overall Score 80.6 | 40 |
| Multi-modal Understanding | MMBench EN | Overall Score 86.3 | 39 |
| Multimodal Understanding | MMBench en (dev) | Score 84.2 | 38 |