| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Understanding | MMBench | Accuracy | 90.6 | 847 |
| Multimodal Understanding | MMBench CN | Accuracy | 88.5 | 254 |
| Multimodal Model Evaluation | MMBench | Accuracy | 87.8 | 204 |
| Multimodal Understanding | MMBench (MMB) | Accuracy | 86.3 | 166 |
| Multimodal Model Evaluation | MMBench Chinese | Accuracy | 82.6 | 154 |
| Multimodal Benchmarking | MMBench-CN | Score | 92.39 | 151 |
| Vision Understanding | MMBench | Accuracy | 85 | 141 |
| Multimodal Reasoning | MMBench | Accuracy | 90.63 | 127 |
| Multimodal Reasoning | MMBench EN V1.1 | Accuracy | 80.68 | 125 |
| Multimodal Benchmarking | MMBench English | Accuracy | 70.4 | 125 |
| Multimodal Evaluation | MMBench CN | Accuracy | 82.37 | 120 |
| Multimodal Evaluation | MMBench | MMB Score | 79.7 | 118 |
| Multimodal Reasoning | MMBench CN | Accuracy | 82 | 113 |
| Multi-modal Understanding | MMBench EN | Accuracy | 93.53 | 105 |
| Multimodal Benchmark | MMBench (MMB) | Accuracy | 81.8 | 95 |
| Multimodal Benchmarking | MMBench | Accuracy | 84.4 | 90 |
| Visual Question Answering | MMBench (MMB) | Accuracy | 92.1 | 86 |
| Multimodal Understanding | MMBench Chinese | MMB Benchmark (CN) | 89.5 | 86 |
| Multi-modal Question Answering | MMBench | Accuracy | 86.4 | 84 |
| Multimodal Understanding | MMBench English | Accuracy | 88.79 | 81 |
| Multimodal Benchmarking | MMBench | Score | 83.4 | 73 |
| Visual Question Answering | MMBench-CN | Accuracy | 93.13 | 72 |
| GUI Grounding | MMBench-GUI L2 (test) | Average Error | 2.9 | 67 |
| Multimodal Understanding | MMBench (test) | Accuracy | 84.2 | 67 |
| Vision-Language Understanding | MMBench | Accuracy | 88.7 | 64 |