| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Understanding | MMB | Accuracy | 90.6 | 53 |
| Multimodal Benchmarking | MMB | Average Performance | 100 | 40 |
| Multimodal Evaluation | MMB | Score | 85.31 | 27 |
| General Vision-Language Understanding | MMB | Score | 84.6 | 25 |
| Knowledge | MMB | Accuracy | 61.98 | 21 |
| Multi-modality Evaluation | MMB-en (test) | Relative Performance | 100 | 10 |
| Visual Question Answering | MMB | Score | 83.2 | 8 |
| Image Captioning | MMB | Prism | 81.34 | 7 |
| MLLM Evaluation | MMB | Overall Score | 63.14 | 4 |
| Multimodal Reasoning | MMB-CN | Accuracy | 54 | 3 |
| Multimodal Reasoning | MMB | Accuracy | 62.8 | 3 |
| Image Understanding | MMB | Accuracy | 76.4 | 2 |