| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Understanding | MME-RealWorld Lite | Overall Score | 67.3 | 34 |
| Perception | MME-RealWorld Lite | Overall Score | 59 | 29 |
| Real-world Multimodal Understanding | MME-RealWorld Lite | Lite Score | 54.9 | 25 |
| Multimodal Understanding | MME-RealWorld Chinese | Accuracy | 64.3 | 25 |
| Multimodal Understanding | MME-RealWorld English | Accuracy | 59.9 | 25 |
| Reasoning | MME-RealWorld Lite | OCR Score | 84 | 20 |
| Real-world Understanding | MME-RealWorld EN | Score | 64 | 20 |
| Multimodal Question Answering | MME-RealWorld-Lite 1.0 (test) | Perception (AD) Acc | 57.7 | 19 |
| General Visual Reasoning | MME-RealWorld-Lite | Accuracy | 73.06 | 17 |
| Multimodal Evaluation | MME-RealWorld | Accuracy | 71.2 | 15 |
| Fine-grained Visual Reasoning | MME Realworld Lite | Avg@1 | 55.8 | 12 |
| Reasoning | MME-RealWorld Lite (test) | OCR | 76 | 12 |
| Remote Sensing Visual Question Answering | MME-RealWorld-RS | Position Score | 58.15 | 11 |
| General Perception and Reasoning | MME-RealWorld Lite | Overall Accuracy | 54.3 | 11 |
| Multimodal Evaluation | MME-RealWorld Lite | Score | 57.8 | 10 |
| Real-world Visual Question Answering | MME-RealWorld-Lite (MMERW) | Accuracy | 44.6 | 8 |
| Fine-grained Perception | MME-RealWorld Lite | Score | 51.49 | 6 |
| General Visual Question Answering | MME-RealWorld en | Score | 63.2 | 6 |
| Perception | MME-RealWorld Lite (test) | OCR | 83.6 | 3 |
| Multimodal Evaluation | MME-RealWorld zero-shot | Zero-shot Accuracy | 48.03 | 2 |