| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Understanding | MME-RealWorld Lite | Overall Score: 67.3 | 34 |
| Real-world Understanding | MME-RealWorld EN | Score: 64 | 20 |
| Multimodal Understanding | MME-RealWorld Chinese | Overall Score: 64.53 | 19 |
| General Visual Reasoning | MME-RealWorld-Lite | Accuracy: 73.06 | 17 |
| Multimodal Understanding | MME-RealWorld English | Overall Score: 63.7 | 9 |
| Fine-grained Perception | MME-RealWorld Lite | Score: 51.49 | 6 |
| General Visual Question Answering | MME-RealWorld en | Score: 63.2 | 6 |
| Reasoning | MME-RealWorld Lite (test) | OCR: 72 | 3 |
| Perception | MME-RealWorld Lite (test) | OCR: 83.6 | 3 |
| Multimodal Evaluation | MME-RealWorld zero-shot | Zero-shot Accuracy: 48.03 | 2 |