| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Perception | MME-RealWorld Lite | Overall Score | 67 | 46 |
| Reasoning | MME-RealWorld Lite | OCR Score | 84 | 37 |
| General Visual Reasoning | MME-RealWorld-Lite | Accuracy | 73.06 | 37 |
| Multimodal Understanding | MME-RealWorld Lite | Overall Score | 67.3 | 34 |
| Real-world Multimodal Understanding | MME-RealWorld Lite | Lite Score | 54.9 | 25 |
| Multimodal Understanding | MME-RealWorld Chinese | Accuracy | 64.3 | 25 |
| Multimodal Understanding | MME-RealWorld English | Accuracy | 59.9 | 25 |
| General Perception and Reasoning | MME-RealWorld Lite | Overall Accuracy | 54.3 | 21 |
| Real-world Understanding | MME-RealWorld EN | Score | 64 | 20 |
| Real-world visual perception | MME-RealWorld CN | Accuracy | 73.31 | 19 |
| Real-world visual perception | MME-RealWorld Lite | Accuracy | 56.38 | 19 |
| Multimodal Question Answering | MME-RealWorld-Lite 1.0 (test) | Perception (AD) Acc | 57.7 | 19 |
| Perception-intensive Reasoning | MME-RealWorld-Lite (MRWL) | Score | 55.13 | 18 |
| Real-world Visual Question Answering | MME-RealWorld-Lite (MMERW) | Accuracy | 49.03 | 16 |
| Multimodal Evaluation | MME-RealWorld | Accuracy | 71.2 | 15 |
| Fine-grained visual reasoning | MME Realworld Lite | Avg@1 | 55.8 | 12 |
| Reasoning | MME-RealWorld Lite (test) | OCR | 76 | 12 |
| Autonomous Driving (Perception, Prediction & Planning) | MME-RealWorld | Overall Score (P+P+P) | 67 | 11 |
| Remote Sensing Visual Question Answering | MME-RealWorld-RS | Position Score | 58.15 | 11 |
| Multimodal Evaluation | MME-RealWorld Lite | Score | 57.8 | 10 |
| Fine-Grained Perception & Understanding | MME-RealWorld lite | Accuracy | 49.87 | 9 |
| Multimodal Perception and Reasoning | MME-RealWorld Lite | Overall Score | 59.87 | 7 |
| Fine-grained Perception | MME-RealWorld Lite | Score | 51.49 | 6 |
| General Visual Question Answering | MME-RealWorld en | Score | 63.2 | 6 |
| Efficiency Evaluation | MME-RealWorld Lite | Average Time per Sample (s) | 3.2 | 5 |