| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | RealworldQA | Accuracy 80.2 | 98 |
| Real-world Visual Question Answering | RealWorldQA | Accuracy 77.8 | 91 |
| Real-world Multimodal Reasoning | RealWorldQA | Accuracy 75.4 | 40 |
| Visual Question Answering | RealWorldQA (test) | Accuracy 79 | 36 |
| Real-world QA | RealworldQA | Accuracy 73.1 | 33 |
| Spatial Reasoning | RealWorldQA | Accuracy 69.67 | 32 |
| Spatial Understanding | RealWorldQA | RWQA Score 66.01 | 30 |
| General Visual Understanding | RealWorldQA | Accuracy 67.58 | 28 |
| Real-world Question Answering | RealWorldQA | Accuracy 79 | 27 |
| Real-world Visual Understanding | RealWorldQA | Accuracy 65.5 | 24 |
| Multimodal Understanding | RealWorldQA | RWQA Score 78 | 24 |
| Vision-centric Reasoning | RealWorldQA | Accuracy 73.3 | 18 |
| Real-world Multimodal Interaction | RealWorldQA (test) | Accuracy 77.8 | 18 |
| Vision Understanding | RealworldQA | Overall Score 75.4 | 17 |
| Real-world Multimodal Interaction | RealWorldQA | RealWorldQA Score 76.5 | 15 |
| Visual Question Answering | RealWorldQA 1.0 (test) | Accuracy 0.6353 | 15 |
| Visual Understanding | RealWorldQA | Accuracy (Clean) 68.23 | 7 |
| Visual Question Answering | RealWorldQA 2024 | Score 64.8 | 7 |
| General visual question answering | RealWorldQA | Pass@1 78.4 | 7 |
| General Visual Question Answering | RealWorldQA (avg) | Score 0.787 | 7 |
| Spatial Understanding | RealWorldQA | Accuracy 79.61 | 6 |
| General Visual Question Answering | RealWorldQA 2024 | Accuracy 71.9 | 6 |
| Real-World Understanding | Real-world Understanding (RealWorldQA, MME-RW, R-Bench) | RealWorld QA Score 68.2 | 5 |
| General | RealWorldQA | Score 0.779 | 4 |
| Multimodal Hallucination and Real-world Evaluation | RealWorldQA | Accuracy 74.6 | 3 |