| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Question Answering | RealworldQA | Accuracy | 80.2 | 259 |
| Real-world Visual Question Answering | RealWorldQA | Accuracy | 79.35 | 173 |
| Real-world Visual Understanding | RealWorldQA | Accuracy | 81.4 | 110 |
| Vision-centric Reasoning | RealWorldQA | Accuracy | 75.4 | 66 |
| General Visual Understanding | RealWorldQA | Accuracy | 71.3 | 62 |
| Real-world QA | RealworldQA | Accuracy | 75.7 | 62 |
| Real-world Question Answering | RealWorldQA | Overall Score | 78.7 | 58 |
| Real-world Multimodal Reasoning | RealWorldQA | Accuracy | 75.4 | 57 |
| Spatial Reasoning | RealWorldQA | Accuracy | 69.67 | 52 |
| Visual Question Answering | RealWorldQA (test) | Accuracy | 79 | 47 |
| Multimodal Reasoning | RealWorldQA | Accuracy | 81.39 | 40 |
| Multimodal Reasoning | RealWorldQA | Mean@8 Accuracy | 70.46 | 40 |
| Real-world Visual Understanding | RealWorldQA | Score | 72.29 | 39 |
| Multimodal Understanding | RealWorldQA | RWQA Score | 78 | 33 |
| Perception and Reasoning | RealWorldQA | Score | 74.2 | 31 |
| Spatial Understanding | RealWorldQA | RWQA Score | 66.01 | 30 |
| General Utility | RealWorldQA | RealWorldQA Score | 72.2 | 21 |
| Real-world Multimodal Understanding | RealWorldQA | Accuracy | 73.99 | 21 |
| General Reasoning & Understanding | RealWorldQA | Accuracy (RealWorldQA) | 72.6 | 21 |
| Real-world Visual Understanding | RealWorldQA (test) | Final Performance | 77.1 | 20 |
| General Visual Question Answering | RealWorldQA | Score | 73.1 | 20 |
| Real-world Multimodal Interaction | RealWorldQA (test) | Accuracy | 77.8 | 18 |
| Vision Understanding | RealworldQA | Overall Score | 75.4 | 17 |
| Vision-Centric Perception | RealWorldQA | Accuracy | 69.3 | 16 |
| Real-world Perception | RealWorldQA | Accuracy | 65.1 | 16 |