| Dataset Name | SOTA Method | Metric | | Updated | Trend |
|---|---|---|---|---|---|
| V*Bench | | Accuracy 95.7 | 58 | 3d ago | |
| BLINK | | Accuracy 81 | 50 | 3d ago | |
| NLVR2 | BEiT-3 | Accuracy 92.6 | 49 | 3d ago | |
| NLVR2 (test) | SimVLM_HUGE | Accuracy 85.15 | 44 | 3d ago | |
| MM-Vet | UniDFlow | Score 82.7 | 34 | 3d ago | |
| HR-Bench 4K FSP | RTWI | ACC 96.5 | 29 | 3d ago | |
| Geometric Shapes | RoT | Accuracy 95.2 | 28 | 3d ago | |
| Jigsaw | AdaReasoner 7B | Accuracy 88.6 | 25 | 3d ago | |
| HalluBench | SaEI | Accuracy 71.85 | 24 | 3d ago | |
| NLVR2 (test-P) | BEiT-3 | Accuracy 92.6 | 21 | 3d ago | |
| Vision-Centric Benchmarks | ViThinker | BLINK Score 59.1 | 20 | 3d ago | |
| NLVR2 (val) | | Accuracy 91.1 | 20 | 3d ago | |
| NLVR2 v2 (dev) | X2-VLM_large | Accuracy 88.7 | 20 | 3d ago | |
| GQA (test-dev) | LLaVA-1.5-7B | Accuracy 62 | 19 | 3d ago | |
| MMVP | GPT-4o | Accuracy 86.3 | 19 | 3d ago | |
| VisualProbe Hard | Deepconf | Accuracy 0.434 | 18 | 3d ago | |
| VisualProbe Medium | Deepconf | Accuracy 40.6 | 18 | 3d ago | |
| VisualProbe Easy | RTWI | Accuracy 65.4 | 18 | 3d ago | |
| MMMU Pro Vision | | Accuracy 51.9 | 18 | 3d ago | |
| REASONMAP-PLUS | | Weighted Accuracy 88.95 | 16 | 3d ago | |
| REASONMAP Long questions | | Weighted Accuracy 62.5 | 16 | 3d ago | |
| REASONMAP Short questions | | Weighted Accuracy 0.5998 | 16 | 3d ago | |
| CharXiv (val) | VReST-Vote | Text in Chart Accuracy 37.95 | 16 | 3d ago | |
| NLVR2 (dev) | MADTP | Accuracy 82.5 | 16 | 3d ago | |
| V* cross-domain (test) | VIRC-7B | Accuracy 79.06 | 15 | 3d ago | |