| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| BLINK | Qwen3-VL-8B-Inst. | Accuracy | 85.2 | 107 | 21d ago |
| GQA | ViCrop | Accuracy | 64.54 | 93 | 1mo ago |
| V*Bench | | Accuracy | 95.7 | 62 | 1mo ago |
| MMVP | GPT-4o | Accuracy | 86.3 | 58 | 23d ago |
| V* | S1-VL-32B-RL | Accuracy | 92.7 | 52 | 19d ago |
| MMStar | GazeVLM (Ours) | Accuracy | 69 | 51 | 2d ago |
| BLINK | Human | Jigsaw Accuracy | 99 | 49 | 19d ago |
| NLVR2 | BEIT-3 | Accuracy | 92.6 | 49 | 3mo ago |
| MMBench | ThinkLite | Accuracy | 88.7 | 48 | 15d ago |
| NLVR2 (test) | SimVLM_HUGE | Accuracy | 85.15 | 46 | 1mo ago |
| HR-Bench 8K | DeepEyes | Overall Score | 72.6 | 42 | 21d ago |
| HR-Bench 4K | SubagentVL | Overall Score | 0.77 | 42 | 21d ago |
| MathVerse | | Accuracy | 61.29 | 40 | 1mo ago |
| Jigsaw | AdaReasoner 7B | Accuracy | 88.6 | 40 | 1d ago |
| MM-Vet | UniDFlow | Score | 82.7 | 40 | 2mo ago |
| MMMU-Pro | Prod(VF, Contr.) | ECE | 14.4 | 32 | 1mo ago |
| VizWiz | Plausibility | ECE | 0.11 | 32 | 1mo ago |
| A-OKVQA | Avg(VF, Contr.) | ECE | 5.4 | 32 | 1mo ago |
| Commonsense Reasoning | | Jaccard Index (J) | 8 | 30 | 7d ago |
| Quantitative Reasoning | µCRASP | J Score | 8.44 | 30 | 7d ago |
| Physical Reasoning | | J Score | 9.01 | 30 | 7d ago |
| MMMU-Pro | | Avg@8 | 39.42 | 29 | 1mo ago |
| HR-Bench 4K FSP | RTWI | ACC | 96.5 | 29 | 3mo ago |
| VLMs are Blind | | Accuracy | 77.8 | 28 | 12d ago |
| Geometric Shapes | RoT | Accuracy | 95.2 | 28 | 3mo ago |