| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MathVision (test) | | Accuracy (%) | 47.7 | 45 | 25d ago |
| MMVet (test) | GPT-4o | Accuracy | 80.8 | 30 | 1mo ago |
| MMMU-Pro | CoT2-Meta | Accuracy | 85.6 | 28 | 16d ago |
| MMK12 (test) | | Accuracy | 73.9 | 28 | 1mo ago |
| EMMA | GPT-4o | Accuracy | 32.7 | 26 | 1mo ago |
| MUIRBENCH | | Difference Reasoning Accuracy | 92.94 | 19 | 1mo ago |
| NuPlanQA EVAL | GPT-4o | Traffic Light Accuracy | 68.5 | 18 | 22d ago |
| M3CoT | SINKTRACK | Accuracy | 66.94 | 12 | 5d ago |
| MMStar | SINKTRACK | Accuracy | 63.78 | 12 | 5d ago |
| LLMs-Eval mini (test) | SAPO + AT-RL | GeoQAtest Accuracy | 52.77 | 12 | 1mo ago |
| MMBench Overall & Relation Reasoning | ChainMPQ | Overall Accuracy | 84.7 | 8 | 1mo ago |
| Science QA v1.3-13B (test) | AWQ | Time | 8.9 | 7 | 1mo ago |
| CV-Bench | Robix 7B Base | Overall Accuracy | 86.5 | 6 | 1mo ago |
| RISE | GPT-4o | Temporal Reasoning Score | 34.1 | 5 | 1mo ago |
| MMStar | V-GIFT | MMStar Score | 43.7 | 3 | 3d ago |
| WH-VQA | STaR | SR | 64 | 3 | 1mo ago |