| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| M3CoT | Gemini-3.0-Flash | Accuracy 82.68 | 90 | 8d ago | |
| EMMA | Qwen2.5-VL-Instruct | Accuracy 38.5 | 57 | 7d ago | |
| MMVet (test) | GPT-4o | Accuracy 80.8 | 49 | 7d ago | |
| MoMentS | GPT-4o | Accuracy 72.26 | 48 | 1mo ago | |
| MathVision (test) | | Accuracy (%) 47.7 | 45 | 2mo ago | |
| MMMU-Pro | CoT2-Meta | Accuracy 85.6 | 36 | 1mo ago | |
| MMStar | SINKTRACK | Accuracy 63.78 | 28 | 1mo ago | |
| MMK12 (test) | | Accuracy 73.9 | 28 | 3mo ago | |
| MUIRBENCH | | Difference Reasoning Accuracy 92.94 | 19 | 3mo ago | |
| MMVet | AsyMoE-LLaMA3 | Score 49.2 | 18 | 1d ago | |
| NuPlanQA EVAL | GPT-4o | Traffic Light Accuracy 68.5 | 18 | 2mo ago | |
| MathVista | AutoNPO | Accuracy 79.2 | 16 | 23d ago | |
| LLMs-Eval mini (test) | SAPO + AT-RL | GeoQAtest Accuracy 52.77 | 12 | 3mo ago | |
| Geometry3K | APMPO | Pass@1 37.9 | 11 | 27d ago | |
| MMBench Overall & Relation Reasoning | ChainMPQ | Overall Accuracy 84.7 | 8 | 3mo ago | |
| Science QA v1.3-13B (test) | AWQ | Time 8.9 | 7 | 3mo ago | |
| CV-Bench | Robix 7B Base | Overall Accuracy 86.5 | 6 | 3mo ago | |
| RISE | GPT-4o | Temporal Reasoning Score 34.1 | 5 | 3mo ago | |
| MMStar | V-GIFT | MMStar Score 43.7 | 3 | 1mo ago | |
| WH-VQA | STaR | SR 64 | 3 | 3mo ago | |