| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MM-Vet | GPT-4o-0806 | MM-Vet Score | 80.8 | 281 | 2d ago |
| MMMU (val) | | Accuracy | 75 | 114 | 2d ago |
| MMStar | Masters | Accuracy | 82 | 81 | 3d ago |
| MMMU Pro | Gemini-2.5 (Pro) | Accuracy | 76.96 | 55 | 3d ago |
| MMBench | Qwen3-VL-8B-Thinking | Accuracy | 87 | 50 | 3d ago |
| MMBench (dev) | GPT-4o | Accuracy | 87.6 | 47 | 3d ago |
| MMMU | Gemini-2.5 (Pro) | Accuracy | 83.89 | 44 | 3d ago |
| WeMath | | Accuracy | 63.8 | 43 | 3d ago |
| SEED-Bench Image | Sphinx | Score | 74.2 | 32 | 3d ago |
| M3CoT (test) | | Total Acc | 91.61 | 31 | 3d ago |
| O3-BENCH (test) | INSIGHT-O3 | Chart Score | 0.756 | 30 | 3d ago |
| MMMU (test) | GPT-4o | Accuracy | 64.7 | 30 | 2d ago |
| MathVista | Seed 1.5-VL | Pass@1 | 85.6 | 30 | 3d ago |
| MMStar | Octopus-8B (Ours) | Accuracy | 75.2 | 29 | 3d ago |
| MathVista | Qwen3-VL-32B-Thinking | Accuracy | 85.9 | 29 | 3d ago |
| MathVerse MINI | Qwen3-VL-8B-Thinking | Accuracy | 77.7 | 25 | 3d ago |
| LMMs-Eval Average of 12 benchmarks | InternVL2-8B | Average Accuracy | 70.83 | 25 | 3d ago |
| MMMU-Pro | | Std-10 Score | 55 | 25 | 2d ago |
| NaturalBench | DART | Accuracy | 82.5 | 24 | 3d ago |
| LogicVista | RTWI | Accuracy | 61.7 | 24 | 3d ago |
| DynaMath | SwimBird | Accuracy | 67.2 | 24 | 3d ago |
| MATH-Vision (full) | Qwen3-VL-8B-Thinking | Accuracy | 62.7 | 23 | 3d ago |
| R1-Onevision-Bench (Overall) | MSSR | Accuracy | 39.2 | 23 | 3d ago |
| MathVision | | Pass@1 | 73.3 | 23 | 3d ago |
| MMT-Bench | Jigsaw | Accuracy | 57.88 | 23 | 3d ago |