| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMBench | Qwen3-VL-32B | Accuracy | 90.6 | 847 | 1d ago |
| MM-Vet | InternVL3.5-38B | MM-Vet Score | 82.2 | 631 | 2d ago |
| SEED-Bench | LLaVA-UHD | Accuracy | 81.7 | 516 | 1d ago |
| MMMU | | Accuracy | 81.8 | 437 | 2mo ago |
| MMStar | InternVL3-8B-Masters | Accuracy | 82 | 407 | 5d ago |
| MMBench CN | InternVL2.5-78B | Accuracy | 88.5 | 254 | 1d ago |
| MMMU | LLaVA-1.5 | MMMU Score | 67.8 | 232 | 14d ago |
| SEED | InternVL3-8B-Masters | Accuracy | 82.6 | 216 | 14d ago |
| MME | Qwen2-VL-7B | MME Score | 2,322 | 207 | 2mo ago |
| MMMU (val) | | MMMU Score | 85.2 | 199 | 1d ago |
| MMBench (MMB) | VLsI-7B | Accuracy | 86.3 | 166 | 5d ago |
| SEED-Bench Image | LLaVA-OneVision-72B | Accuracy | 78 | 143 | 5d ago |
| SEEDBench2 Plus | MIRROR (ours) | Accuracy | 76.86 | 138 | 14d ago |
| non-MME benchmarks | | Accuracy | 84.4 | 128 | 15d ago |
| MME | InternVL3 | Score | 2,393 | 125 | 5d ago |
| MM-VET (test) | GPT-4V | Total Score | 67.6 | 120 | 1mo ago |
| POPE | InternVL2.5 | POPE Score | 0.906 | 112 | 14d ago |
| MMMU (test) | Qwen3-VL | MMMU Score | 69.6 | 112 | 1mo ago |
| SEED-2-Plus | VLSI-2B | Accuracy | 81.1 | 110 | 3mo ago |
| MMMU | | MMMU Score | 81.8 | 102 | 7d ago |
| LLaVA Evaluation Suite 1.5 | Vanilla | Average Score | 100 | 95 | 2mo ago |
| LLaVA-Bench | ResDec | Overall Score | 91.9 | 94 | 16d ago |
| MMBench Chinese | Qwen3-VL-30B A3B-Instruct | MMB Benchmark (CN) | 89.5 | 86 | 15d ago |
| MMMU, SEED, OCRBench, VizWiz, ScienceQA, and TextVQA Average (test) | | Average Accuracy | 78.1 | 84 | 8d ago |
| MMBench English | Qwen3-VL-235B | Accuracy | 88.79 | 81 | 8d ago |