| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMBench EN | MaLoRA | Accuracy | 93.53 | 105 | 15d ago |
| LLaVA-Bench Wild | GPT4V | LLaVA^W Score | 91.2 | 86 | 1mo ago |
| MMBench | Oryx-1.5 | Mean Accuracy | 86.3 | 63 | 1mo ago |
| MMBench EN | | Overall Score | 86.3 | 55 | 2mo ago |
| MMVet | | Accuracy | 85.67 | 55 | 26d ago |
| LLaVA Multi-modal Evaluation Suite (GQA, MMB, MME, POPE, SQA, VQAv2, TextVQA, MMMU, SEED-I) v1.6 (test) | | Average Score | 100 | 53 | 3mo ago |
| MMBench (dev) | Mini-Gemini-HD | Overall Score | 80.6 | 40 | 3mo ago |
| SEED-Bench (overall) | CSR | Overall Score | 62.9 | 40 | 3mo ago |
| MMMU Pro (Overall) | PRISM + GRPO | Score | 53.3 | 34 | 1mo ago |
| SEED-IMG | Qwen3-VL-4B | Accuracy | 78.46 | 29 | 1mo ago |
| MME | InternVL3.5-8B | MME Score | 2,371.9 | 25 | 12d ago |
| MMBench V1.1 | InternVL3.5-38B | Accuracy | 87.03 | 22 | 3mo ago |
| MM-Vet | LLaVA1.5-BPO | Rec | 46.9 | 19 | 2mo ago |
| MME | | MME Score | 1,842 | 17 | 2mo ago |
| MM-Vet v1 (full) | Ours | Overall Score (MM-Vet v1) | 36.2 | 16 | 9d ago |
| Cambrian | InternVL3.5-8B | Accuracy | 76.22 | 16 | 15d ago |
| MuirBench | | Score | 59.6 | 16 | 2mo ago |
| WorldSense | Qwen2.5-Omni-7B | WorldSense Performance | 46.85 | 14 | 19d ago |
| TVL Benchmark | TVL-LLaMA (ViT-Base) | SSVTP Score | 6.16 | 14 | 1mo ago |
| SEED-Bench all (val) | | Accuracy | 65.6 | 14 | 3mo ago |
| MM-Bench-CN (MMBCN) (test) | Qwen | MMBCN Score | 84 | 13 | 1mo ago |
| MM-Bench (MMB) (test) | Qwen | MMB Score | 86.3 | 13 | 1mo ago |
| MMBench (test) | ScalSelect | MMBench Accuracy (En) | 65.3 | 12 | 3mo ago |
| MMMU | iLLaVA | Accuracy (77.8% reduction ratio) | 64.3 | 11 | 2mo ago |
| MMB | HiPrune | Score | 67 | 10 | 6d ago |