| Dataset Name | SOTA Method | Metric | Score | | Updated |
|---|---|---|---|---|---|
| MMMU, SEED, OCRBench, VizWiz, ScienceQA, TextVQA (test/val) | | MMMU Score | 61.9 | 42 | 3mo ago |
| MME | | MME Score | 78.7 | 26 | 3mo ago |
| MMBench Chinese (dev) | | Accuracy | 72.8 | 22 | 3mo ago |
| Multimodal Evaluation Suite (MMVet, MMBench_EN, SEED-Bench, LLaVABench, POPE, MME-P, MMVP, MMStar) (Random Sampling Splits of CC12M) | PivotMerge | MMVet Score | 30.1 | 13 | 1mo ago |
| Image Benchmarks HallBench, MME, TextVQA, ChartQA, AI2D, RealWorldQA, CCBench, OCRVQA, SQA-IMG, POPE | Qwen2.5-VL-7B | HallBench Score | 46.5 | 13 | 1mo ago |
| MMMU | PTA-GRPO | Accuracy | 59 | 7 | 7d ago |
| MLLM Evaluation Suite (MME, MMB, VizWiz, POPE, GQA, RQA, VQAT, SQA) standard (test val) | | MME Score | 2,375 | 5 | 5d ago |
| MM-Vet | | MM-Vet Score | 44.6 | 4 | 3mo ago |
| MMMU-Pro | PTA-GRPO | Accuracy | 44.7 | 3 | 7d ago |
| MMBench Tibetan | FTibVLM | Overall Score | 67.78 | 2 | 7d ago |