| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| MMVet | Bee-8B | Score 83.9 | 40 | 1mo ago |
| MMMU (val) | Ours (CDRL + CA-TTS) | Score 59.9 | 15 | 1mo ago |
| POPE | Molmo | Accuracy 89 | 14 | 3d ago |
| HRBench | | Accuracy 78.5 | 14 | 1mo ago |
| GUIChat | | Accuracy 93.14 | 14 | 1mo ago |
| CountBench | | Accuracy 97.35 | 10 | 1mo ago |
| HallusionBench | | Accuracy 73.48 | 9 | 1mo ago |
| MMStar (test) | Ours (CDRL + CA-TTS) | Overall Accuracy 71.3 | 8 | 1mo ago |
| MMT-Bench (val) | Bee-8B | Score 67 | 7 | 1mo ago |
| HallusionBench avg | Keye-VL | Score 67 | 7 | 1mo ago |
| AI2D | Keye-VL | Score 86.7 | 7 | 1mo ago |
| VLMs are Blind | Keye-VL | Score 57.1 | 5 | 1mo ago |
| MMMU-Pro standard | Bee-8B | Score 50.7 | 5 | 1mo ago |
| CV-Bench | | Accuracy 90.07 | 5 | 1mo ago |
| BLINK | | Accuracy 77.49 | 5 | 1mo ago |
| SimpleVQA | | Accuracy 74.06 | 5 | 1mo ago |
| MMStar | FineViT-VL | Accuracy 60.87 | 4 | 1mo ago |
| MMBench en v11 (dev) | FineViT-VL | Accuracy 79.33 | 4 | 1mo ago |
| VisuLogic | Bee-8B | Score 26.5 | 4 | 1mo ago |
| MMVP | Bee-8B | Score 82 | 4 | 1mo ago |
| MMBench CN (dev) | Keye-VL | Score 92 | 4 | 1mo ago |
| MMBench en (dev) | Qwen2.5-VL-72B | Accuracy 88.6 | 3 | 1mo ago |