| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| MM-Vet | | MM-Vet Score | 76.9 | 142 | 18d ago |
| MME | Qwen2-VL | MME Score | 2,321 | 54 | 26d ago |
| ScienceQA, TextVQA, and GQA | | Avg Relative Accuracy | 100 | 26 | 3d ago |
| MME perception and cognition v1.0 | BAGEL | MME Perception Score | 1,687 | 24 | 1mo ago |
| SEED-Bench | DualToken | SEED Score | 71.8 | 23 | 1mo ago |
| BLINK | GPT-5 | Accuracy | 69.86 | 21 | 1mo ago |
| MME (total) | LVRPO | MME-P Score | 1,699 | 18 | 18d ago |
| V* Bench | SenseNova-MARS-32B | Avg@8 EM | 0.942 | 18 | 1mo ago |
| HR-Bench 8K | SenseNova-MARS-32B | Avg@8 Exact Match | 86.6 | 17 | 1mo ago |
| HR-Bench 4K | SenseNova-MARS-32B | Avg@8 Exact Match | 90.2 | 17 | 1mo ago |
| MMStar | MM-Eureka-Qwen-7B | Accuracy (Clean) | 65.9 | 16 | 1mo ago |
| BLINK sub-tasks | InternVL3.5-4B+P^2 | Jigsaw Accuracy | 90.67 | 14 | 3d ago |
| V* Bench, HR-Bench, and MME RealWorld | SenseNova-MARS-32B | Average Score | 85.9 | 13 | 1mo ago |
| MME RealWorld | SenseNova-MARS-32B | Pass@1 Exact Match | 72.7 | 13 | 1mo ago |
| CV-Bench | ERNIE 5.0-Base | Accuracy | 86.96 | 12 | 1mo ago |
| SAT | GPT-5 | Accuracy | 73.3 | 11 | 1mo ago |
| BLINK J | InternVL3.5 | Accuracy | 80.67 | 11 | 1mo ago |
| VStar | Qwen2.5-VL | Accuracy | 85.86 | 11 | 1mo ago |
| VisPuzzle | ThinkMorph | Accuracy | 79 | 11 | 1mo ago |
| VSP | ThinkMorph | Accuracy | 75.83 | 11 | 1mo ago |
| JARVIS-VLA Benchmark 1.0 (test) | GPT-4o | Accuracy | 76.7 | 10 | 1mo ago |
| MMBench-EN (full) | Bagel | Score | 85 | 9 | 1mo ago |
| R-Bench (test) | Robust-R1 (SFT and RL) | MCQ (low) | 65.29 | 8 | 1mo ago |
| MMT | LLaVA | Score | 1,075.5 | 8 | 1mo ago |
| RealWorldQA | Robust-R1 (SFT) | Accuracy (Clean) | 68.23 | 7 | 1mo ago |