| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MuirBench | L2-VMAS | Accuracy | 77.2 | 89 | 6d ago |
| MIRB | Qwen2.5-VL | Accuracy | 63.57 | 70 | 2mo ago |
| Mantis (test) | DelimScaling | Accuracy | 72.81 | 39 | 3mo ago |
| Mantis | Qwen3-VL + S2H-DPO | Accuracy | 81.71 | 38 | 1mo ago |
| Mantis-Eval | InternVL2.5-38B | Overall Score | 78.3 | 28 | 3mo ago |
| Muirbench (test) | | Accuracy | 68 | 24 | 3mo ago |
| BLINK (val) | | Accuracy | 52.6 | 21 | 3mo ago |
| QBench2 (val) | MM1.5-30B | Accuracy | 79.3 | 21 | 3mo ago |
| DEMON | Brote-IM-XXL | Accuracy | 38.94 | 21 | 3mo ago |
| OmniContext | | Single Scene Char Score | 8.96 | 20 | 3mo ago |
| VideoEspresso | ROVER-LSW | Narration Score | 50.6 | 12 | 6d ago |