| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| AI2D | | Accuracy 94.2 | 317 | 1d ago | |
| AI2D (test) | | Accuracy 94.7 | 154 | 1d ago | |
| AI2D 1.0 (test) | Molmo-72B | Accuracy 96.3 | 58 | 3mo ago | |
| AI2D | | AI2D Score 87.14 | 39 | 26d ago | |
| AI2D F | | Accuracy 59.7 | 23 | 3mo ago | |
| AI2D lite | PVM-8B (SFT + GRPO) | Accuracy 82.8 | 20 | 1mo ago | |
| AI2D | V^2Drop | AI2D Accuracy 55.38 | 19 | 21d ago | |
| AI2D | | Exact Match 79.11 | 19 | 3mo ago | |
| AI2D | Qwen2.5-VL-7B-Instruct + cold start | Pass@1 Accuracy 86.5 | 16 | 1mo ago | |
| AI2D | Qwen2.5-VL-32B | Accuracy 84.5 | 16 | 2mo ago | |
| AI2D | Gemini 2.5 Flash | Score 87.7 | 15 | 1mo ago | |
| AI2D w mask | BAGEL | Score 88.9 | 7 | 1mo ago | |
| AI2D | Vanilla RoPE | Score 76.2 | 7 | 1mo ago | |
| AI2D zero-shot | | Zero-shot Accuracy 83.65 | 5 | 3mo ago | |