| Dataset Name | SOTA Method | Metric | Value | Count | Last Updated |
|---|---|---|---|---|---|
| AI2D | | Accuracy | 94.2 | 247 | 18d ago |
| AI2D (test) | | Accuracy | 94.7 | 131 | 11d ago |
| AI2D 1.0 (test) | Molmo-72B | Accuracy | 96.3 | 58 | 1mo ago |
| AI2D | | AI2D Score | 87.14 | 33 | 15d ago |
| AI2D F | | Accuracy | 59.7 | 23 | 1mo ago |
| AI2D | | Exact Match | 79.11 | 19 | 1mo ago |
| AI2D | Qwen2.5-VL-7B-Instruct + cold start | Pass@1 Accuracy | 86.5 | 16 | 4d ago |
| AI2D | Qwen2.5-VL-32B | Accuracy | 84.5 | 16 | 16d ago |
| AI2D | | Score | 80.7 | 10 | 8d ago |
| AI2D | Vanilla RoPE | Score | 76.2 | 7 | 11d ago |
| AI2D zero-shot | | Zero-shot Accuracy | 83.65 | 5 | 1mo ago |