| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| ScienceQA | Phi-4 pass@N (Upper Bound) | Accuracy | 99.8 | 791 | 18h ago |
| ARC Challenge | Qwen-3-30B-A3B | Accuracy | 96 | 354 | 21d ago |
| ScienceQA IMG | EMOVA | Accuracy | 98.2 | 335 | 14d ago |
| ScienceQA (SQA) | Qwen2.5-VL | Accuracy | 88.8 | 273 | 1mo ago |
| ScienceQA (test) | Perceptio | Average Accuracy | 98.3 | 273 | 1d ago |
| ARC-C | | Accuracy | 96.3 | 261 | 5d ago |
| ARC-E | Qwen3-4B | Accuracy | 97.53 | 240 | 5d ago |
| ScienceQA SQA-IMG | TroL | Accuracy | 92.8 | 186 | 1mo ago |
| ARC Easy | Base | Accuracy | 98 | 162 | 7d ago |
| SciQ | HPTQ | Normalized Accuracy | 97.7 | 137 | 2mo ago |
| ScienceQA SQA-I | InternVL2-8B + RP | Accuracy | 96.6 | 122 | 4d ago |
| ARC Challenge | TSDS | Accuracy | 93 | 108 | 19d ago |
| SciQ | | Accuracy (SciQ) | 94.3 | 101 | 20d ago |
| GPQA | M2CL | pass@1 Accuracy | 87.6 | 85 | 3mo ago |
| OpenBookQA | EVOSELECT | Accuracy | 94.6 | 82 | 5d ago |
| ARC | | ARC Accuracy | 98.8 | 76 | 1d ago |
| ARC Easy | GEMMA-7B | Accuracy | 81.65 | 75 | 5d ago |
| ScienceQA IMG (test) | InternVL2.5-8B | Accuracy | 98.4 | 74 | 1mo ago |
| ScienceQA Image | LLaVA-OV-7B | Score | 95.6 | 70 | 19d ago |
| GPQA | | Accuracy | 63.51 | 69 | 1mo ago |
| GPQA | UAB | Accuracy | 55.8 | 69 | 6d ago |
| ScienceQA | Qwen2.5-VL-7B-Instruct | IMG Score | 88.6 | 64 | 2mo ago |
| GPQA main (test) | Min-k | Exact Match Accuracy | 40.85 | 60 | 1mo ago |
| GPQA Diamond | | Accuracy | 91.9 | 59 | 15d ago |
| SciQA-IMG | Phi 3.5 Vision | SciQA-IMG Accuracy | 89 | 53 | 3mo ago |