| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| ScienceQA | LongVILA-7B (S3) | Accuracy | 98.5 | 502 | 3d ago |
| ARC Challenge | Qwen-3-30B-A3B | Accuracy | 96 | 342 | 11d ago |
| ScienceQA IMG | EMOVA | Accuracy | 98.2 | 294 | 12d ago |
| ScienceQA (SQA) | Qwen2.5-VL | Accuracy | 88.8 | 273 | 3d ago |
| ScienceQA (test) | Perceptio | Average Accuracy | 98.3 | 245 | 3d ago |
| ARC-C | | Accuracy | 96.3 | 193 | 9d ago |
| ARC-E | Qwen3-4B | Accuracy | 97.53 | 184 | 9d ago |
| ARC Easy | Base | Accuracy | 98 | 155 | 4d ago |
| ScienceQA SQA-IMG | TroL | Accuracy | 92.8 | 139 | 4d ago |
| SciQ | HPTQ | Normalized Accuracy | 97.7 | 137 | 1mo ago |
| ScienceQA SQA-I | InternVL2-8B + RP | Accuracy | 96.6 | 103 | 25d ago |
| GPQA | M2CL | pass@1 Accuracy | 87.6 | 85 | 1mo ago |
| ScienceQA IMG (test) | InternVL2.5-8B | Accuracy | 98.4 | 74 | 11d ago |
| ScienceQA | Qwen2.5-VL-7B-Instruct | IMG Score | 88.6 | 64 | 1mo ago |
| GPQA main (test) | Min-k | Exact Match Accuracy | 40.85 | 60 | 4d ago |
| SciQA-IMG | Phi 3.5 Vision | SciQA-IMG Accuracy | 89 | 53 | 1mo ago |
| SciQ | | Accuracy (SciQ) | 85.1 | 52 | 3d ago |
| ScienceQA Image | FastV | Score | 74.2 | 51 | 25d ago |
| GPQA | CoT2-Meta | Accuracy | 91.5 | 46 | 11d ago |
| GPQA | RLTR | Accuracy | 34.8 | 46 | 1mo ago |
| ARC | | ARC Accuracy | 98.8 | 46 | 26d ago |
| GPQA | Llama-Instruct | Accuracy | 54.5 | 42 | 1mo ago |
| ARC-c (test) | Trinity Large (MoE) | Accuracy | 90 | 40 | 1mo ago |
| Sci-QA | | Score | 94.7 | 32 | 19d ago |
| ARC-C | Evo 8B | Accuracy | 92.5 | 32 | 1mo ago |