| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| ScienceQA image | InternVL2.5-8B | Accuracy 98 | 184 | 4d ago | |
| GPQA Diamond | | Accuracy 84.4 | 64 | 1mo ago | |
| ScienceQA | MASQuant | Accuracy 88.6 | 61 | 1mo ago | |
| SciQA | Qwen2.5-VL-32B | Accuracy 91.4 | 35 | 11d ago | |
| GPQA | CoT | Average Inference Time (s) 1.58 | 30 | 24d ago | |
| GPQA | LARFT | Score 33.48 | 28 | 26d ago | |
| GPQA-D | IOA | Accuracy (GPQA-D) 14.43 | 20 | 1mo ago | |
| SciRAG-SSLI hard 1.0 (test) | | F1 Score 46.86 | 19 | 1mo ago | |
| SciRAG-SSLI easy 1.0 (test) | RankGPT | F1 Score 46.55 | 19 | 1mo ago | |
| GPQA | CISPO | pass@1 18.2 | 18 | 1mo ago | |
| GPQA Diamond (test) | Transformers | Pass@1 49 | 16 | 1mo ago | |
| Science & QA Domain Out-of-Domain | | SampleQA Score 3.19 | 11 | 1mo ago | |
| GPQA | AIM | Accuracy 70.7 | 11 | 1mo ago | |
| HLE Drug Discovery | Mozi | Exact Match Accuracy 21.42 | 9 | 1mo ago | |
| ScienceQA I | CVLM (3M IKPairs) w/o FKA | Accuracy 69.96 | 8 | 1mo ago | |
| Scientific Disciplines In-Domain | FT (tuned) | Chemistry Accuracy 64.9 | 6 | 1mo ago | |
| MMLU-Pro | OctoTools | Accuracy 73.7 | 4 | 3d ago | |
| GPQA | OctoTools | Accuracy 54.7 | 4 | 3d ago | |
| GPQA | | Score 27 | 4 | 1mo ago | |
| GPQA Diamond | Gemma-3-27b-it | pass@50 92.42 | 2 | 1mo ago | |
| GPQA Diamond (VeRA-E) | | Avg@5 Accuracy (Seeds) 79.27 | 1 | 1mo ago | |