| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| ARC Easy | BERT-Judge | Accuracy | 99.7 | 188 | 4d ago |
| MMLU | M2CL | Accuracy | 97.5 | 185 | 5d ago |
| MMLU-Pro | BERT-Judge | MMLU-Pro Overall Accuracy | 96.5 | 119 | 5d ago |
| ARC Challenge | | Acc | 74.7 | 118 | 24d ago |
| IndoCulture native prompts (test) | Gemma2-9B | Accuracy | 67.5 | 99 | 1mo ago |
| HellaSwag | Stable-LoRA | Accuracy | 93.59 | 93 | 24d ago |
| ArabCulture 1.0 (test) | Qwen2.5-7B | Accuracy | 59.6 | 84 | 1mo ago |
| SciQ | QA | Accuracy | 100 | 81 | 24d ago |
| MMLU 5-shot | EdgeJury | Accuracy | 73.4 | 73 | 16d ago |
| OBQA | ETN | Accuracy | 93.2 | 69 | 5d ago |
| ARC-Easy (test) | TaT | Accuracy | 89.1 | 68 | 1mo ago |
| RACE | QA | Accuracy | 98.24 | 54 | 5d ago |
| MC (test) | | MC Avg | 78 | 46 | 29d ago |
| ARC Challenge (test) | TaT | Accuracy | 82.17 | 44 | 1mo ago |
| ConFiQA MC | ProbeRAG | F1 Score | 91.2 | 42 | 4d ago |
| OpenBookQA (test) | TaT | Accuracy | 90.8 | 39 | 1mo ago |
| MedQA | DS2-INSTRUCT | Accuracy | 50.98 | 39 | 1mo ago |
| TruthfulQA MC1 | EdgeJury | MC1 Accuracy | 76.2 | 39 | 15d ago |
| ARC Challenge | | Non-generative Accuracy | 0.6451 | 36 | 1mo ago |
| MMLU Medical subjects | Qwen3-32B | Anatomy (EN) Accuracy | 80 | 35 | 9d ago |
| MMLU Medical Subjects (test) | Qwen3-14B | Accuracy (College Biology, EN) | 92.36 | 35 | 9d ago |
| Bangla MMLU 1.0 (test) | Qwen-2.5-1.5b | Accuracy | 35 | 33 | 1mo ago |
| MMLU | Original | STEM Accuracy | 67.1 | 33 | 9d ago |
| UNED-Access Spanish 2024 (test) | | Accuracy | 93 | 32 | 19d ago |
| UNED-Access English 2024 (test) | | Accuracy | 92 | 32 | 19d ago |