| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| ARC Challenge | GPT-4 | Accuracy 96.3 | 906 | 2mo ago | |
| ARC Challenge | Frozen LLM graph | Accuracy (ARC) 87.3 | 598 | 4d ago | |
| ARC Easy | Mistral Small 24B Inst 2501 | Accuracy 98.2 | 597 | 16h ago | |
| ARC-E | Direct Fine-tuning | Accuracy 95.23 | 523 | 14d ago | |
| PIQA | Mashup Learning | Accuracy 86.5 | 505 | 14d ago | |
| OpenBookQA | LMSI | Accuracy 94.4 | 465 | 3mo ago | |
| ARC Easy | LFTF | Normalized Acc 96.4 | 391 | 20h ago | |
| SQuAD v1.1 (dev) | Megatron-3.9B ensemble | F1 Score 95.8 | 380 | 1mo ago | |
| OBQA | Direct Fine-tuning | Accuracy 94.95 | 347 | 5d ago | |
| BoolQ | PaLM 2-L | Accuracy 90.9 | 317 | 1mo ago | |
| OpenBookQA | | Accuracy 96.07 | 305 | 4d ago | |
| SciQ | MSSRfull | Accuracy 97.2 | 283 | 1mo ago | |
| SQuAD v1.1 (test) | LUKE | F1 Score 95.4 | 260 | 3mo ago | |
| GPQA | UPA | Accuracy 84.2 | 258 | 3mo ago | |
| ARC-C | DRAG | Accuracy 94.1 | 258 | 19d ago | |
| 2WIKI | | EM 86 | 241 | 4d ago | |
| TriviaQA | RankCoT | Accuracy 86.68 | 238 | 2mo ago | |
| ARC | Yi-34B + RTD | Accuracy 94.6 | 230 | 2mo ago | |
| Bamboogle | RAGShaper | EM 60 | 227 | 1d ago | |
| SQuAD 2.0 | RoBERTa | F1 89.4 | 215 | 25d ago | |
| ARC Easy | IT-Prun | Accuracy 90.48 | 210 | 4d ago | |
| BoolQ | ShortGPT | Accuracy 90.03 | 201 | 16h ago | |
| PopQA | LogicGaze | Accuracy 68.4 | 186 | 2mo ago | |
| TriviaQA | PaLM 2-L | EM 86.1 | 182 | 1mo ago | |
| HotpotQA | SGIC | EM 77.2 | 173 | 1d ago | |