| Dataset Name | SOTA Method | Metric | Trend | Last Updated |
|---|---|---|---|---|
| RACE high | | Accuracy 94.5 | 295 | 1mo ago |
| BOOLQ | In-Squeeze | Accuracy 94.47 | 279 | 1mo ago |
| RACE mid | Flexora | Accuracy 89.9 | 196 | 1mo ago |
| RACE | Qwen-1.5 14B | Accuracy 89.95 | 151 | 1mo ago |
| DROP | Direct Fine-tuning | DROP Accuracy 88.8 | 111 | 1mo ago |
| C3 | InternLM2-Chat-20B-SFT | Accuracy 93.5 | 73 | 16d ago |
| DROP | DeepSeek-R1 | F1 Score 92.2 | 73 | 5d ago |
| RACE | CORAL | Accuracy 74.93 | 70 | 1mo ago |
| DROP (dev) | QDGATp | F1 Score 88.1 | 63 | 1mo ago |
| DROP (test) | Human Performance | F1 Score 96.42 | 61 | 1mo ago |
| BoolQ | | Accuracy (BoolQ) 86.23 | 55 | 5d ago |
| Belebele | TildeOpen LLM | Accuracy 84.7 | 39 | 1mo ago |
| Belebele 28 European languages | | Overall Score 85.91 | 34 | 4d ago |
| BoolQ (val) | | Accuracy 97.7 | 34 | 1mo ago |
| SciQ | | Accuracy 93.7 | 32 | 8d ago |
| Quoref | OLTQA | F1 Score 69.42 | 32 | 8d ago |
| BELEBELE | Trinity Large (MoE) | Average RC Score (BELEBELE) 80 | 31 | 1mo ago |
| RACE-m | Fine-tuned SOTA | Accuracy 0.931 | 31 | 1mo ago |
| DROP (test) | TFL | F1 Score 76 | 29 | 1mo ago |
| QuAC | Fine-tuned SOTA | F1 Score 74.4 | 28 | 1mo ago |
| ReCoRD | PaLM 2-L | Accuracy 93.8 | 25 | 1mo ago |
| RACE | Zephyr-7B | First-Token Accuracy 87.3 | 24 | 12d ago |
| Lambada | U-PaLM | Accuracy 80.5 | 24 | 1mo ago |
| RACE-h (test) | Qwen-1.5 14B (Teacher) | Accuracy 89.95 | 24 | 1mo ago |
| RC | Yi | Accuracy 76.5 | 23 | 4d ago |