| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| RACE high | | Accuracy | 94.5 | 295 | 3mo ago |
| BOOLQ | In-Squeeze | Accuracy | 94.47 | 279 | 2mo ago |
| BoolQ | BEAM | Accuracy (BoolQ) | 88.07 | 228 | 7d ago |
| RACE mid | Flexora | Accuracy | 89.9 | 196 | 3mo ago |
| RACE | Qwen-1.5 14B | Accuracy | 89.95 | 151 | 3mo ago |
| DROP | FLOWBOT | DROP Accuracy | 92.28 | 129 | 1d ago |
| DROP | DeepSeek-R1 | F1 Score | 92.2 | 96 | 12d ago |
| C3 | InternLM2-Chat-20B-SFT | Accuracy | 93.5 | 89 | 7d ago |
| DROP (test) | Human Performance | F1 Score | 96.42 | 76 | 19d ago |
| RACE | CORAL | Accuracy | 74.93 | 75 | 7d ago |
| DROP (dev) | QDGATp | F1 Score | 88.1 | 63 | 3mo ago |
| RACE | MSRS | Accuracy | 68.3 | 59 | 7d ago |
| BoolQ (test) | | Accuracy | 99.87 | 43 | 21d ago |
| SQuAD | | Attack Accuracy | 75.91 | 40 | 1mo ago |
| Belebele | TildeOpen LLM | Accuracy | 84.7 | 39 | 2mo ago |
| Belebele 28 European languages | | Overall Score | 85.91 | 34 | 1mo ago |
| BoolQ (val) | | Accuracy | 97.7 | 34 | 3mo ago |
| Belebele c | Q3 negatives | Accuracy (Normalized) | 37.11 | 32 | 1mo ago |
| SciQ | | Accuracy | 93.7 | 32 | 1mo ago |
| Quoref | OLTQA | F1 Score | 69.42 | 32 | 1mo ago |
| BELEBELE | Trinity Large (MoE) | Average RC Score (BELEBELE) | 80 | 31 | 3mo ago |
| RACE-m | Fine-tuned SOTA | Accuracy | 0.931 | 31 | 3mo ago |
| DROP (test) | TFL | F1 Score | 76 | 29 | 3mo ago |
| QuAC | Fine-tuned SOTA | F1 Score | 74.4 | 28 | 3mo ago |
| RACE-h | PaLM 2-L | Accuracy | 62.3 | 26 | 19d ago |