| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| RACE high | | Accuracy | 94.5 | 295 | 2d ago |
| BOOLQ | In-Squeeze | Accuracy | 94.47 | 219 | 3d ago |
| RACE mid | Flexora | Accuracy | 89.9 | 196 | 3d ago |
| RACE | Qwen-1.5 14B | Accuracy | 89.95 | 151 | 3d ago |
| DROP | Direct Fine-tuning | DROP Accuracy | 88.8 | 103 | 3d ago |
| DROP (dev) | QDGATp | F1 Score | 88.1 | 63 | 2d ago |
| DROP (test) | Human Performance | F1 Score | 96.42 | 61 | 2d ago |
| C3 | InternLM2-Chat-20B-SFT | Accuracy | 93.5 | 56 | 3d ago |
| DROP | DeepSeek-R1 | F1 Score | 92.2 | 55 | 3d ago |
| RACE | CORAL | Accuracy | 74.93 | 34 | 3d ago |
| BoolQ (val) | | Accuracy | 97.7 | 34 | 3d ago |
| BELEBELE | Trinity Large (MoE) | Average RC Score (BELEBELE) | 80 | 31 | 3d ago |
| DROP (test) | TFL | F1 Score | 76 | 29 | 3d ago |
| RACE-m | Fine-tuned SOTA | Accuracy | 0.931 | 28 | 3d ago |
| QuAC | Fine-tuned SOTA | F1 Score | 74.4 | 28 | 3d ago |
| ReCoRD | PaLM 2-L | Accuracy | 93.8 | 25 | 2d ago |
| RACE-h (test) | Qwen-1.5 14B (Teacher) | Accuracy | 89.95 | 24 | 3d ago |
| Belebele | BYOL-nya | Accuracy | 61 | 20 | 2d ago |
| Race M | RWKV-7B | Race M Score | 45.47 | 18 | 3d ago |
| Race-H | RWKV-7B | RACE-h Score | 38.56 | 18 | 3d ago |
| RACE Middle School | Human Ceiling | Accuracy (RACE MS) | 95.4 | 16 | 2d ago |
| C3 (test) | Qwen-1.5 14B (Teacher) | Accuracy | 77.38 | 16 | 3d ago |
| RACE (dev) | ALBERTxxlarge | Accuracy | 88.1 | 16 | 3d ago |
| BoolQ (test) | | Accuracy | 99.87 | 16 | 3d ago |
| SQuAD Extract | | Mean Per-Step Regret | 0.128 | 15 | 3d ago |