| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reading Comprehension | RACE high | Accuracy | 94.5 | 295 |
| Reading Comprehension | RACE mid | Accuracy | 89.9 | 196 |
| Reading Comprehension | RACE | Accuracy | 89.95 | 151 |
| Machine Reading Comprehension | RACE (test) | RACE Accuracy (Medium) | 95.4 | 111 |
| Reading Comprehension | RACE | Accuracy | 74.93 | 70 |
| Multiple Choice Question Answering | RACE | Accuracy | 98.24 | 54 |
| Out-of-Distribution Detection | RACE to MMLU | AUROC | 87.57 | 41 |
| Machine Reading Comprehension | RACE | RACE Overall Accuracy | 94.5 | 38 |
| Reading Comprehension | RACE-m | Accuracy | 0.931 | 31 |
| Uncertainty Estimation | RACE Llama-3.1-8B and Gemma-2-9B backbones (test) | AUROC | 91.3 | 24 |
| Reading Comprehension | RACE | First-Token Accuracy | 87.3 | 24 |
| Reading Comprehension | RACE-h (test) | Accuracy | 89.95 | 24 |
| Question Answering | RACE MRQA out-of-domain evaluation | EM | 46.3 | 23 |
| Reading Comprehension | RACE | RACE Middle Score | 70.2 | 21 |
| Distribution Alignment | Race Even | MAE | 0.072 | 20 |
| Understanding | RACE Middle | Score | 67.27 | 20 |
| Question Answering | RACE-C | Accuracy | 93.66 | 19 |
| Reading Comprehension | Race M | Race M Score | 45.47 | 18 |
| Reading Comprehension | Race-H | RACE-h Score | 38.56 | 18 |
| Reading Comprehension | RACE-h | Accuracy | 62.3 | 18 |
| Reading Comprehension | RACE Middle School | Accuracy (RACE MS) | 95.4 | 16 |
| Reading Comprehension | RACE (dev) | Accuracy | 88.1 | 16 |
| Difficulty-controllable Question Generation | RACE (test) | Estimated Difficulty | 2.18 | 15 |
| Reading Comprehension | Race 3shots | Accuracy | 83.31 | 14 |
| Reading Comprehension | RACE MRQA out-of-domain | EM | 43.62 | 14 |