| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Question Answering | CosmosQA | Accuracy 94 | 54 |
| Commonsense Question Answering | CosmosQA (test) | EM 92.25 | 24 |
| Binary Classification | CosmosQA | Accuracy 90 | 18 |
| Reading Comprehension | CosmosQA (test) | Accuracy 91.8 | 5 |
| Reasoning trace quality evaluation | CosmosQA | Grammar Score 2.1 | 2 |