| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General Language Modeling | CodaSet OOD Average (test) | Performance (%) | 87.84 | 16 |
| Holistic Evaluation | CodaSet ID Average (test) | Accuracy | 90.6 | 16 |
| Instruction Following | CodaSet ID IFEVAL (test) | Accuracy | 88.76 | 16 |
| Symbolic and Logical Reasoning | CodaSet BBH ID (test) | Accuracy | 94.29 | 16 |
| Mathematical Reasoning | CodaSet ID GSM8K (test) | Accuracy | 0.964 | 16 |