| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Language Performance | Aggregate Suite | Average Score: 78.03 | 14 |
| Text Clustering | Aggregate Suite (test) | Macro Accuracy: 82.2 | 14 |
| General Evaluation | Aggregate Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) | Average Score: 69 | 10 |