| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General Benchmark Aggregation | Overall Evaluation Suite | Math Average | 59.8 | 21 |
| Aggregated Programming Capability Evaluation | Overall Evaluation Suite | Macro Average Score | 64.2 | 10 |
| General Language Modeling | Overall Evaluation Suite | Average Score | 73.6 | 4 |