| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Quantization Performance Summary | Aggregated Benchmarks (HellaSwag, MMLU, Arc-C, MATH-500) | Average Score: 1.014 | 22 |
| Reward Modeling | Aggregated Benchmarks (Macro) | Average Score (excl. MM-RB, VL-RB): 74.74 | 12 |
| Overall Language Model Evaluation | Aggregated Benchmarks (STEM, Code, IF, General) | Average Score: 61.7 | 7 |