| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Evaluation | Downstream Task Suite (ARC-C, BoolQ, HellaSwag, MMLU, OBQA, PIQA, RTE, WinoGrande) zero-shot Qwen1.5-MoE-A2.7B | ARC-C Accuracy: 45 | 6 |
| Language Understanding and Reasoning | Downstream Task Suite (PIQA, ARC-e, HellaSwag, GPQA, Lambada, MMLU, BBH) | PIQA: 50.67 | 2 |