| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | Commonsense Reasoning Benchmark | BoolQ Accuracy: 75.91 | 22 |
| Commonsense Reasoning | Commonsense Reasoning Benchmark Intra-domain Multi-task | Average Accuracy: 87.27 | 14 |
| Commonsense Reasoning | Commonsense Reasoning Benchmark Qwen2.5-1.5B-Instruct (test) | ARC-c Accuracy: 75.82 | 4 |