| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Common-sense reasoning | CSR (ARC-Easy, ARC-Challenge, BoolQ, PIQA, SIQA, HellaSwag, OpenBookQA, WinoGrande) zero-shot lm-evaluation-harness v0.4.2 | Accuracy: 68.95 | 32 |
| Commonsense Reasoning | CSR (Commonsense Reasoning Suite) | Average Accuracy: 72 | 10 |
| Common Sense Reasoning | CSR zero-shot | CF Score: 5.2 | 2 |