| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| Zero-Shot Evaluation Suite (Arc-e, Arc-c, Boolq, Hellaswag, Openbookqa, Piqa, SciQ, Winogrande) | StableQAT | ARC-E 65.74 | 18 | 4d ago | |
| LM Eval ARCC, ARCE, HellaSwag, PIQA 0.4.4 standard (test) | | ARCC 61.6 | 18 | 4d ago | |
| lm-eval-harness PIQA, COPA, OpenBookQA, Winogrande, SciQA, ARC-E, ARC-C | | PIQA Accuracy 78.8 | 10 | 4d ago | |