| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| LM Eval Harness (HellaSwag, BoolQ, WinoGrande, PiQA, ARC-easy, ARC-challenge) zero-shot | GIFT-SW | Mean Accuracy: 75.46 | 60 | 4d ago | |
| HellaSwag, BoolQ, WinoGrande, PiQA, ARC-easy, and ARC-challenge Zero-shot LM Eval Harness | | Mean Accuracy: 77.02 | 24 | 4d ago | |