| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| Downstream Suite Zero-shot (ARC-E, ARC-C, HellaSwag, PIQA, WinoGrande, OBQA, SciQ, BoolQ) | Looped Hybrid (Full+GDN) | ARC-Easy Accuracy | 74.82 | 26 | 13d ago |
| Downstream Evaluation Suite Zero-shot (ARC, PIQA, HellaSwag, WinoGrande, LAMBADA, RACE) | UNIPOOL | ARC-E Accuracy | 56.57 | 14 | 26d ago |
| Downstream Tasks (MMLU, HellaSwag, PIQA, BoolQ, WinoGrande, ARC-E, ARC-C, OBQA; 500 samples per task) | | MMLU | 74.4 | 7 | 1mo ago |
| LM Evaluation Harness 0-shot v1.0.0 | | HellaSwag Accuracy | 50.1 | 4 | 14d ago |