| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Standard Zero-shot NLU Suite (ARC-challenge, ARC-easy, BoolQ, HellaSwag, LAMBADA, PIQA, RACE, SciQ, Record, OBQA) | MaxScore | ARC Challenge 21.22 | 18 | 4d ago | |
| LM-Evaluation-Harness (ARC, BoolQ, HellaSwag, LAMBADA, PIQA, RACE, SciQ, Record, OBQA) | | ARC Challenge 46.8 | 13 | 4d ago | |
| T0 (test) | T0-3B | Accuracy 65.5 | 8 | 4d ago | |