| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| 11 Downstream Tasks Aggregate | | Average Accuracy | 64.6 | 32 | 1mo ago |
| HellaSwag | Phi-3 Mini-4k | Accuracy | 76.7 | 13 | 1mo ago |
| OBQA | TinyLlama-1.1B | Accuracy | 25.2 | 7 | 1mo ago |
| BoolQ | LLaMA-MoE 2/8 | Accuracy | 61.93 | 7 | 1mo ago |
| LogiQA | DIVE 2/8 | Accuracy | 22.12 | 7 | 1mo ago |
| MathQA | TinyLlama-1.1B | Accuracy | 24.32 | 7 | 1mo ago |
| ARC Challenge | TinyLlama-1.1B | Accuracy | 35.67 | 7 | 1mo ago |
| ARC Easy | TinyLlama-1.1B | Accuracy | 61.66 | 7 | 1mo ago |
| WinoGrande | TinyLlama-1.1B | Accuracy | 59.43 | 7 | 1mo ago |
| PIQA | TinyLlama-1.1B | Accuracy | 72.6 | 7 | 1mo ago |
| SciQ | TinyLlama-1.1B | Accuracy | 89.3 | 7 | 1mo ago |
| Poisoned Apple simple_5x5 | | Total Reward | 0.96 | 4 | 5d ago |
| Frozen Lake standard_4x4 | | Total Reward | 1 | 4 | 5d ago |