| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| MMLU | Llama-Guard-3-8B | Accuracy 100 | 47 | 1d ago | |
| When2Call | Task Calibration | Accuracy 78.63 | 24 | 21d ago | |
| StrategyQA | UnifiedQA-3b | Accuracy 83.4 | 16 | 3mo ago | |
| CIKQA | UnifiedQA-3b | Accuracy 66.9 | 16 | 3mo ago | |
| e-SNLI | UnifiedQA-3b | Accuracy 89.6 | 16 | 3mo ago | |
| AGNews | UnifiedQA-3b | Accuracy 84.5 | 16 | 3mo ago | |
| OpenBookQA, ARC-Easy, WinoGrande, HellaSwag, PIQA, MathQA | Original | Accuracy (OpenBookQA) 35 | 11 | 8d ago | |
| 14 standard NLP tasks suite (held-out) | OPT 175B | StoryCloze 86.9 | 8 | 3mo ago | |