| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| MMLU, ARC-c, HellaSwag, BoolQ, PIQA, WinoGrande zero-shot | | Average Score (Zero-shot): 69.72 | 20 | 1mo ago |
| Standard Downstream Benchmarks Two-Shot (val) | AdaGC | ARC-E Accuracy (Normalized): 56.86 | 11 | 1mo ago |
| General Benchmarks Italian | Qwen2.5 | ARC-C-it: 37.47 | 6 | 1mo ago |