| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| BENCH-PROXY (MMLU, ANLI, HellaSwag, PIQA, SIQA, W.G., ARC-E, ARC-C, C.QA, WSC) (test) | | MMLU 34.32 | | 24 | 4d ago |
| LLM Evaluation Suite (MMLU, ARC-C, PIQA, WinoG, GSM8K, HellaSwag, GPQA, RACE) zero-shot | LLaDA1.5 | Average Score 58.59 | | 13 | 4d ago |