| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MT-AIME 24 | ROSA2 | Accuracy (%) | 44.4 | 40 | 1mo ago |
| M-IMO | ROSA (+LM + M) | Accuracy | 39.16 | 20 | 1mo ago |
| MMLU-ProX stratified subset 1,000 instances (test) | Task Arithmetic | Accuracy | 18.7 | 6 | 1mo ago |
| GlobalMMLU | | English Accuracy | 88.77 | 6 | 1mo ago |
| XCOPA | LLaMA3-8B | Accuracy | 73 | 6 | 1mo ago |
| MMLU ProX Lite | | Accuracy (en) | 79 | 3 | 4d ago |
| Polymath Low | | Accuracy (en) | 96.5 | 3 | 4d ago |
| MGSM | U-PaLM | Accuracy | 49.9 | 2 | 1mo ago |