| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Understanding | MMLU-Redux Generative | Humanities Accuracy | 87.8 | 16 |
| Knowledge Evaluation | MMLU-Redux 2.0 (Continual) | Accuracy | 33.49 | 6 |
| Knowledge Evaluation | MMLU-Redux 2.0 (Original) | Accuracy | 42.03 | 6 |
| Query Routing | MMLU-Redux OOD | CPT (80%) | 52.5 | 4 |