
MMLU-Redux

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Language Understanding | MMLU-Redux Generative | Humanities Accuracy: 87.8 | 16
Knowledge Evaluation | MMLU-Redux 2.0 (Continual) | Accuracy: 33.49 | 6
Knowledge Evaluation | MMLU-Redux 2.0 (Original) | Accuracy: 42.03 | 6
Query Routing | MMLU-Redux OOD | CPT (80%): 52.5 | 4