
MMLU-CK

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Explanation Quality Evaluation | MMLU-CK (test) | Reasoning Soundness Loss (%): 44 | 2 |