
General Knowledge Reasoning on MMLU-Redux (test)

Hypervolume: 0.923

RADAR

[Chart: axis ticks 0.720304, 0.772927, 0.82555, 0.878173; Sep 29, 2025]
Updated 1mo ago

Evaluation Results

Date       Score
2025.09    0.923
2025.09    0.9117
2025.09    0.9053
2025.09    0.7281