General Knowledge Reasoning on MMLU-Redux (test)
[Figure: Hypervolume over time. Highlighted point: RADAR, 0.923 (Sep 29, 2025).]
Updated 1mo ago
Evaluation Results
Method        Setting                      Date      Hypervolume
RADAR         Evaluation Protocol=ID...    2025.09   0.923
IRT-Router    Evaluation Protocol=ID...    2025.09   0.9117
RouterBench   Evaluation Protocol=ID...    2025.09   0.9053
Random-Pair   Evaluation Protocol=ID...    2025.09   0.7281