General Knowledge Reasoning on MMLU (test)
[Chart: Hypervolume. Highlighted point: RADAR, 87.2 (Sep 29, 2025).]
Updated 1mo ago
Evaluation Results
Method        Setting                     Date      Hypervolume
RADAR         Evaluation Protocol=ID...   2025.09   87.2
IRT-Router    Evaluation Protocol=ID...   2025.09   86.04
RouterBench   Evaluation Protocol=ID...   2025.09   85.92
Random-Pair   Evaluation Protocol=ID...   2025.09   69.05