Mathematical Reasoning on AIME (test)
[Chart: Hypervolume over time; plotted point: IRT-Router, 0.7766, Sep 29, 2025]
Updated 1mo ago
Evaluation Results
Method       Method (settings)           Links    Hypervolume
IRT-Router   Evaluation Protocol=ID...   2025.09  0.7766
RADAR        Evaluation Protocol=ID...   2025.09  0.776
RouterBench  Evaluation Protocol=ID...   2025.09  0.768
Random-Pair  Evaluation Protocol=ID...   2025.09  0.5159