Science Reasoning on GPQA-Diamond (test)
Figure: Hypervolume over time (top result: RADAR, 0.7513, Sep 29, 2025)
Evaluation Results
| Method | Setting | Date | Hypervolume |
|---|---|---|---|
| RADAR | Evaluation Protocol=ID... | 2025.09 | 0.7513 |
| IRT-Router | Evaluation Protocol=ID... | 2025.09 | 0.6942 |
| RouterBench | Evaluation Protocol=ID... | 2025.09 | 0.6866 |
| Random-Pair | Evaluation Protocol=ID... | 2025.09 | 0.5545 |