Ranking Correlation Analysis on AIME 25
[Chart: Kendall's tau_b (vs Gold Standard); Bayes_R0@1 at 0.798 on Mar 11, 2026. Also available: Kendall's tau_b (vs Method@80).]
Updated 1mo ago
Evaluation Results
Method                              Configuration              Date     Kendall's tau_b (vs Gold Standard)  Kendall's tau_b (vs Method@80)
Bayes_R0@1                          Sample Budget (N)=1, P...  2026.03  0.798                               -
Rasch MML LCB (rasch_mml_credible)  Sample Budget (N)=1        2026.03  -                                   0.834