Ranking Correlation Analysis on AIME '24
Chart: Kendall's tau_b (vs. Gold) by date (alternate view: vs. Baseline). Top result: Bayes_R0@1 at 0.779 (Mar 11, 2026).
Evaluation Results

Method | Setting | Date | Kendall's tau_b (vs. Gold) | Kendall's tau_b (vs. Baseline)
Bayes_R0@1 | Sample Budget (N)=1, P... | 2026.03 | 0.779 | -
Rasch MML LCB (rasch_mml_credible) | Sample Budget (N)=1 | 2026.03 | - | 0.804

(- = not reported)
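Kendall's tau_b compares two rankings pair by pair and corrects for ties, which matters when several models receive the same score. The following is a minimal stdlib sketch of the statistic. It assumes that an evaluation method and the gold reference each assign a score to the same list of models; this page does not state the exact protocol. The function name `kendall_tau_b` and the example scores are hypothetical and are not taken from the leaderboard.

```python
from itertools import combinations
from math import nan, sqrt


def kendall_tau_b(x, y):
    """Kendall's tau_b between two paired score lists, with tie correction.

    Pairwise definition:
        tau_b = (P - Q) / sqrt((P + Q + T) * (P + Q + U))
    P / Q count concordant / discordant pairs, T counts pairs tied only in x,
    U counts pairs tied only in y. O(n^2), adequate for a few dozen models.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    p = q = t = u = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: counted in none of P, Q, T, U
        if dx == 0:
            t += 1  # tied only in x
        elif dy == 0:
            u += 1  # tied only in y
        elif (dx > 0) == (dy > 0):
            p += 1  # both lists order the pair the same way
        else:
            q += 1  # the lists disagree on the pair's order
    denom = sqrt((p + q + t) * (p + q + u))
    return (p - q) / denom if denom else nan  # undefined if one list is all ties


# Hypothetical per-model scores: one evaluation method vs. a gold reference.
method_scores = [0.62, 0.55, 0.55, 0.31]
gold_scores = [0.60, 0.58, 0.49, 0.35]
print(kendall_tau_b(method_scores, gold_scores))
```

In practice, `scipy.stats.kendalltau` computes the same statistic, since its default variant is `'b'`. The quadratic loop above is kept only to make the pair counting explicit.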