Correctness Prediction on MMLU Pro
Top result: Op-XGB, WP-AUC 0.639
[Figure: WP-AUC over time; data point dated May 28, 2026]
Updated 5d ago
Evaluation Results
Method      Evaluation Protocol   Date      WP-AUC
Op-XGB      ID                    2026.05   0.639
OST         ID                    2026.05   0.638
Op-XGB      CD                    2026.05   0.635
OST         CD                    2026.05   0.635
Wait        CD                    2026.05   0.633
Wait        ID                    2026.05   0.633
Backtrack   CD                    2026.05   0.628
Backtrack   ID                    2026.05   0.628
Length      CD                    2026.05   0.587
Length      ID                    2026.05   0.587
SelfCheck   CD                    2026.05   0.52
SelfCheck   ID                    2026.05   0.52