Correctness Prediction on MATH
Best WP-AUC: 0.662 (OST)
[Chart: WP-AUC over time; x-axis date May 28, 2026]
Evaluation Results
Method      Evaluation protocol   Date      WP-AUC
OST         ID                    2026.05   0.662
OST         CD                    2026.05   0.612
Op-XGB      CD                    2026.05   0.597
Op-XGB      ID                    2026.05   0.57
Wait        ID                    2026.05   0.517
SelfCheck   CD                    2026.05   0.504
SelfCheck   ID                    2026.05   0.504
Wait        CD                    2026.05   0.483
Backtrack   CD                    2026.05   0.464
Backtrack   ID                    2026.05   0.464
Length      CD                    2026.05   0.407
Length      ID                    2026.05   0.407
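Each method appears under both evaluation protocols, ID and CD. The page does not expand these abbreviations, so none is assumed here. The sketch below hard-codes the WP-AUC values from the results and computes, for each method, the ID score minus the CD score. The names `RESULTS` and `protocol_gap` are hypothetical and exist only for this illustration. The sketch does not recompute WP-AUC, because the page does not define how that metric is calculated.

```python
# Hypothetical illustration: WP-AUC values transcribed from the results table.
# Each row is (method, evaluation protocol, WP-AUC).
RESULTS = [
    ("OST", "ID", 0.662),
    ("OST", "CD", 0.612),
    ("Op-XGB", "CD", 0.597),
    ("Op-XGB", "ID", 0.57),
    ("Wait", "ID", 0.517),
    ("SelfCheck", "CD", 0.504),
    ("SelfCheck", "ID", 0.504),
    ("Wait", "CD", 0.483),
    ("Backtrack", "CD", 0.464),
    ("Backtrack", "ID", 0.464),
    ("Length", "CD", 0.407),
    ("Length", "ID", 0.407),
]


def protocol_gap(results):
    """Return {method: ID WP-AUC minus CD WP-AUC}, rounded to 3 decimals.

    Methods missing either protocol are skipped.
    """
    by_method = {}
    for method, protocol, score in results:
        by_method.setdefault(method, {})[protocol] = score
    return {
        method: round(scores["ID"] - scores["CD"], 3)
        for method, scores in by_method.items()
        if "ID" in scores and "CD" in scores
    }


if __name__ == "__main__":
    # Print methods from largest to smallest ID-minus-CD difference.
    gaps = protocol_gap(RESULTS)
    for method, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
        print(f"{method:<10} {gap:+.3f}")
```

The output restates the table. OST and Wait score higher under ID than under CD. Op-XGB is the only method that scores higher under CD. SelfCheck, Backtrack, and Length score the same under both protocols.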