Correctness Prediction on Global Pooled Datasets
Chart: best WP-AUC over time; current best is Op-XGB with WP-AUC 0.723 (May 28, 2026).
Evaluation Results

Method      Evaluation Protocol   Date      WP-AUC
Op-XGB      ID                    2026.05   0.723
OST         ID                    2026.05   0.703
Op-XGB      CD                    2026.05   0.701
OST         CD                    2026.05   0.701
Wait        ID                    2026.05   0.603
Wait        CD                    2026.05   0.6
Backtrack   CD                    2026.05   0.594
Backtrack   ID                    2026.05   0.594
Length      ID                    2026.05   0.551
SelfCheck   CD                    2026.05   0.512
SelfCheck   ID                    2026.05   0.512
Length      CD                    2026.05   0.507
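Each method in the results is reported under both the ID and the CD evaluation protocol. To compare how much each method's WP-AUC changes between the two, the following minimal Python sketch transcribes the table and computes the per-method ID minus CD difference and a per-protocol ranking. The helper names (RESULTS, protocol_gap, ranking) are hypothetical and only illustrate the arithmetic. ID and CD are treated as opaque protocol labels because their definitions are not given here.

```python
# WP-AUC scores transcribed from the results table: (method, protocol) -> score.
# "ID" and "CD" are used only as labels; their exact meaning is assumed unknown here.
RESULTS = {
    ("Op-XGB", "ID"): 0.723, ("Op-XGB", "CD"): 0.701,
    ("OST", "ID"): 0.703, ("OST", "CD"): 0.701,
    ("Wait", "ID"): 0.603, ("Wait", "CD"): 0.600,
    ("Backtrack", "ID"): 0.594, ("Backtrack", "CD"): 0.594,
    ("Length", "ID"): 0.551, ("Length", "CD"): 0.507,
    ("SelfCheck", "ID"): 0.512, ("SelfCheck", "CD"): 0.512,
}


def protocol_gap(results):
    """Return {method: ID score minus CD score}, rounded to 3 decimals."""
    methods = sorted({method for method, _ in results})
    return {
        m: round(results[(m, "ID")] - results[(m, "CD")], 3) for m in methods
    }


def ranking(results, protocol):
    """Return (method, score) pairs for one protocol, best score first."""
    rows = [(m, s) for (m, p), s in results.items() if p == protocol]
    # sorted() is stable, so tied scores keep their table order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    gaps = protocol_gap(RESULTS)
    for method, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
        print(f"{method:10s} ID - CD: {gap:+.3f}")
    print("Best under CD:", ranking(RESULTS, "CD")[0])
```

With these numbers, Length shows the largest drop from ID to CD (0.044), followed by Op-XGB (0.022). Backtrack and SelfCheck score the same under both protocols.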