Correctness Prediction on LiveCodeBench
[Chart: WP-AUC over time. Best result: 79.5 (Op-XGB), May 28, 2026.]
Updated 5d ago
Evaluation Results

Method      Evaluation Protocol   Date      WP-AUC
Op-XGB      ID                    2026.05   79.5
Op-XGB      CD                    2026.05   75.4
OST         ID                    2026.05   73.2
OST         CD                    2026.05   72.7
Backtrack   CD                    2026.05   61.6
Backtrack   ID                    2026.05   61.6
Wait        CD                    2026.05   59.1
Wait        ID                    2026.05   59.1
Length      ID                    2026.05   57.4
SelfCheck   CD                    2026.05   49.6
SelfCheck   ID                    2026.05   49.6
Length      CD                    2026.05   42.6