Correctness Prediction on GPQA (WP-AUC)
[Chart: WP-AUC over time. Best result: 0.703 WP-AUC (Op-XGB), May 28, 2026.]
Updated 5d ago
Evaluation Results
| Method | Evaluation Protocol | Date | WP-AUC |
|---|---|---|---|
| Op-XGB | ID | 2026.05 | 0.703 |
| OST | CD | 2026.05 | 0.691 |
| OST | ID | 2026.05 | 0.691 |
| Op-XGB | CD | 2026.05 | 0.682 |
| Wait | CD | 2026.05 | 0.602 |
| Wait | ID | 2026.05 | 0.602 |
| Length | CD | 2026.05 | 0.591 |
| Length | ID | 2026.05 | 0.591 |
| Backtrack | CD | 2026.05 | 0.591 |
| Backtrack | ID | 2026.05 | 0.591 |
| SelfCheck | CD | 2026.05 | 0.526 |
| SelfCheck | ID | 2026.05 | 0.526 |