Correctness Prediction on ARC Challenge
[Chart: WP-AUC. Top result: 64.5 (OST), May 28, 2026]
Updated 5d ago
Evaluation Results
Method     Evaluation Protocol  Date     WP-AUC
OST        CD                   2026.05  64.5
Op-XGB     CD                   2026.05  61.8
SelfCheck  CD                   2026.05  57.8
SelfCheck  ID                   2026.05  57.8
Op-XGB     ID                   2026.05  55.9
Length     CD                   2026.05  55.9
Length     ID                   2026.05  55.9
OST        ID                   2026.05  55.6
Backtrack  CD                   2026.05  52
Backtrack  ID                   2026.05  52
Wait       CD                   2026.05  50.1
Wait       ID                   2026.05  49.9