
Reasoning on CorrectBench

Accuracy: 0.8256

MP

[Chart: accuracy; date shown: Feb 21, 2026]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2026.02   0.8256
2026.02   0.8256
2026.02   0.8215
2026.02   0.7903
2026.02   0.7502
2026.02   0.7486
2026.02   0.6814
2026.02   0.5291