Reasoning Quality Assessment on TheoremQA

Top result: TRACED, 0.873 AUROC

Evaluation Results

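| Date | AUROC | Metric 2 | Metric 3 |
|---|---|---|---|
| 2026.03 | 0.873 | 0.7094 | 0.7625 |
| 2026.03 | 0.8518 | 0.6617 | 0.8421 |
| 2026.03 | 0.8435 | 0.6497 | 0.8158 |
| 2026.03 | 0.7909 | 0.7847 | 0.6327 |
| 2026.03 | 0.7752 | 0.7314 | 0.6125 |
| 2026.03 | 0.7638 | 0.6333 | 0.625 |
| 2026.03 | 0.7583 | 0.6957 | 0.875 |
| 2026.03 | 0.7364 | 0.7493 | 0.6455 |
| 2026.03 | 0.6958 | 0.7262 | 0.8375 |
| 2026.03 | 0.6738 | 0.5951 | 0.8133 |
| 2026.03 | 0.6625 | 0.6242 | 0.7015 |
| 2026.03 | 0.655 | 0.675 | 0.725 |
| 2026.03 | 0.6545 | 0.5495 | 0.6954 |
| 2026.03 | 0.6471 | 0.6551 | 0.7308 |
| 2026.03 | 0.6455 | 0.6143 | 0.8394 |
| 2026.03 | 0.6435 | 0.6065 | 0.8077 |
| 2026.03 | 0.6273 | 0.5554 | 0.6364 |
| 2026.03 | 0.6053 | 0.6402 | 0.8474 |
| 2026.03 | 0.5754 | 0.5 | 0.85 |
| 2026.03 | 0.527 | 0.4951 | 0.8737 |
| 2026.03 | 0.5229 | 0.5552 | 0.8211 |
| 2026.03 | 0.5133 | 0.5468 | 0.8661 |
| 2026.03 | 0.5111 | 0.4951 | 0.8737 |
| 2026.03 | 0.4808 | 0.4835 | 0.8503 |
| 2026.03 | 0.4719 | 0.5318 | 0.8615 |
| 2026.03 | 0.463 | 0.523 | 0.8589 |
| 2026.03 | 0.4583 | 0.4859 | 0.8375 |
| 2026.03 | 0.4458 | 0.4725 | 0.8375 |
| 2026.03 | 0.4417 | 0.5 | 0.8906 |
| 2026.03 | 0.3958 | 0.5279 | 0.8375 |
| 2026.03 | 0.3273 | 0.4622 | 0.8091 |
| 2026.03 | 0.3273 | 0.4622 | 0.8091 |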
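AUROC, the metric this leaderboard is ranked by, can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The short Python sketch below computes it that way (the Mann-Whitney formulation). It assumes, without confirmation from this page, that each method assigns a quality score to every reasoning trace and that traces carry binary labels (1 for a sound trace, 0 otherwise); the function name `auroc` and the toy data are hypothetical and not taken from the leaderboard.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    scores: higher means the scorer judges the trace more likely to be positive.
    labels: 1 for positive, 0 for negative (hypothetical labelling convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only, not from the leaderboard.
    scores = [0.9, 0.3, 0.5, 0.6, 0.2]
    labels = [1, 1, 0, 1, 0]
    print(round(auroc(scores, labels), 4))  # → 0.8333
```

The quadratic pairwise loop is chosen for readability; a rank-based version runs in O(n log n) and gives the same value. Since a scorer that assigns the same value to every trace gets exactly 0.5, entries in the table with AUROC below 0.5 rank traces worse than chance under this reading of the metric.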