
Reasoning Quality Assessment on GPQA

0.83 AUROC

TRACED

[Chart: score by date; y-axis ticks 0.293672, 0.432911, 0.57215, 0.711389; Mar 11, 2026]
Updated 1mo ago

Evaluation Results

Date     Method  AUROC   (unlabeled)  (unlabeled)  Links
2026.03  -       0.83    0.6607       0.64         -
2026.03  -       0.7636  0.7583       0.475        -
2026.03  -       0.7622  0.745        0.4257       -
2026.03  -       0.76    0.7683       0.6          -
2026.03  -       0.7588  0.5451       0.8571       -
2026.03  -       0.7571  0.6552       0.5          -
2026.03  -       0.7344  0.6794       0.525        -
2026.03  -       0.7333  0.6661       0.4286       -
2026.03  -       0.7194  0.668        0.5383       -
2026.03  -       0.718   0.5505       0.7143       -
2026.03  -       0.705   0.7328       0.425        -
2026.03  -       0.6833  0.645        0.6889       -
2026.03  -       0.6806  0.7038       0.7778       -
2026.03  -       0.68    0.6962       0.4402       -
2026.03  -       0.6389  0.6519       0.7778       -
2026.03  -       0.6     0.6787       0.8          -
2026.03  -       0.58    0.6          0.85         -
2026.03  -       0.56    0.545        0.6          -
2026.03  -       0.56    0.6587       0.8          -
2026.03  -       0.549   0.5          0.85         -
2026.03  -       0.54    0.6746       0.8576       -
2026.03  -       0.5238  0.5399       0.8714       -
2026.03  -       0.5     0.6          0.8317       -
2026.03  -       0.4857  0.5159       0.8823       -
2026.03  -       0.4722  0.6          0.7444       -
2026.03  -       0.3917  0.3871       0.8723       -
2026.03  -       0.3857  0.4917       0.8678       -
2026.03  -       0.3837  0.475        0.8887       -
2026.03  -       0.3831  0.4107       0.8767       -
2026.03  -       0.381   0.5403       0.8325       -
2026.03  -       0.3186  0.3959       0.8324       -
2026.03  -       0.3143  0.3709       0.8672       -