
Science Reasoning on GPQA (r* Accuracy and r_self)

30.9 r* Accuracy

Phi

May 6, 2026
Updated 27d ago

Evaluation Results

Date      r* Accuracy   r_self
2026.05   30.9          0.215
2026.05   30.9          0.214
2026.05   29.8          0.262
2026.05   29.2          0.284
2026.05   29.1          0.201
2026.05   28.7          0.118
2026.05   28.5          0.21
2026.05   28.3          0.118
2026.05   28.1          0.113
2026.05   28            0.437
2026.05   27.9          0.087
2026.05   27.8          0.426
2026.05   27.7          0.088
2026.05   27.7          0.149
2026.05   27.7          0.109
2026.05   27.5          0.165
2026.05   26.8          0.321
2026.05   26.7          0.319
2026.05   24.5          1.527
2026.05   24.4          1.531
2026.05   22.3          0.802
2026.05   22.3          0.8
2026.05   10.3          0.675
2026.05   10            0.675