
Scientific Reasoning on GPQA Main (Accuracy, Tokens)

Accuracy: 20.8

Static

[Chart: Accuracy (y-axis ticks 14.04, 15.795, 17.55, 19.305) by date (Apr 6, 2026)]
Updated 10d ago

Evaluation Results

Date      Accuracy   Tokens
2026.04   20.8       10,994
2026.04   19.4        6,415
2026.04   18.1        5,712
2026.04   17.4        5,910
2026.04   17.4        5,940
2026.04   17.2        4,843
2026.04   16.7        3,929
2026.04   16.5        2,971
2026.04   15.6        2,258
2026.04   14.7        2,544
2026.04   14.3        1,822