Scientific Reasoning on GPQA-D (Acc, Time, Token)

Accuracy: 66.2

RecursiveMAS

Apr 28, 2026
Updated 1mo ago

Evaluation Results

Date      Acc    Time    Token
2026.04   66.2   2,638   2,524
2026.04   64.6   2,342   2,521
2026.04   63.1   1,965   2,675
2026.04   61.5   2,190   3,693
2026.04   59.1   4,207   6,128
2026.04   58.6   7,537   8,091
2026.04   32.6     861     786
2026.04   32.3     752     813
2026.04   30.3     586     829
2026.04   28.7   1,825   3,708
2026.04   28.7   3,322   5,820
2026.04   28.1   1,056   2,084