
Scientific Reasoning on GPQA (Acc, Time, Peak, Dep)

Accuracy: 40.91

Vanilla

-1.6364   9.4093   20.4553   1.5007
Apr 4, 2026
Updated 11d ago

Evaluation Results

Date      Acc     Time    Peak    Dep
2026.04   40.91   7.98    6,622   26
2026.04   36.36   6.38    1,796   6.4
2026.04   31.82   6.94    3,150   12
2026.04   31.81   12.55   1,536   7.9
2026.04   31.31   7.19    838     3.7
2026.04   30.81   10.76   8,055   39
2026.04   30.3    8.41    2,385   9.3
2026.04   28.28   8.26    3,940   18
2026.04   26.76   0.69680.5
2026.04   24.75   8.01    6,718   31
2026.04   24.75   15.61   1,200   9.8
2026.04   24.75   0.96    1,231   0.9
2026.04   20.2    5.24    3,972   16
2026.04   19.7    9.14    3,401   11
2026.04   2.53    12.62   1,024   10
2026.04   0       11.65   1,024   10