
Scientific Reasoning on GPQA (Accuracy, Mean, Drop)

73.74 Accuracy

Saw-INT4

May 18, 2026
Updated 14d ago

Evaluation Results

Date      Accuracy   Mean      Drop
2026.05   73.74      77.95     0.06
2026.05   73.57      78.16     0.27
2026.05   73.23      77.89     -
2026.05   68.01      75.14     -2.75
2026.05   67.27      75.64     -
2026.05   66.67      78.15     0.26
2026.05   66.37      73.11     -2.53
2026.05   64.95      71.864    -3.78
2026.05   60.4       74.17     -0.02
2026.05   59.29      74.43     0.24
2026.05   58.69      71.99     -2.2
2026.05   58.49      74.19     -
2026.05   56.67      70.84     -
2026.05   55.05      56.88     -13.96
2026.05   55.05      69.416    -1.42
2026.05   54.85      69.97     -0.87
2026.05   54.55      60.49     -17.4
2026.05   41.41      31.74     -43.9
2026.05   19.7       7.9       -66.29
2026.05   14.98      10.14     -60.7
2026.05   0.34       1.4       -74.24
2026.05   0          0         -75.64
2026.05   0          0         -70.84
2026.05   0          0         -74.19
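
The page does not say how Drop is computed. The listed values are consistent with Drop being a row's Mean minus the Mean of a reference row (one listed with Drop "-"), rounded to two decimals. The sketch below checks that reading against a few rows. The pairing of each row with a reference Mean is an inference from the numbers, not something the page states, and the `drop` helper is hypothetical.

```python
# Assumption (not stated on the page): Drop = Mean - reference Mean,
# where the reference is a row listed with Drop "-".
# Each tuple is (mean, assumed_reference_mean, listed_drop), taken from the results above.
rows = [
    (77.95, 77.89, 0.06),
    (75.14, 77.89, -2.75),
    (74.17, 74.19, -0.02),
    (56.88, 70.84, -13.96),
    (31.74, 75.64, -43.90),
]


def drop(mean: float, reference_mean: float) -> float:
    """Hypothetical helper: difference in Mean versus the reference row."""
    return round(mean - reference_mean, 2)


for mean, reference_mean, listed in rows:
    computed = drop(mean, reference_mean)
    # Allow for rounding in the listed values.
    assert abs(computed - listed) < 0.005, (mean, reference_mean, listed)
```

Under this reading, a Drop near zero means the method keeps the reference Mean, and a Drop equal to the negative of the reference Mean (for example -75.64) corresponds to a Mean of 0.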