Reasoning on GPQA (Acc@1, Avg. Perf.)

[Chart: Accuracy@1 (GPQA) over time; labeled method: DisambiguSLM; date shown: Apr 25, 2026]

Evaluation Results

Date      Acc@1 (GPQA)   Avg. Perf.
2026.04   44             68.6
2026.04   43.6           66.9
2026.04   43.3           66.6
2026.04   42.4           65
2026.04   41.6           63.8
2026.04   41.6           66.3
2026.04   41.3           65
2026.04   41.1           64.8
2026.04   41.1           66
2026.04   40.9           64.5
2026.04   40.5           64.8
2026.04   40.2           61.1
2026.04   40.2           63.9
2026.04   40.1           63.9
2026.04   40             64
2026.04   39.8           62.3
2026.04   39.6           63.1
2026.04   39.1           61.4
2026.04   38.9           61.6
2026.04   38.7           61.2
2026.04   38.6           62.7
2026.04   38.5           62.4
2026.04   38.2           60.4
2026.04   38.2           62
2026.04   38.1           62.2
2026.04   38             61.7
2026.04   37.9           58.8
2026.04   37.6           61.4
2026.04   37.5           61.6
2026.04   37.1           57.93
2026.04   36.9           60.9
2026.04   36.4           59.2
2026.04   35.8           58.2