Scientific Reasoning on GPQA (Acc, Tok, CR)

66.2 Accuracy

DTSR

[Chart: accuracy over time, Mar 13, 2026 to Apr 8, 2026]
Updated 9d ago

Evaluation Results

Date      Acc    Tok      CR
2026.04   66.2   5,916    -
2026.04   65.7   7,513    -
2026.04   65.6   7,138    -
2026.03   62.7   9,539    100
2026.03   62.3   11,765   100
2026.03   61.1   11,312   94.1
2026.03   60.6   7,727    76.5
2026.03   59.9   7,628    100
2026.03   58.5   9,185    95
2026.03   58.2   9,588    77.4
2026.03   57.6   8,594    100
2026.03   54.6   5,757    81.9
2026.03   53.9   6,798    87.1
2026.03   52.6   6,240    78.9
2026.03   52.1   7,543    85.7
2026.03   51.8   9,766    83.7
2026.03   51.3   7,917    85.2
2026.04   51     1,177    -
2026.03   50.5   8,553    99.6
2026.03   49.5   6,028    73.8
2026.03   49     7,451    97.4
2026.03   47.3   7,406    71.8
2026.03   45.6   2,101    29.1
2026.03   43.2   2,455    28.4
2026.03   41     570      6.6
2026.03   39.7   2,106    16
2026.03   32.1   1,265    18.8
2026.03   30.7   1,204    15.8