
Science Reasoning on GPQA Diamond (Accuracy, Avg., Drop)

Top entry: GPT-5-Mini (83 accuracy)

[Chart of results over time, Jan 21, 2026 to Apr 23, 2026]

Evaluation Results

Date      Accuracy (%)   Avg.    Drop
2026.03   83             -       -
-         83             -       -
2026.03   71.7           -       -
2026.03   71.4           -       -
2026.03   69.7           -       -
2026.03   68.6           -       -
2026.03   68.3           -       -
2026.03   66.9           -       -
2026.03   65.8           -       -
2026.03   65             -       -
2026.03   64.7           -       -
2026.03   64.7           -       -
2026.03   64.7           -       -
2026.03   63.9           -       -
2026.03   63.64          -       -
2026.03   63.6           -       -
2026.03   63.4           -       -
2026.03   63.3           -       -
2026.04   60.1           -       -
2026.03   59.7           -       -
2026.03   59.6           -       -
2026.03   58.1           -       -
2026.04   57.5           -       -
2026.04   56.5           -       -
2026.01   56.06          70.63   -
2026.03   56.06          -       -
2026.03   55.56          -       -
2026.04   55.1           -       -
2026.04   53.3           -       -
2026.04   53             -       -
2026.04   52             -       -
2026.04   52             -       -
2026.04   51.5           -       -
2026.04   51             -       -
2026.04   48.5           -       -
2026.01   48.48          60.78   -9.85
2026.04   47.9           -       -
2026.01   47.47          58.28   -12.35
2026.04   46.4           -       -
2026.04   46.4           -       -
2026.04   45.5           -       -
2026.04   44.9           -       -
2026.04   43.4           -       -
2026.04   42.4           -       -
2026.04   41.4           -       -
2026.04   39.9           -       -
2026.01   36.87          48.72   -
2026.04   36.4           -       -
2026.01   32.83          41.31   -7.41
2026.01   31.82          38.39   -10.33
2026.01   29.8           17.34   -23.76
2026.04   29.8           -       -
2026.01   28.45          41.1    -
2026.01   26.94          21.44   -19.66
2026.01   24.24          4.84    -36.26
2026.01   8.59           2.11    -46.61
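The page does not define the Drop column. However, every reported Drop equals the row's Avg. minus the Avg. of one of the three rows that list an Avg. without a Drop (70.63, 48.72 and 41.1). For example, 60.78 - 70.63 = -9.85 and 2.11 - 48.72 = -46.61. The Python sketch below checks this reading against every row that reports a Drop. Treating those three rows as reference points is an assumption inferred from the arithmetic, not a definition given by the source. `matching_reference` is a hypothetical helper name.

```python
import math
from typing import Optional

# Rows from the Evaluation Results table that report both Avg. and Drop:
# (accuracy, avg, drop).
ROWS_WITH_DROP = [
    (48.48, 60.78, -9.85),
    (47.47, 58.28, -12.35),
    (32.83, 41.31, -7.41),
    (31.82, 38.39, -10.33),
    (29.8, 17.34, -23.76),
    (26.94, 21.44, -19.66),
    (24.24, 4.84, -36.26),
    (8.59, 2.11, -46.61),
]

# Avg. of the rows that report an Avg. but no Drop.
# ASSUMPTION: these are the reference points Drop is measured against.
REFERENCE_AVGS = [70.63, 48.72, 41.1]


def matching_reference(avg: float, drop: float, tol: float = 0.01) -> Optional[float]:
    """Return the reference Avg. r with avg - r == drop (within tol), else None.

    tol absorbs the two-decimal rounding of the published values.
    """
    for ref in REFERENCE_AVGS:
        if math.isclose(avg - ref, drop, abs_tol=tol):
            return ref
    return None


if __name__ == "__main__":
    for acc, avg, drop in ROWS_WITH_DROP:
        ref = matching_reference(avg, drop)
        print(f"accuracy={acc:6.2f}  avg={avg:6.2f}  drop={drop:7.2f}  reference={ref}")
```

Each of the eight rows matches exactly one reference value, because the three references are more than 7 points apart. The tolerance only covers rounding.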