
Science Reasoning on GPQA Diamond (Accuracy, Avg., Drop)

Accuracy: 83

GPT-5-Mini

[Chart: accuracy over time, Jan 21, 2026 to Mar 23, 2026]
Updated 25d ago

Evaluation Results

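| Date    | Method | Accuracy | Avg.  | Drop   |
|---------|--------|----------|-------|--------|
| 2026.03 |        | 83       | -     | -      |
|         |        | 83       | -     | -      |
| 2026.03 |        | 71.7     | -     | -      |
| 2026.03 |        | 71.4     | -     | -      |
| 2026.03 |        | 69.7     | -     | -      |
| 2026.03 |        | 68.6     | -     | -      |
| 2026.03 |        | 68.3     | -     | -      |
| 2026.03 |        | 66.9     | -     | -      |
| 2026.03 |        | 65.8     | -     | -      |
| 2026.03 |        | 65       | -     | -      |
| 2026.03 |        | 64.7     | -     | -      |
| 2026.03 |        | 64.7     | -     | -      |
| 2026.03 |        | 64.7     | -     | -      |
| 2026.03 |        | 63.9     | -     | -      |
| 2026.03 |        | 63.64    | -     | -      |
| 2026.03 |        | 63.6     | -     | -      |
| 2026.03 |        | 63.4     | -     | -      |
| 2026.03 |        | 63.3     | -     | -      |
| 2026.03 |        | 59.7     | -     | -      |
| 2026.03 |        | 59.6     | -     | -      |
| 2026.03 |        | 58.1     | -     | -      |
| 2026.01 |        | 56.06    | 70.63 | -      |
| 2026.03 |        | 56.06    | -     | -      |
| 2026.03 |        | 55.56    | -     | -      |
| 2026.01 |        | 48.48    | 60.78 | -9.85  |
| 2026.01 |        | 47.47    | 58.28 | -12.35 |
| 2026.01 |        | 36.87    | 48.72 | -      |
| 2026.01 |        | 32.83    | 41.31 | -7.41  |
| 2026.01 |        | 31.82    | 38.39 | -10.33 |
| 2026.01 |        | 29.8     | 17.34 | -23.76 |
| 2026.01 |        | 28.45    | 41.1  | -      |
| 2026.01 |        | 26.94    | 21.44 | -19.66 |
| 2026.01 |        | 24.24    | 4.84  | -36.26 |
| 2026.01 |        | 8.59     | 2.11  | -46.61 |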