Science Reasoning on GPQA-D (Acc, ∆Tok)

76.3 Accuracy

Qwen3-Next-80B

[Chart: accuracy over time, Oct 1, 2025 to May 25, 2026]
Updated 8d ago

Evaluation Results

Date      Acc     ∆Tok
2025.10   76.3    -
2025.10   72.7    -25.6
2025.10   68.2    -19.6
2025.10   68.2    -1.3
2025.10   67.7    -1.1
2025.10   65.7    -19.9
2025.10   65.2    -
2025.10   64.1    -
2025.10   63.1    -19.1
2025.10   62.6    -22.6
2025.10   62.6    -32.7
2025.10   62.6    -
2025.10   62.1    -20.5
2025.10   61.1    -39.9
2025.10   60.6    -
2025.10   60.1    -12.4
2025.10   59.1    -62.8
2025.10   57.6    -26.6
2025.10   57.6    -66.4
2025.10   55.6    -29.7
2025.10   54.5    -100
2025.10   54      -100
2025.10   53.5    -51
2026.05   52.44   -
2025.10   51      -
2026.05   50.57   -
2026.05   50.25   -
2026.05   50.11   -
2025.10   50      -100
2026.05   49.89   -
2026.05   49.83   -
2025.10   49.5    -100
2026.05   49.13   -
2026.05   48.81   -
2026.05   48.44   -
2025.10   48      -
2025.10   47.5    -54.2
2025.10   47      -67.5
2026.05   46.37   -
2025.10   46      -100
2025.10   45.5    -51.5
2026.05   45.4    -
2026.05   44.06   -
2026.05   43.95   -
2025.10   43.9    -16.7
2025.10   43.4    -35.5
2026.05   42.68   -
2026.05   42.07   -
2025.10   36.4    -76.8
2026.05   35.45   -
2025.10   32.3    -71.2
2025.10   30.3    -100