
Math Reasoning on GSM8K (Accuracy and Token Delta)

Top accuracy: 96.7 (Qwen3-Next-80B)

[Figure: GSM8K accuracy over time, Oct 1, 2025 to Apr 14, 2026]

Evaluation Results

Date     Accuracy  Token Delta
2025.10  96.7      -
2025.10  96.5      -9.1
2025.10  96.5      -24.3
2025.10  96        -
2025.10  96        -
2025.10  95.8      -29.4
2025.10  95.3      -23.6
2025.10  95.2      -34
2025.10  95.1      -
2025.10  95.1      -34.1
2025.10  95.1      -31.6
2025.10  95        -15.4
2025.10  94.8      -18.9
2025.10  94.8      -23
2025.10  94.5      -40.4
2025.10  94.5      -30.1
2025.10  94.4      -
2025.10  94.2      -25.1
2025.10  93.9      -37.3
2025.10  93.4      -100
2025.10  93.3      -100
2025.10  92.8      -40.2
2025.10  92.7      -
2025.10  92.6      -100
2025.10  92.6      -100
2025.10  92.4      -25
2025.10  92.1      -2.4
2025.10  91.7      -
2025.10  91.7      -25.6
2025.10  91.3      -42.5
2025.10  91        -100
2025.10  90.6      -30.8
2026.04  90.6      -
2026.04  89.7      -
2026.04  89.1      -
2025.10  89        -60.5
2026.04  88.5      -
2025.10  87.8      -54.9
2026.04  87        -
2026.04  87        -
2026.04  86.7      -
2026.04  86.4      -
2025.10  85.9      -100
2026.04  85.8      -
2026.04  85.7      -
2026.04  84.1      -
2026.04  80.1      -
2026.04  77.3      -
2025.10  61.3      -87.8
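The page does not define Token Delta. Assuming it is the change in generated reasoning tokens relative to a baseline (so a more negative value means fewer tokens) and that "-" means the value was not reported, the accuracy/token trade-off in the table can be read off as a Pareto frontier. This is a minimal Python sketch over a hand-copied subset of rows; `ROWS`, `dominates`, and `pareto_frontier` are hypothetical names for illustration, not part of any leaderboard tooling.

```python
from typing import Optional

# Hand-copied subset of (accuracy, token_delta) rows from the table above.
# None stands for a "-" (not reported) token delta.
ROWS: list[tuple[float, Optional[float]]] = [
    (96.7, None),
    (96.5, -9.1),
    (96.5, -24.3),
    (95.8, -29.4),
    (95.2, -34.0),
    (95.1, -34.1),
    (94.5, -40.4),
    (92.1, -2.4),
    (61.3, -87.8),
]


def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Return True if row a is no worse than row b on both axes and strictly better on one.

    ASSUMPTION: higher accuracy is better, and a more negative token delta
    (fewer generated tokens than the baseline) is better.
    """
    acc_a, delta_a = a
    acc_b, delta_b = b
    no_worse = acc_a >= acc_b and delta_a <= delta_b
    strictly_better = acc_a > acc_b or delta_a < delta_b
    return no_worse and strictly_better


def pareto_frontier(rows: list[tuple[float, Optional[float]]]) -> list[tuple[float, float]]:
    """Return the rows that no other row dominates, highest accuracy first."""
    # Rows without a reported token delta cannot be placed on the token axis.
    scored = [(acc, delta) for acc, delta in rows if delta is not None]
    frontier = [r for r in scored if not any(dominates(o, r) for o in scored)]
    # Sort by accuracy, descending, to match the table's ordering.
    return sorted(frontier, key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    for acc, delta in pareto_frontier(ROWS):
        print(f"accuracy {acc:5.1f}  token delta {delta:+6.1f}")
```

Rows with a "-" token delta, such as the 96.7 entry, are skipped rather than guessed, and entries like (96.5, -9.1) drop out because (96.5, -24.3) matches their accuracy with a larger token reduction.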