Mathematical Reasoning on GSM8k (Accuracy, Loss)

Top result: 95.22 accuracy (QwQ-32B-Preview)

[Chart: results over time, Oct 14, 2025 to May 20, 2026]
Updated 13d ago

Evaluation Results

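| Date | Accuracy | Loss |
|---|---|---|
| - | 95.22 | - |
| 2026.05 | 91.28 | - |
| - | 91.28 | - |
| 2026.05 | 90.15 | - |
| 2026.05 | 90.15 | - |
| 2026.05 | 88.2 | - |
| 2026.05 | 88.2 | - |
| 2026.05 | 87.6 | - |
| - | 86.2 | - |
| - | 85.8 | - |
| 2026.05 | 84.5 | - |
| 2026.05 | 84.15 | - |
| 2026.05 | 79.38 | - |
| 2025.10 | 75.7 | - |
| 2025.10 | 75.6 | - |
| 2025.10 | 74.9 | - |
| 2025.10 | 74.9 | - |
| 2025.10 | 73.2 | - |
| 2026.05 | 72.02 | - |
| 2025.10 | 65.7 | - |
| 2025.10 | 64.9 | - |
| 2025.10 | 64.3 | - |
| 2026.04 | 59 | 1.227 |
| 2026.04 | 59 | 1.227 |
| 2025.10 | 53.6 | - |
| 2026.04 | 47 | 1.533 |
| 2026.04 | 46 | 1.236 |
| 2026.04 | 44 | 1.259 |
| 2026.04 | 44 | 1.519 |
| 2026.04 | 44 | 1.259 |
| 2025.10 | 37.8 | - |
| 2026.04 | 22 | 1.749 |
| 2026.04 | 20 | 1.396 |
| 2026.04 | 18 | 1.708 |
| 2026.04 | 18 | 1.429 |
| 2025.10 | 11.5 | - |
| 2025.10 | 11.1 | - |
| 2025.10 | 0.4 | - |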