
Mathematical Reasoning on GSM8K (test) (Accuracy, Reward)

Top result: SIGMA, 96.23 accuracy

[Chart: accuracy of leaderboard entries over time, Aug 2023 to May 2026]
Updated 14d ago

Evaluation Results

Date     Accuracy  Reward
2026.05  96.23     -
2026.05  96.1      -
2026.05  95.87     -
2026.05  95.31     -
2026.05  95.25     -
2026.05  95.13     -
2026.05  94.76     -
2026.05  94.74     -
2026.05  94.47     -
2026.05  94.34     -
2026.05  94.21     -
2026.05  94.15     -
2026.05  94.12     -
2026.05  94.1      -
2026.05  94.04     -
2026.05  92.8      -
2026.05  92.7      -
2026.05  92.3      -
2026.05  91.16     -
2026.05  89.98     -
2026.05  89.63     -
2026.05  89.52     -
2026.05  89.47     -
2026.05  89.14     -
2026.05  87.62     -
2026.05  87.57     -
2026.05  87.5      -
2026.05  87.45     -
2026.05  87.1      -
2026.05  86.12     -
2023.08  84.5      -
2023.08  84.5      -
2023.08  82.4      -
2026.05  81.65     -
2026.05  81.43     -
2026.05  79.3      -
2026.05  78.24     -
2023.08  76.9      -
2026.05  72.25     -
2026.05  71.42     -
2023.08  71.2      -
2023.08  71.1      -
2026.05  69.66     -
2026.05  69.45     -
2026.05  69.26     -
2026.05  68.16     -
2023.08  66.5      -
2023.08  58.1      -
2025.05  58        7.28
2025.05  57        1.78
2026.05  46.7      -
2026.05  46.02     -
2025.05  46        0.47
2025.05  42.5      -1.22
2025.05  37        -0.53
2025.05  32        -1.44
2025.05  29.5      -4.75
2025.05  20        -4.28