
Mathematical Reasoning on GSM8K (test)

97.72 Accuracy

Claude 3.5 Sonnet

[Chart: accuracy over time, Mar 15, 2023 to Jan 14, 2026]
Updated 3d ago

Evaluation Results

Method | Links
97.72 | - | 1.35
2023.10
97
2023.12
97
2023.11
97
2024.02
97
2024.06
96.43 | - | 1.71
2026.01
96.09
2026.01
95.45
2024.01
95.4
2024.05
95.38
2025.03
95.2
2024.01
95.1
2025.02
95
2024.06
95
2026.01
94.91
2026.01
94.89
2024.06
94.7
2026.01
94.66
2024.06
94.5
2023.12
94.4
2024.02
94.4
2024.05
94.31
2024.01
94.2
2026.01
93.98
2026.01
93.92
2024.01
93.9
2024.01
93.9
2026.01
93.88
2024.01
93.8
2024.01
93.7
2026.01
93.68
2024.01
93.5
2026.01
93.42
2024.01
93.3
2025.03
93.3
2026.01
93.17
2024.01
93.1
2024.05
93
2025.02
92.9
2025.03
92.9
2024.05
92.87
2026.01
92.68
2026.01
92.67
2025.03
92.6
2026.01
92.3
2026.01
92.16
2023.08
92
2023.10
92
2023.03
92
2024.02
92
2023.09
92
2025.03
92
2025.03
91.7
2025.03
91.7
2024.09
91.6
2025.02
91.4
2024.06
91.28 | - | 2.17
2025.03
91.2
2024.09
90.8
2026.01
90.76
2025.03
90.6
2024.06
90.4
2025.03
90.4
2024.06
90.3
2024.06
90.2
2023.05
90
2024.07
90
2025.03
89.9
2024.06
89.6
2024.09
89.5
2025.03
89.4
2024.06
89.3
2025.02
89.26
2025.03
89.2
2025.02
89.12
2024.09
89.1
2024.09
89
2025.03
88.6
2025.03
88.5
2025.03
88.4
2023.11
88.3
2024.02
88.2
2024.02
88.2
2024.06
88.2
2024.06
88.2
2024.06
88.2
2025.02
88.1
2025.03
88.1
2024.06
88
2024.06
87.8
2025.02
87.64
2024.02
87.6
2024.06
87.6
2025.03
87.6
2025.03
87.5
2023.03
87.3
2023.11
87.3
2024.06
87.3
2023.11
87.1
2024.06
87.1
Showing 100 of 777 rows
...