
Mathematical Reasoning on GSM-Hard

Top Solve Rate: 78

GPT-4o

[Chart: Solve Rate by publication date, Nov 18, 2022 to Feb 11, 2026]
Updated 3d ago
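The page does not define "Solve Rate". On GSM-Hard it is commonly read as accuracy: the percentage of problems whose final numeric answer matches the reference answer. The sketch below assumes that reading. It is not the leaderboard's scoring code. The helper names (`parse_answer`, `solve_rate`), the "last number in the output" extraction rule, and the absolute tolerance `abs_tol=1e-3` are all illustrative assumptions.

```python
import re
from typing import Optional, Sequence

# Hypothetical extraction rule: the last number in the output is the answer.
# Thousands separators (commas) and decimals are allowed.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_answer(text: str) -> Optional[float]:
    """Return the last number found in a model's output, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def solve_rate(
    predictions: Sequence[str],
    references: Sequence[float],
    abs_tol: float = 1e-3,  # assumed tolerance for floating-point answers
) -> float:
    """Percentage (0-100) of problems whose extracted answer is within abs_tol of the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    solved = 0
    for text, gold in zip(predictions, references):
        pred = parse_answer(text)
        # Unparseable outputs count as unsolved.
        if pred is not None and abs(pred - gold) <= abs_tol:
            solved += 1
    return 100.0 * solved / len(references)


if __name__ == "__main__":
    preds = ["So the answer is 1,838.", "The total is 42", "I am not sure"]
    golds = [1838.0, 41.0, 7.0]
    print(solve_rate(preds, golds))  # one of three solved, about 33.3
```

A purely absolute tolerance is used here because GSM-Hard answers can be very large. A relative tolerance would then accept answers that are off by whole units.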

Evaluation Results

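| Date (YYYY.MM) | Solve Rate | Unlabeled 1 | Unlabeled 2 |
|---|---|---|---|
| 2025.12 | 78 | - | - |
| 2024.02 | 77.6 | - | - |
| 2025.11 | 77.03 | - | 1,838 |
| 2025.11 | 77.03 | - | 1,785 |
| 2025.11 | 76.88 | - | 2,308 |
| 2025.12 | 75.7 | - | - |
| 2025.11 | 73.31 | - | 4,253 |
| 2025.11 | 72.86 | - | 4,285 |
| 2025.12 | 72.6 | - | - |
| 2025.11 | 71.95 | - | 5,509 |
| 2025.11 | 71.87 | - | 4,429 |
| 2025.11 | 71.49 | - | 5,318 |
| 2025.12 | 71.3 | - | - |
| 2025.11 | 71.27 | - | 4,387 |
| 2025.11 | 70.96 | - | 4,238 |
| 2025.11 | 70.36 | - | 5,674 |
| 2025.11 | 70.13 | - | 4,516 |
| 2026.01 | 69.3 | - | - |
| 2026.01 | 69.3 | - | - |
| 2025.12 | 69.1 | - | - |
| 2024.10 | 68.3 | 72.7 | - |
| 2026.01 | 67.9 | - | - |
| 2026.01 | 67.9 | - | - |
| 2024.02 | 67.2 | - | - |
| 2026.01 | 67 | - | - |
| 2024.02 | 66.6 | - | - |
| 2026.01 | 66.6 | - | - |
| 2025.11 | 66.34 | - | 4,854 |
| 2024.10 | 66.3 | 72.5 | - |
| 2026.01 | 66.2 | - | - |
| 2026.01 | 66 | - | - |
| 2025.11 | 65.88 | - | 5,235 |
| 2024.02 | 65.7 | - | - |
| 2025.11 | 65.43 | - | 4,989 |
| 2026.01 | 65.3 | - | - |
| 2025.11 | 64.9 | - | 2,122 |
| 2026.01 | 64.5 | - | - |
| 2026.01 | 64.4 | - | - |
| 2026.01 | 64.2 | - | - |
| 2024.02 | 64 | - | - |
| 2026.01 | 63.9 | - | - |
| 2024.02 | 63.7 | - | - |
| 2024.02 | 63.7 | - | - |
| 2026.01 | 63.7 | - | - |
| 2026.01 | 63.4 | - | - |
| 2026.01 | 62.9 | - | - |
| 2026.01 | 62.9 | - | - |
| 2026.01 | 62.9 | - | - |
| 2026.01 | 62.5 | - | - |
| 2026.01 | 62.4 | - | - |
| 2026.01 | 62.4 | - | - |
| 2026.01 | 62.4 | - | - |
| 2026.01 | 62.2 | - | - |
| 2024.02 | 61.9 | - | - |
| 2025.12 | 61.8 | - | - |
| 2026.01 | 61.7 | - | - |
| 2022.11 | 61.2 | - | - |
| 2025.12 | 61.2 | - | - |
| 2026.01 | 60.7 | - | - |
| 2024.02 | 60.5 | - | - |
| 2026.01 | 60.5 | - | - |
| 2024.02 | 60.1 | - | - |
| 2026.01 | 59.8 | - | - |
| 2026.01 | 59.3 | - | - |
| 2026.01 | 59.2 | - | - |
| 2026.01 | 58.8 | - | - |
| 2026.01 | 58.5 | - | - |
| 2026.01 | 58.3 | - | - |
| 2026.01 | 58.1 | - | - |
| 2026.01 | 57.5 | - | - |
| 2026.01 | 57.2 | - | - |
| 2026.01 | 57 | - | - |
| 2026.01 | 56.2 | - | - |
| 2026.01 | 56.1 | - | - |
| 2026.01 | 56.1 | - | - |
| 2026.01 | 56.1 | - | - |
| 2024.02 | 56 | - | - |
| 2026.01 | 56 | - | - |
| 2026.01 | 56 | - | - |
| 2026.01 | 56 | - | - |
| 2026.01 | 55.4 | - | - |
| 2026.01 | 55.4 | - | - |
| 2026.01 | 55.4 | - | - |
| 2026.01 | 55 | - | - |
| 2026.01 | 54.8 | - | - |
| 2024.01 | 54.1 | - | - |
| 2026.01 | 53.9 | - | - |
| 2026.01 | 53.8 | - | - |
| 2026.01 | 53.4 | - | - |
| 2026.01 | 53.4 | - | - |
| 2026.01 | 53.4 | - | - |
| 2026.01 | 52.8 | - | - |
| 2024.01 | 51.8 | - | - |
| 2026.01 | 50.5 | - | - |
| 2026.02 | 49.73 | - | - |
| 2026.01 | 49.5 | - | - |
| 2026.01 | 49.5 | - | - |
| 2026.01 | 46.5 | - | - |
| 2026.01 | 42.2 | - | - |
| 2026.01 | 41.8 | - | - |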
Showing 100 of 164 rows