
Arithmetic Reasoning on Combined Math Datasets (SVAMP, GSM8K, AddSub, MultiArith, AQUA, SingleEq)

92.9 Average Score

DUP

[Chart: score over time, Apr 23, 2024 to Apr 9, 2026]
Updated 8d ago

Evaluation Results

Method  Links
2024.04  92.9   2.3
2024.04  91.4   0.8
2024.04  90.6   -
2024.04  89.7   -0.9
2024.04  84.9   4
2024.04  83.4   2.5
2024.04  82.6   1.7
2024.04  82.6   1.7
2024.04  81.2   0.3
2024.04  80.9   -
2026.04  60.65  -
2026.04  60.46  -
2026.04  60.16  -
2026.04  59.59  -
2026.04  59.23  -
2026.04  59.23  -
2026.04  59.16  -
2026.04  59.11  -
2026.04  58.83  -
2026.04  58.75  -
2026.04  56.57  -
2026.04  56.46  -
2026.04  34.22  -
2026.04  33.63  -
2026.04  33.48  -
2026.04  33.36  -
2026.04  32.04  -
2026.04  29.84  -