Math Reasoning on SVAMP (Accuracy)

94.2 Accuracy

GPT-4o

[Chart: accuracy by date, Mar 10, 2026 to Apr 9, 2026]
Updated 8d ago

Evaluation Results

Date      Accuracy
2026.03   94.2
2026.03   94
2026.03   93.8
2026.03   92.8
2026.04   91.8
2026.04   87.43
2026.03   87.2
2026.04   86.97
2026.04   86.8
2026.04   86.4
2026.03   86.1
2026.04   85.6
2026.03   82.5
2026.03   79.4
2026.03   76.3
2026.03   74.4
2026.04   73
2026.04   71.2
2026.04   70.4
2026.04   70.17
2026.03   69.9
2026.04   68.1
2026.04   56
2026.04   55.5
2026.04   55.1
2026.04   54.9
2026.04   53.5
2026.04   53.4
2026.04   52.3
2026.04   52.1
2026.04   51.1
2026.04   50.9
2026.04   49.6
2026.04   49
2026.04   29.5
2026.04   28.2
2026.04   27.3
2026.04   27
2026.04   24.9
2026.04   24.1