Arithmetic Reasoning on MultiArith (test)

99.3 Accuracy

PaLM

[Chart: accuracy over time, Oct 28, 2022 to Feb 20, 2026]
Updated 4d ago

Evaluation Results

Date      Accuracy
2023.11   99.3
2023.11   99.2
2024.05   99
2023.11   99
2024.05   98.5
2023.11   98.2
2023.11   98.2
2024.05   98.17
2024.05   98
2022.10   97.5
2026.02   96.1
2023.11   94.7
2026.02   92.8
2022.10   92.3
2023.11   92.3
2023.11   88.5
2023.11   88.5
2023.11   87
2023.11   86
2026.02   85
2023.11   82
2023.11   78.7
2026.02   78.3
2022.10   77.3
2023.11   75.7
2026.02   74.4
2023.11   74.3
2023.11   73.5
2023.11   73.5
2023.11   71.5
2026.02   67.8
2026.02   64.4
2026.02   63.3
2022.10   60.5
2023.11   60.5
2023.12   59
2026.02   57.8
2026.02   57.2
2023.12   53.16
2026.02   51.86
2023.11   51.8
2026.02   51.7
2026.02   51.7
2023.12   48.4
2026.02   48.32
2023.12   48.3
2026.02   46.1
2026.02   45.81
2026.02   42.66
2023.12   42.16
2026.02   38.9
2026.02   38.16
2026.02   37.8
2026.02   36.1
2023.12   34
2026.02   33.3
2026.02   32.7
2026.02   32.2
2023.12   29
2026.02   22.8
2023.12   18.83
2023.12   15.51
2023.12   9.55
2023.12   6.79
2023.12   2
2023.12   1.428
2023.12   0.12