Mathematical Reasoning on TabMWP (accuracy)

97.61 Accuracy

Zero-shot-EI

[Chart: accuracy over time, Feb 15, 2024 to May 31, 2026]
Updated 1d ago

Evaluation Results

Date       Accuracy
2026.02    97.61
2026.02    97.43
2025.08    97.2
2026.02    97.1
2025.08    96.9
2026.02    96.73
2026.02    96.15
2026.02    96.12
2024.05    95.9
2024.02    95.9
2026.02    95.67
2026.02    95.62
2026.05    95.2
2026.02    95.12
2026.02    95.1
2026.02    94.76
2026.02    94.17
2025.06    93.9
2025.06    93.9
2026.02    93.88
2026.05    93.7
2025.08    93.2
2025.06    92.9
2026.05    92.3
2026.05    91.9
2026.02    91.23
2024.05    90.8
2026.02    90.55
2025.08    89.9
2025.06    89.2
2026.04    88.4
2026.04    88.4
2026.04    88.4
2026.04    88.2
2026.05    87.4
2026.04    87.1
2026.04    86.3
2026.02    86.21
2026.02    84.71
2024.05    84.7
2026.05    84.5
2025.08    84
2024.05    82
2025.09    81.8
2026.04    81.6
2024.05    80.5
2024.05    79.9
2024.05    79.9
2024.05    79.2
2026.05    78.5
2025.08    78.4
2025.09    77
2024.05    75.6
2026.02    75.18
2024.05    75.1
2024.05    74.8
2026.01    74.3
2024.02    74.2
2026.01    74.2
2024.02    74
2026.01    73.9
2026.01    73.9
2026.01    73.6
2026.01    72.8
2026.01    71.5
2024.02    70.8
2024.05    70.5
2024.02    70.5
2026.01    70.4
2026.01    70.4
2026.01    70.4
2026.01    70.4
2026.01    70.3
2026.01    70.3
2026.01    70.3
2026.01    70.3
2026.01    70.2
2024.05    70.1
2024.05    70
2024.02    70
2026.01    70
2026.01    69.9
2026.01    69.9
2026.01    69.9
2026.01    69.9
2024.05    69.8
2026.01    69.7
2026.01    69.6
2026.01    69.6
2026.01    69.5
2025.09    69.5
2026.01    69.1
2026.05    68.8
2026.01    68.6
2026.01    68.6
2026.01    68.5
2026.01    68.3
2024.05    67.5
2024.05    67.3
2026.01    67.2
Showing 100 of 203 rows