Mathematical Reasoning on AGIEval-MATH (test)

93.3 Accuracy

Agent 1

[Chart: accuracy over time, Jan 30, 2026 to May 9, 2026]
Updated 22d ago

Evaluation Results

Method  Links
2026.05  93.3
2026.05  93.3
2026.05  89.3
2026.05  87.8
2026.05  80.9
2026.05  79.8
2026.05  77.6
2026.05  73.7
2026.05  73.6
2026.05  73.6
2026.05  73.5
2026.05  73.4
2026.05  72.9
2026.05  68.1
2026.05  65.4
2026.05  64
2026.05  63.8
2026.05  62.8
2026.05  54.7
2026.01  52.1
2026.01  47.5
2026.01  46.1
2026.01  45.9
2026.01  45.3
2026.01  44.5
2026.01  44.4
2026.01  44.4
2026.01  42.1
2026.01  42.1
2026.01  41.4
2026.05  26.5