Mathematical Reasoning on AGIEval MATH (Accuracy)
[Chart: Accuracy by date. Highlighted point: UPA, 95.7 Accuracy, Jan 30, 2026]
Updated 3d ago
Evaluation Results
| Method | Executor | Date | Accuracy |
| --- | --- | --- | --- |
| UPA | GPT-5 | 2026.01 | 95.7 |
| IO | GPT-5 | 2026.01 | 95.3 |
| SPO | GPT-5 | 2026.01 | 94.9 |
| CoT | GPT-5 | 2026.01 | 94.8 |
| UPA | DeepSeek-V3.2 | 2026.01 | 93.1 |
| CoT | DeepSeek-V3.2 | 2026.01 | 91.7 |
| IO | DeepSeek-V3.2 | 2026.01 | 89.5 |
| UPA | Claude-4.5-So... | 2026.01 | 86.6 |
| SPO | DeepSeek-V3.2 | 2026.01 | 86.3 |
| CoT | Claude-4.5-So... | 2026.01 | 86.2 |
| IO | Claude-4.5-So... | 2026.01 | 85.9 |
| SPO | Claude-4.5-So... | 2026.01 | 84.7 |