Mathematical Reasoning on GSM8K (success rate)
[Chart: Success Rate over time, Jun 23, 2024 to Feb 17, 2026. Current best: AFlow, 93.6]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Success Rate (%) |
| --- | --- | --- | --- |
| AFlow | Method category=Automa... | 2026.01 | 93.6 |
| OneFlow | Method category=Single... | 2026.01 | 93.3 |
| OneFlow | Method category=Automa... | 2026.01 | 93 |
| AFlow | Method category=Single... | 2026.01 | 92.9 |
| CoT SC | Method category=Manual... | 2026.01 | 92.6 |
| Kimi-K2 Base | Architecture=MoE, Acti... | 2026.02 | 92.1 |
| DeepSeek-V3 Base | Architecture=MoE, Acti... | 2026.02 | 87.6 |
| CoT | Method category=Manual... | 2026.01 | 87.1 |
| MultiPersona | Method category=Manual... | 2026.01 | 87.1 |
| IO | Method category=Manual... | 2026.01 | 87 |
| OptoPrime | implementation=Trace,... | 2024.06 | 82.5 |
| TextGrad | version=24-10-30, opti... | 2024.06 | 82.4 |
| TextGrad | implementation=Trace,... | 2024.06 | 82 |
| TextGrad | source=Reported, optim... | 2024.06 | 81.1 |
| GLM-4.5 Base | Architecture=MoE, Acti... | 2026.02 | 79.4 |
| GLM-5 Base | Architecture=MoE, Acti... | 2026.02 | 68.8 |