Multi-step Reasoning on SVAMP (Accuracy)
[Chart: Accuracy over time. Top result: eMoT, 94 (Jun 1, 2026). Updated 1d ago.]
Evaluation Results
Method             Models        Date     Accuracy
eMoT               Qwen-32B      2026.06  94
BoT                GPT-4         2026.06  91.3
Qwen-32B (Direct)  Qwen-32B      2026.06  83
PaL                Codex (175B)  2026.06  79.4
ToT                GPT-4         2026.06  60