Mathematical Reasoning on SVAMP (Reusability Score)
[Chart: Reusability Score by date. Top entry: Llama, 71.11 (Feb 19, 2026).]
Evaluation Results
Method   Configuration            Date      Reusability Score
Llama    Executor=Strong Comm.    2026.02   71.11
Llama    Executor=Full Comm.      2026.02   65.43
Llama    Executor=Weak Comm.      2026.02   59.75
Phi      Executor=Strong Comm.    2026.02   59.56
Gemma    Executor=Strong Comm.    2026.02   56.9
R1       Executor=Strong Comm.    2026.02   45.76
Phi      Executor=Full Comm.      2026.02   43.69
Gemma    Executor=Full Comm.      2026.02   39.14
R1       Executor=Full Comm.      2026.02   33.79
Phi      Executor=Weak Comm.      2026.02   27.82
R1       Executor=Weak Comm.      2026.02   21.83
Gemma    Executor=Weak Comm.      2026.02   21.37