Mathematical Reasoning on GSM8K (Reusability Score)
[Chart: Reusability Score. Top entry: Phi, 94.68, Feb 19, 2026.]
Evaluation Results
Method  Setting                Date     Reusability Score
Phi     Executor=Strong Comm.  2026.02  94.68
Llama   Executor=Strong Comm.  2026.02  87.82
R1      Executor=Strong Comm.  2026.02  84.54
Phi     Executor=Full Comm.    2026.02  83.53
Gemma   Executor=Strong Comm.  2026.02  80.75
Llama   Executor=Full Comm.    2026.02  73.46
Phi     Executor=Weak Comm.    2026.02  72.37
R1      Executor=Full Comm.    2026.02  66.56
Gemma   Executor=Full Comm.    2026.02  66.04
Llama   Executor=Weak Comm.    2026.02  59.1
Gemma   Executor=Weak Comm.    2026.02  51.33
R1      Executor=Weak Comm.    2026.02  48.59