Interactive environment task execution on AppWorld normal (test)
[Chart: Avg@8 Success and Greedy Success. Highlighted point: CuES, 4,554 Avg@8 Success, Dec 1, 2025]
Updated 4d ago
Evaluation Results
Method    Params   Date      Avg@8 Success   Greedy Success
CuES      14B      2025.12   4,554           4,524
Qwen2.5   32B      2025.12   3,316           3,473
Qwen3     14B      2025.12   3,098           2,848
Qwen3     32B      2025.12   2,879           3,212
Qwen3     8B       2025.12   1,776           2,143
Qwen2.5   14B      2025.12   1,176           1,429
Qwen3     4B       2025.12   300             325
Qwen2.5   7B       2025.12   125             187
Qwen2.5   3B       2025.12   0               23