Tool-calling on Retail-3I 1.0 (Infeasible)
[Chart: Pass@1 by date. Plotted point: Qwen3-4B, 0.578, Jan 28, 2026. Metric options: Pass@1, Pass@2, Pass@3.]
Updated 4d ago
Evaluation Results
Method    Training    Date     Pass@1  Pass@2  Pass@3
Qwen3-4B  Fine-tuned  2026.01  0.578   0.499   0.463
Qwen3-8B  Fine-tuned  2026.01  0.576   0.504   0.463