Multi-Armed Bandit on Bandit
[Chart: Success Rate (pass@1) over time. Top result: 95, Qwen2.5-1.5B-It + Evolving Stage; date shown: Jan 29, 2026]
Evaluation Results
| Method | Setting | Date | Success Rate (pass@1) |
| --- | --- | --- | --- |
| Qwen2.5-1.5B-It + Evolving Stage | Backbone=Qwen2.5-1.5B-... | 2026.01 | 95 |
| Scout-DQN | Architecture=Small Neu... | 2026.01 | 93 |
| Qwen2.5-3B-It + Evolving Stage | Backbone=Qwen2.5-3B-It... | 2026.01 | 93 |
| LLaMA3.1-1B-It + Evolving Stage | Backbone=LLaMA3.1-1B-I... | 2026.01 | 81 |
| DeepSeek-V3 | Model Type=Proprietary | 2026.01 | 81 |
| Scout-PPO | Architecture=Small Neu... | 2026.01 | 79 |
| Qwen2.5-3B-It | Backbone=Qwen2.5-3B-It | 2026.01 | 77 |
| Qwen2.5-0.5B-It + Evolving Stage | Backbone=Qwen2.5-0.5B-... | 2026.01 | 74 |
| GPT-4o-mini | Model Type=Proprietary | 2026.01 | 73 |
| GPT-5-nano | Model Type=Proprietary | 2026.01 | 71 |
| Gemini-2.5-Pro | Model Type=Proprietary | 2026.01 | 69 |
| GPT-OSS-120B | Model Type=Proprietary | 2026.01 | 66 |
| Qwen2.5-1.5B-It | Backbone=Qwen2.5-1.5B-It | 2026.01 | 63 |
| Qwen2.5-0.5B-It - Multi-turn PPO | Backbone=Qwen2.5-0.5B-... | 2026.01 | 62 |
| Qwen2.5-0.5B-It - Exploration & Distillation Stage | Backbone=Qwen2.5-0.5B-... | 2026.01 | 60 |
| Qwen2.5-0.5B-It - State Estimation RL | Backbone=Qwen2.5-0.5B-... | 2026.01 | 54 |
| LLaMA3.1-1B-It | Backbone=LLaMA3.1-1B-It | 2026.01 | 43 |
| Qwen2.5-0.5B-It | Backbone=Qwen2.5-0.5B-... | 2026.01 | 39 |
| Qwen2.5-0.5B-It - SPA | Backbone=Qwen2.5-0.5B-... | 2026.01 | 30 |