Multi-turn RL Task Completion on FrozenLake
[Chart: Success Rate over time. Top entry: Qwen2.5-0.5B (TSR Beam Search), Success Rate 30. Date shown: Feb 12, 2026.]
Evaluation Results
| Method | Details | Date | Success Rate |
| Qwen2.5-0.5B (TSR Beam Search) | Model=Qwen2.5-0.5B, Tr... | 2026.02 | 30 |
| Qwen2.5-0.5B (TSR Lookahead) | Model=Qwen2.5-0.5B, Tr... | 2026.02 | 27.8 |
| GPT-4o | Protocol=zero-shot | 2026.02 | 26.56 |
| Qwen2.5-0.5B (TSR Best-of-N) | Model=Qwen2.5-0.5B, Tr... | 2026.02 | 25 |
| Qwen2.5-72B | Protocol=zero-shot | 2026.02 | 23.83 |
| Qwen2.5-0.5B (Instance Filtering) | Model=Qwen2.5-0.5B, Tr... | 2026.02 | 19.7 |