
Multi-turn RL Task Completion on Sokoban

38.3 Success Rate

Qwen2.5-0.5B (TSR Beam Search)

[Chart: Success Rate by date (Feb 12, 2026)]

Evaluation Results

Date       Success Rate
2026.02    38.3
2026.02    36.1
2026.02    33.3
2026.02    29
2026.02    27.73
2026.02    19.53