Puzzle Solving on Sokoban (Out Of Distribution)
[Chart: best Avg@128 is 66.9 (Markov), Mar 20, 2026. Available metrics: Avg@128, Pass@128.]
Evaluation Results
Method                 Details                    Date     Avg@128  Pass@128
---------------------  -------------------------  -------  -------  --------
Markov                 Model=Qwen2.5-3B-It, T...  2026.03  66.9     72
Markov                 Model=Qwen3-4B, Traini...  2026.03  31.6     37
State-action-sequence  Model=Qwen3-4B, Traini...  2026.03  30.2     34
State-action-sequence  Model=Qwen2.5-3B-It, T...  2026.03  20.4     23
State-action-sequence  Model=Qwen3-4B, Traini...  2026.03  0.2      15
Action-sequence        Model=Qwen3-4B, Traini...  2026.03  0        1
Action-sequence        Model=Qwen3-4B, Traini...  2026.03  0        0
Markov                 Model=Qwen3-4B, Traini...  2026.03  0        3
Action-sequence        Model=Qwen2.5-3B-It, T...  2026.03  0        3
Action-sequence        Model=Qwen2.5-3B-It, T...  2026.03  0        0
Markov                 Model=Qwen2.5-3B-It, T...  2026.03  0        1
State-action-sequence  Model=Qwen2.5-3B-It, T...  2026.03  0        1