Puzzle Solving on Sokoban (In Distribution)
[Chart: Average Score @128 and Pass@128 over time. Best Average Score @128: 89.7 (Markov), Mar 20, 2026.]
Updated 26d ago
Evaluation Results
| Method | Configuration | Date | Average Score @128 | Pass@128 |
|---|---|---|---|---|
| Markov | Model=Qwen2.5-3B-It, T... | 2026.03 | 89.7 | 93 |
| Markov | Model=Qwen3-4B, Traini... | 2026.03 | 76.1 | 81 |
| State-action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 57.4 | 67 |
| State-action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 43.6 | 50 |
| Action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 2.3 | 4 |
| State-action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 1.4 | 61 |
| Action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 1 | 1 |
| State-action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 0.6 | 41 |
| Action-sequence | Model=Qwen2.5-3B-It, T... | 2026.03 | 0.5 | 37 |
| Markov | Model=Qwen3-4B, Traini... | 2026.03 | 0.4 | 28 |
| Action-sequence | Model=Qwen3-4B, Traini... | 2026.03 | 0.2 | 22 |
| Markov | Model=Qwen2.5-3B-It, T... | 2026.03 | 0.2 | 14 |
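Several entries report a very low Average Score @128 next to a much higher Pass@128 (for example 1.4 versus 61). The page does not define either metric, so the following is only a sketch under an assumed reading: Average Score @128 is taken to be the mean score over 128 sampled attempts per puzzle, and Pass@128 the share of puzzles solved by at least one of those attempts. The function names, the `threshold` parameter, and the toy data are all hypothetical and do not come from the benchmark.

```python
from typing import Sequence


def average_score_at_k(scores: Sequence[Sequence[float]]) -> float:
    """Assumed definition: mean over puzzles of the mean score across
    the k sampled attempts, reported as a percentage."""
    per_puzzle = [sum(s) / len(s) for s in scores]
    return 100.0 * sum(per_puzzle) / len(per_puzzle)


def pass_at_k(scores: Sequence[Sequence[float]], threshold: float = 1.0) -> float:
    """Assumed definition: percentage of puzzles where at least one of
    the k attempts reaches the (hypothetical) success threshold."""
    solved = [any(x >= threshold for x in s) for s in scores]
    return 100.0 * sum(solved) / len(solved)


# Toy data: 4 puzzles with 4 attempts each (k=4 instead of 128).
# A score of 1.0 marks a solved attempt, 0.0 a failed one.
toy = [
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

print(average_score_at_k(toy))  # 12.5
print(pass_at_k(toy))           # 50.0
```

Under this reading, a method that rarely succeeds on any single attempt can still solve many puzzles at least once across its samples, which would produce a low average with a high pass rate.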