Interactive Agent on Sokoban
Top Pass@1: 64.6 (LEAFE, Mar 17, 2026)
[Chart: Pass@1 and Pass@128 results]
Evaluation Results
| Method | Backbone | Date | Pass@1 | Pass@128 |
| --- | --- | --- | --- | --- |
| LEAFE | Qwen2.5-7B | 2026.03 | 64.6 | 78.4 |
| LEAFE | Llama3.1-8B | 2026.03 | 62 | 77.2 |
| ACE | Qwen2.5-7B | 2026.03 | 61.3 | 70.8 |
| ACE | Llama3.1-8B | 2026.03 | 60.79 | 73.2 |
| GRPO-RLVR | Llama3.1-8B | 2026.03 | 60.43 | 73.4 |
| EARLY-EXP | Qwen2.5-7B | 2026.03 | 60.15 | 71.6 |
| GRPO-RLVR | Qwen2.5-7B | 2026.03 | 58.15 | 68 |
| EARLY-EXP | Llama3.1-8B | 2026.03 | 57.32 | 68.2 |
| BASE (NO FT) | Llama3.1-8B | 2026.03 | 17.7 | 61.4 |
| BASE (NO FT) | Qwen2.5-7B | 2026.03 | 6.9 | 43.8 |