Feasibility Prediction on Sokoban
Top F1@1: 46.4 (Claude Sonnet 4.6)
[Chart: F1@1, F1@All, and F1 (Fail); data point dated May 29, 2026]
Updated 1d ago
Evaluation Results
Method                   Date      F1@1   F1@All   F1 (Fail)
Claude Sonnet 4.6        2026.05   46.4   53.6     33.9
Claude Opus 4.7          2026.05   46.3   45.6     16
Gemini 3.1 Pro Preview   2026.05   40     61.9     79.9
GPT-5.2 Instant          2026.05   27.7   40.6     32.8
Qwen3 235B               2026.05   6.3    12.6     20.4