Sliding Puzzle
Best Solve Rate: 56.3, Adaptive (Qwen2.5-VL-7B)
[Chart: Solve Rate and Actions Taken; date shown: May 11, 2026]
Updated 22d ago
Evaluation Results
| Method | Group | Date | Solve Rate | Actions Taken |
|---|---|---|---|---|
| Adaptive (Qwen2.5-VL-7B) | Ours | 2026.05 | 56.3 | 37 |
| GPT-5.5 | Closed (zero-shot) | 2026.05 | 22.2 | 35.1 |
| Gemini 3.1 Pro | Closed (zero-shot) | 2026.05 | 11.1 | 36.1 |
| Claude Sonnet | Closed (zero-shot) | 2026.05 | 0 | - |
| InternVL3-8B/14B/78B | Open (zero-shot) | 2026.05 | 0 | - |
| Qwen2.5-VL-7B/72B | Open (zero-shot) | 2026.05 | 0 | - |
| Qwen3-VL-8B/32B | Open (zero-shot) | 2026.05 | 0 | - |