Sequential Planning on Maze
[Chart: Score (L=8), Score (L=16), and Score (L=32) by date. Labeled values: DiffThinker, 100, Dec 30, 2025.]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Score (L=8) | Score (L=16) | Score (L=32) |
|---|---|---|---|---|---|
| DiffThinker | Flow Matching | 2025.12 | 100 | 97 | 56 |
| DiffThinker++ | Flow Matching | 2025.12 | 100 | 100 | 65 |
| Qwen3-VL-32B | SFT | 2025.12 | 91 | 57 | 3 |
| Qwen3-VL-8B | SFT | 2025.12 | 53 | 37 | 0 |
| GPT-5 | N/A | 2025.12 | 2 | 0 | 0 |
| Gemini-3-Flash | N/A | 2025.12 | 0 | 0 | 0 |
| Qwen3-VL-8B | N/A | 2025.12 | 0 | 0 | 0 |
| Qwen3-VL-8B | GRPO | 2025.12 | 0 | 0 | 0 |
| Qwen3-VL-32B | N/A | 2025.12 | 0 | 0 | 0 |
| Qwen3-VL-32B | GRPO | 2025.12 | 0 | 0 | 0 |
| Qwen-Image-Edit-2509 | N/A | 2025.12 | 0 | 0 | 0 |
| Qwen-Image-Edit-2511 | N/A | 2025.12 | 0 | 0 | 0 |