Sequential Planning on VSP-Super
[Chart: Success Rate (Length 16) over time. Top result: DiffThinker++, 99 (Dec 30, 2025).]
Evaluation Results
Method                Setting        Date     Success Rate (Length 16)  Success Rate (Length 32)
DiffThinker++         Flow Matching  2025.12  99                        80
DiffThinker           Flow Matching  2025.12  96                        83
Qwen3-VL-32B          SFT            2025.12  85                        21
Qwen3-VL-8B           SFT            2025.12  61                        8
Gemini-3-Flash        N/A            2025.12  52                        3
GPT-5                 N/A            2025.12  3                         0
Qwen3-VL-8B           N/A            2025.12  1                         0
Qwen3-VL-32B          GRPO           2025.12  1                         0
Qwen3-VL-8B           GRPO           2025.12  0                         0
Qwen3-VL-32B          N/A            2025.12  0                         0
Qwen-Image-Edit-2509  N/A            2025.12  0                         0
Qwen-Image-Edit-2511  N/A            2025.12  0                         0