Maze Navigation on MAZENAVIGATION OOD Both 8x8 Long 1.0
[Figure: chart of EM and PR (Success Rate). Highlighted point: Wan2.2-TI2V-5B, EM 3,200, Jan 28, 2026.]
Updated 4d ago
Evaluation Results
| Method | Input/Output | Date | EM | PR (Success Rate) |
|---|---|---|---|---|
| Wan2.2-TI2V-5B | Input=Text + Image, Ou... | 2026.01 | 3,200 | 4,710 |
| Wan2.2-TI2V-5B (Unseen Visual Icons) | Input=Text + Image, Ou... | 2026.01 | 3,200 | 4,230 |
| GPT-5.1 | Input=Text + Image, Ou... | 2026.01 | 0 | 0 |
| GPT-5.2 | Input=Text + Image, Ou... | 2026.01 | 0 | 0 |
| Qwen3-VL-8B | Input=Text + Image, Ou... | 2026.01 | 0 | 890 |
| Qwen3-VL-8B (w/ coordinates) | Input=Text + Image, Ou... | 2026.01 | 0 | 590 |
| VPRL-7B * | Input=Image, Output=Im... | 2026.01 | 0 | 70 |