Visual Navigation on Visual Navigation (level-4)
Best Pass@1: 61.3 (VisuoThink, Apr 12, 2025). Updated 4d ago.
Evaluation Results
Method                         Model                Date     Pass@1
VisuoThink                     GPT-4o               2025.04  61.3
VisuoThink                     Claude-3.5-sonnet    2025.04  61.3
VisuoThink w/o rollout search  Claude-3.5-sonnet    2025.04  38.7
VisuoThink w/o rollout search  GPT-4o               2025.04  32.3
VoT + Executer                 Claude-3.5-sonnet    2025.04  22.6
VisuoThink                     Qwen2-VL-72B-Ins...  2025.04  12.9
VoT + Executer                 GPT-4o               2025.04  9.7
VisuoThink w/o rollout search  Qwen2-VL-72B-Ins...  2025.04  6.5
CoT                            GPT-4o               2025.04  3.2
CoT                            Qwen2-VL-72B-Ins...  2025.04  3.2
VoT + Executer                 Qwen2-VL-72B-Ins...  2025.04  3.2
CoT                            Claude-3.5-sonnet    2025.04  3.2
VoT                            GPT-4o               2025.04  0
VoT                            Qwen2-VL-72B-Ins...  2025.04  0
VoT                            Claude-3.5-sonnet    2025.04  0