Visual Navigation on Visual Navigation level-5
[Chart: Pass@1 over time. Highlighted point: VisuoThink, Pass@1 5,320, Apr 12, 2025.]
Evaluation Results
Method                          Model               Date      Pass@1
VisuoThink                      Claude-3.5-sonnet   2025.04    5,320
VisuoThink w/o rollout search   Claude-3.5-sonnet   2025.04    4,190
VisuoThink                      GPT-4o              2025.04    1,940
VoT + Executer                  Claude-3.5-sonnet   2025.04    1,610
VisuoThink w/o rollout search   GPT-4o              2025.04    1,130
VoT + Executer                  GPT-4o              2025.04      480
CoT                             GPT-4o              2025.04        0
VoT                             GPT-4o              2025.04        0
CoT                             Claude-3.5-sonnet   2025.04        0
VoT                             Claude-3.5-sonnet   2025.04        0