Web Agent Navigation on VisualWebArena (test)
Best Success Rate: 47.9 (Gemini 3.1 Pro)
[Chart: Success Rate by date; data point labeled Apr 9, 2026]
Updated 9d ago
Evaluation Results
Method               Details                      Date      Success Rate
Gemini 3.1 Pro       Agent harness=This work      2026.04   47.9
Qwen3.5-27B          Params=27B, Agent harn...    2026.04   37.4
A3-Qwen3.5-9B        Params=9B, Agent harne...    2026.04   33.9
Qwen3.5-9B (base)    Params=9B, Agent harne...    2026.04   28.5
GPT-4o               Agent harness=BrowserG...    2026.04   26.3
Claude 3.5 Sonnet    Agent harness=BrowserG...    2026.04   22
GPT-4o-mini          Agent harness=BrowserG...    2026.04   18