Web Agent Navigation on WebArena (test)
[Chart: Success Rate by date (Apr 9, 2026); top entry Gemini 3.1 Pro, 53.8]
Updated 9d ago
Evaluation Results

| Method | Details | Date | Success Rate |
|---|---|---|---|
| Gemini 3.1 Pro | Agent harness=This work | 2026.04 | 53.8 |
| Qwen3.5-27B | Params=27B, Agent harn... | 2026.04 | 41.5 |
| A3-Qwen3.5-9B | Params=9B, Agent harne... | 2026.04 | 41.5 |
| Claude 3.5 Sonnet | Agent harness=BrowserG... | 2026.04 | 36 |
| GPT-4o | Agent harness=BrowserG... | 2026.04 | 31.5 |
| Qwen3.5-9B (base) | Params=9B, Agent harne... | 2026.04 | 31 |
| o1-mini | Agent harness=BrowserG... | 2026.04 | 29.9 |
| Llama 3.1 405B | Params=405B, Agent har... | 2026.04 | 22.6 |
| Llama 3.1 70B | Params=70B, Agent harn... | 2026.04 | 17.1 |
| GPT-4o-mini | Agent harness=BrowserG... | 2026.04 | 13.6 |