Web Agent Navigation on WorkArena L1 (full)
Top result: Gemini 3.1 Pro, Success Rate 79.4 (Apr 9, 2026)
Updated 9d ago
Evaluation Results
Method             Details                    Date     Success Rate
Gemini 3.1 Pro     Agent harness=This work    2026.04  79.4
Qwen3.5-27B        Params=27B, Agent harn...  2026.04  57
o1-mini            Agent harness=BrowserG...  2026.04  56.7
Claude 3.5 Sonnet  Agent harness=BrowserG...  2026.04  56.4
A3-Qwen3.5-9B      Params=9B, Agent harne...  2026.04  51.5
GPT-4o             Agent harness=BrowserG...  2026.04  45.5
Llama 3.1 405B     Params=405B, Agent har...  2026.04  43.3
Qwen3.5-9B (base)  Params=9B, Agent harne...  2026.04  33.3
Llama 3.1 70B      Params=70B, Agent harn...  2026.04  27.9
GPT-4o-mini        Agent harness=BrowserG...  2026.04  27