Web Agent Navigation on WorkArena L2 147-task (test)
[Chart: Success Rate by date; best result: Gemini 3.1 Pro, 40 (Apr 9, 2026)]
Updated 9d ago
Evaluation Results
Method             Details                    Date     Success Rate (%)
Gemini 3.1 Pro     Agent harness=This work    2026.04  40
Claude 3.5 Sonnet  Agent harness=BrowserG...  2026.04  38.8
Qwen3.5-27B        Params=27B, Agent harn...  2026.04  18.9
o1-mini            Agent harness=BrowserG...  2026.04  14.3
A3-Qwen3.5-9B      Params=9B, Agent harne...  2026.04  9.7
Llama 3.1 405B     Params=405B, Agent har...  2026.04  8.9
GPT-4o             Agent harness=BrowserG...  2026.04  7.5
Llama 3.1 70B      Params=70B, Agent harn...  2026.04  3.4
Qwen3.5-9B (base)  Params=9B, Agent harne...  2026.04  2.2
GPT-4o-mini        Agent harness=BrowserG...  2026.04  2