Web task automation on VisualWebArena full
[Chart: best SR over time, Feb 17, 2026 to Apr 9, 2026. Current best: Gemini 3 Pro, SR 49.]
Updated 9d ago
Evaluation Results
| Method | Details | Date | SR |
|---|---|---|---|
| Gemini 3 Pro | Category=Proprietary,... | 2026.04 | 49 |
| Qwen3.5-27B | Category=Open-weight (... | 2026.04 | 37.4 |
| Gemini 3.1 Flash L. | Category=Proprietary | 2026.04 | 35 |
| A3-Qwen3.5-9B | Category=A3 fine-tuned... | 2026.04 | 33.9 |
| A3-Qwen3.5-4B | Category=A3 fine-tuned... | 2026.04 | 30.1 |
| Qwen3.5-9B | Category=Open-weight (... | 2026.04 | 28.5 |
| Qwen3.5-4B | Category=Open-weight (... | 2026.04 | 24.7 |
| WAC | Model=Qwen3-VL-Plus | 2026.02 | 24.5 |
| ReAct | Model=Qwen3-VL-Plus | 2026.02 | 22.7 |
| WebDreamer | Model=Qwen3-VL-Plus | 2026.02 | 22.7 |
| A3-Qwen3.5-2B | Category=A3 fine-tuned... | 2026.04 | 7.6 |
| Qwen3.5-2B | Category=Open-weight (... | 2026.04 | 5.3 |