Browser-use on DeepShop
Top Success Rate: 0.62 (Gemini computer-use-preview), Apr 9, 2026. Updated 9d ago.
Evaluation Results
| Method | Model Category | Date | Success Rate |
|---|---|---|---|
| Gemini computer-use-preview | API onl... | 2026.04 | 0.62 |
| Axtree Agent (Gemini-3-flash) | API onl... | 2026.04 | 0.553 |
| SoM Agent (o3) | API onl... | 2026.04 | 0.497 |
| SoM Agent (GPT-5) | API onl... | 2026.04 | 0.491 |
| Axtree Agent (Gemini-3-flash) | API onl... | 2026.04 | 0.451 |
| MolmoWeb-8B | Open we... | 2026.04 | 0.423 |
| Axtree Agent (GPT-5) | API onl... | 2026.04 | 0.407 |
| MolmoWeb-4B | Open we... | 2026.04 | 0.356 |
| GLM-4.1V-9B-Thinking | Open we... | 2026.04 | 0.32 |
| Fara-7B | Open we... | 2026.04 | 0.262 |
| OpenAI computer-use-preview | API onl... | 2026.04 | 0.247 |
| SoM Agent (GPT-4o) | API onl... | 2026.04 | 0.16 |
| UI-TARS-1.5-7B | Open we... | 2026.04 | 0.116 |