Task Success Rate on MobileMiniWob++
[Figure: Task Success Rate over time, Dec 20, 2024 to May 12, 2026; top entry GUI-Explorer at 80.4]
Updated 21d ago
Evaluation Results
| Method | Configuration | Date | Task Success Rate |
|---|---|---|---|
| GUI-Explorer | Type=GPT-4o, Input=SoM | 2026.05 | 80.4 |
| EAM | Type=GPT-4o, Qwen2.5-3... | 2026.05 | 76.1 |
| AppAgentX | Type=GPT-4o, Input=SoM | 2026.05 | 72.8 |
| M3A | Type=GPT-4o, Input=SoM | 2026.05 | 68.5 |
| Qwen2.5-VL-72B | | 2025.02 | 68 |
| SoM | Input=Image + AXTree,... | 2024.12 | 67.7 |
| Aguvis-72B | | 2025.02 | 66 |
| GPT-4o | | 2025.02 | 61 |
| Claude | Set-of-Mark (SoM)=true | 2025.02 | 61 |
| Aria-UI | Input=Image, Planner=G... | 2024.12 | 60.4 |
| Choice | Input=AXTree, Planner=... | 2024.12 | 59.7 |
| Choice | Input=AXTree, Planner=... | 2024.12 | 57.4 |
| GPT-4o | Type=GPT-4o, Input=SoM | 2026.05 | 56.5 |
| UI-TARS-7B | Type=UI-TARS-7B, Input... | 2026.05 | 53.3 |
| AutoDroid-V2 | Type=Llama-3-8B-ft, In... | 2026.05 | 53.3 |
| Qwen2-VL-72B | Set-of-Mark (SoM)=true | 2025.02 | 50 |
| UGround | Input=Image, Planner=G... | 2024.12 | 48.4 |
| Gemini 2.0 | Set-of-Mark (SoM)=true | 2025.02 | 42 |
| SoM | Input=Image + AXTree,... | 2024.12 | 40.3 |
| Qwen 2.5-VL-3B | Type=Qwen 2.5-VL-3B, I... | 2026.05 | 32.6 |
| UI-TARS-2B | Type=UI-TARS-2B, Input... | 2026.05 | 31.5 |