GUI Agent Task Execution on AndroidWorld, MobileMiniWob++, and DroidTask
[Chart: Latency (s) by method, with an alternate view for API Token Cost (K). Highlighted point: AutoDroid-V2, 2.1 s, May 12, 2026.]
Evaluation Results
Method          Configuration               Date      Latency (s)   API Token Cost (K)
AutoDroid-V2    Type=Llama-3-8B-ft, In...   2026.05   2.1           -
EAM             Type=GPT-4o, Qwen2.5-3...   2026.05   2.8           8.3
UI-TARS-2B      Type=UI-TARS-2B, Input...   2026.05   6             -
Qwen2.5-VL-3B   Type=Qwen 2.5-VL-3B, I...   2026.05   7.7           -
UI-TARS-7B      Type=UI-TARS-7B, Input...   2026.05   8.8           32.7
GPT-4o          Type=GPT-4o, Input=SoM      2026.05   9.3           50.8
AppAgentX       Type=GPT-4o, Input=SoM      2026.05   16            6.2
M3A             Type=GPT-4o, Input=SoM      2026.05   16.9          62.6
GUI-Explorer    Type=GPT-4o, Input=SoM      2026.05   66.4          73.1