
AndroidWorld

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| GUI Agent Task | AndroidWorld | Success Rate: 80 | 188 |
| Mobile Task Automation | AndroidWorld (test) | Average Success Rate: 1 | 119 |
| GUI Agent | AndroidWorld | Accuracy: 62 | 70 |
| Mobile GUI Automation | AndroidWorld | Overall Success Rate: 76.9 | 68 |
| GUI navigation | AndroidWorld latest (test) | Success Rate: 76.7 | 35 |
| Mobile UI Control | AndroidWorld | Overall Task Success Rate: 71.6 | 30 |
| Mobile GUI Agent Decision Making | AndroidWorld | Success Rate: 82.05 | 27 |
| GUI Agent Task Success | AndroidWorld (online) | Task Success Rate: 48.7 | 25 |
| Android GUI Automation | AndroidWorld M | L1 Score: 53.8 | 22 |
| End-to-end GUI Navigation | AndroidWorld | Success Rate: 77.6 | 21 |
| Agentic Reasoning | AndroidWorld | Success Rate: 82.05 | 20 |
| Mobile GUI Agents | AndroidWorld 138 tasks (test) | Success Rate: 71.1 | 18 |
| Mobile UI Task Automation | AndroidWorld (full) | Success Rate (Easy): 82.79 | 16 |
| Mobile Agent Decision-making | AndroidWorld (Evaluation set 116 templates) | Average Success Rate (SR): 62.9 | 16 |
| Reward Modeling | AndroidWorld | Precision: 92.5 | 14 |
| End-to-End Environment Interaction | AndroidWorld (test) | Pass@1: 80.2 | 14 |
| GUI Agent Automation | AndroidWorld (AW) (Online) | Success Rate: 25.1 | 6 |
| Safe Navigation | AndroidWorld core20 safe general tasks | Success Count (out of 20): 11 | 4 |
| Mobile Use | AndroidWorld | Score: 70.7 | 4 |
| Mobile operating system task execution | AndroidWorld (AW) | AUV: 43.2 | 4 |
| Agentic Mobile Interaction | AndroidWorld unseen tasks (test) | Pass@1: 36.7 | 3 |
| Evaluator Accuracy | AndroidWorld | Overall Acc: 87.9 | 3 |