
AndroidWorld

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| GUI Agent Task | AndroidWorld | Success Rate | 80 | 136 |
| Mobile Task Automation | AndroidWorld (test) | Average Success Rate | 1 | 119 |
| GUI Agent | AndroidWorld | Accuracy | 62 | 70 |
| Mobile GUI Automation | AndroidWorld | Overall Success Rate | 51.7 | 41 |
| GUI navigation | AndroidWorld latest (test) | Success Rate | 76.7 | 35 |
| Mobile UI Control | AndroidWorld | Overall Task Success Rate | 71.6 | 22 |
| End-to-end GUI Navigation | AndroidWorld | Success Rate | 77.6 | 21 |
| Mobile GUI Agents | AndroidWorld 138 tasks (test) | Success Rate | 71.1 | 18 |
| Mobile Agent Decision-making | AndroidWorld (Evaluation set 116 templates) | Average Success Rate (SR) | 62.9 | 16 |
| Reward Modeling | AndroidWorld | Precision | 92.5 | 14 |
| End-to-End Environment Interaction | AndroidWorld (test) | Pass@1 | 80.2 | 14 |
| GUI Agent Automation | AndroidWorld (AW) (Online) | Success Rate | 25.1 | 6 |
| Mobile GUI Agent Decision Making | AndroidWorld | Success Rate | 59.5 | 5 |
| Safe Navigation | AndroidWorld core20 safe general tasks | Success Count (out of 20) | 11 | 4 |
| Mobile Use | AndroidWorld | Score | 70.7 | 4 |
| Mobile operating system task execution | AndroidWorld (AW) | AUV | 43.2 | 4 |
| Agentic Mobile Interaction | AndroidWorld unseen tasks (test) | Pass@1 | 36.7 | 3 |
| Evaluator Accuracy | AndroidWorld | Overall Acc | 87.9 | 3 |