| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| GUI Agent Task | AndroidWorld | Success Rate | 80 | 136 |
| Mobile Task Automation | AndroidWorld (test) | Average Success Rate | 1 | 119 |
| GUI Agent | AndroidWorld | Accuracy | 62 | 70 |
| Mobile GUI Automation | AndroidWorld | Overall Success Rate | 51.7 | 41 |
| GUI navigation | AndroidWorld latest (test) | Success Rate | 76.7 | 35 |
| Mobile UI Control | AndroidWorld | Overall Task Success Rate | 71.6 | 22 |
| End-to-end GUI Navigation | AndroidWorld | Success Rate | 77.6 | 21 |
| Mobile GUI Agents | AndroidWorld 138 tasks (test) | Success Rate | 71.1 | 18 |
| Mobile Agent Decision-making | AndroidWorld (Evaluation set 116 templates) | Average Success Rate (SR) | 62.9 | 16 |
| Reward Modeling | AndroidWorld | Precision | 92.5 | 14 |
| End-to-End Environment Interaction | AndroidWorld (test) | Pass@1 | 80.2 | 14 |
| GUI Agent Automation | AndroidWorld (AW) (Online) | Success Rate | 25.1 | 6 |
| Mobile GUI Agent Decision Making | AndroidWorld | Success Rate | 59.5 | 5 |
| Safe Navigation | AndroidWorld core20 safe general tasks | Success Count (out of 20) | 11 | 4 |
| Mobile Use | AndroidWorld | Score | 70.7 | 4 |
| Mobile operating system task execution | AndroidWorld (AW) | AUV | 43.2 | 4 |
| Agentic Mobile Interaction | AndroidWorld unseen tasks (test) | Pass@1 | 36.7 | 3 |
| Evaluator Accuracy | AndroidWorld | Overall Acc | 87.9 | 3 |