Mobile-Agent-v3: Fundamental Agents for GUI Automation

About

This paper introduces GUI-Owl, a foundational GUI agent model that achieves state-of-the-art performance among open-source end-to-end models on ten GUI benchmarks across desktop and mobile environments, covering grounding, question answering, planning, decision-making, and procedural knowledge. GUI-Owl-7B achieves 66.4 on AndroidWorld and 29.4 on OSWorld. Building on this, we propose Mobile-Agent-v3, a general-purpose GUI agent framework that further improves performance to 73.3 on AndroidWorld and 37.7 on OSWorld, setting a new state-of-the-art for open-source GUI agent frameworks. GUI-Owl incorporates three key innovations: (1) Large-scale Environment Infrastructure: a cloud-based virtual environment spanning Android, Ubuntu, macOS, and Windows, enabling our Self-Evolving GUI Trajectory Production framework. This generates high-quality interaction data via automated query generation and correctness validation, leveraging GUI-Owl to refine trajectories iteratively, forming a self-improving loop. It supports diverse data pipelines and reduces manual annotation. (2) Diverse Foundational Agent Capabilities: by integrating UI grounding, planning, action semantics, and reasoning patterns, GUI-Owl supports end-to-end decision-making and can act as a modular component in multi-agent systems. (3) Scalable Environment RL: we develop a scalable reinforcement learning framework with fully asynchronous training for real-world alignment. We also introduce Trajectory-aware Relative Policy Optimization (TRPO) for online RL, achieving 34.9 on OSWorld. GUI-Owl and Mobile-Agent-v3 are open-sourced at https://github.com/X-PLUG/MobileAgent.
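To put the reported improvement in perspective, the short sketch below computes the absolute and relative gains of Mobile-Agent-v3 over GUI-Owl-7B, using only the four scores quoted in the abstract. The helper name `gains` and the dictionary layout are illustrative choices, not part of the paper or its codebase.

```python
# Scores quoted in the abstract: benchmark -> (GUI-Owl-7B, Mobile-Agent-v3).
scores = {
    "AndroidWorld": (66.4, 73.3),
    "OSWorld": (29.4, 37.7),
}


def gains(base, improved):
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = improved - base
    relative = 100.0 * absolute / base
    return round(absolute, 1), round(relative, 1)


# Hypothetical summary structure: benchmark -> (absolute gain, relative gain).
results = {name: gains(base, new) for name, (base, new) in scores.items()}

for name, (abs_gain, rel_gain) in results.items():
    print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

Note that this compares a standalone model with a framework built on top of it, so the gain reflects the added agent framework rather than a change in the underlying model.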

Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, Jitong Liao, Qi Zheng, Fei Huang, Jingren Zhou, Ming Yan • 2025

Related benchmarks

Task                   | Dataset               | Metric                 | Result  | Rank
GUI Grounding          | ScreenSpot v2         | Avg Accuracy           | 93.2    | 203
GUI Grounding          | ScreenSpot Pro        | Average Score          | 5.80e+3 | 169
GUI Agent Task         | AndroidWorld          | Success Rate           | 73.3    | 104
GUI Grounding          | ScreenSpot Pro        | Accuracy               | 58      | 77
Mobile Task Automation | AndroidWorld (test)   | Average Success Rate   | 0.733   | 75
GUI Grounding          | OSWorld-G             | Average Score          | 58      | 74
Mobile GUI Automation  | GUI-Odyssey           | Success Rate (SR)      | 60.7    | 50
GUI Grounding          | MMBench-GUI L2 (test) | Error (Windows, Basic) | 85.6    | 46
Grounding              | ScreenSpot v2         | Accuracy               | 93.1    | 23
GUI Grounding          | ScreenSpot Desktop V2 | Text Accuracy          | 97.9    | 21
Showing 10 of 31 rows
