
Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment

About

We propose V-Droid, a mobile GUI task automation agent. Unlike previous mobile agents that utilize Large Language Models (LLMs) as generators to directly generate actions at each step, V-Droid employs LLMs as verifiers to evaluate candidate actions before making final decisions. To realize this novel paradigm, we introduce a comprehensive framework for constructing verifier-driven mobile agents: the discretized action space construction coupled with the prefilling-only workflow to accelerate the verification process, the pair-wise progress preference training to significantly enhance the verifier's decision-making capabilities, and the scalable human-agent joint annotation scheme to efficiently collect the necessary data at scale. V-Droid obtains a substantial task success rate across several public mobile task automation benchmarks: 59.5% on AndroidWorld, 38.3% on AndroidLab, and 49% on MobileAgentBench, surpassing existing agents by 5.2%, 2.1%, and 9%, respectively. Furthermore, V-Droid achieves a remarkably low latency of 4.3s per step, which is 6.1x faster compared with existing mobile agents. The source code is available at https://github.com/V-Droid-Agent/V-Droid.

Gaole Dai, Shiqi Jiang, Ting Cao, Yuanchun Li, Yuqing Yang, Rui Tan, Mo Li, Lili Qiu • 2025
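
The verifier-driven idea in the abstract (score each candidate action from a discrete set, then pick the best one, rather than generating an action directly) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the V-Droid code base: `select_action`, `toy_verifier`, and the candidate strings are invented for illustration, and the keyword-overlap scorer is only a stand-in for the LLM verifier the abstract describes.

```python
from typing import Callable, Sequence


def select_action(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate action that the verifier scores highest."""
    if not candidates:
        raise ValueError("no candidate actions to verify")
    # Each candidate is scored independently; ties go to the earliest candidate.
    return max(candidates, key=score)


def toy_verifier(task: str) -> Callable[[str], float]:
    """Build a toy scorer (assumption: NOT an LLM) based on keyword overlap."""
    keywords = set(task.lower().split())

    def score(action: str) -> float:
        words = set(action.lower().split())
        # Fraction of the action's words that also appear in the task text.
        return len(words & keywords) / max(len(words), 1)

    return score


# Hypothetical discretized action space for one screen.
candidates = ["click Settings", "click Wi-Fi", "scroll down", "navigate back"]
best = select_action(candidates, toy_verifier("turn on wi-fi"))
print(best)
```

In a real agent the scorer would be replaced by a call to the trained verifier model; the selection loop itself stays the same, which is why the candidate set must be finite and enumerable.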

Related benchmarks

Task                              Dataset              Result                       Rank
GUI Agent Task                    AndroidWorld         Success Rate 59.5            104
Mobile Task Automation            AndroidWorld (test)  Average Success Rate 0.595   75
Mobile GUI Automation             AndroidLab           Success Rate 38.3            18
Mobile GUI Agent Decision Making  AndroidWorld         Success Rate 59.5            5
