MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning

About

The Chain of Action-Planning Thoughts (CoaT) paradigm has been shown to improve the reasoning performance of VLM-based mobile agents in GUI tasks. However, the scarcity of diverse CoaT trajectories limits the expressiveness and generalization ability of such agents. While self-training is commonly employed to address data scarcity, existing approaches either overlook the correctness of intermediate reasoning steps or depend on expensive process-level annotations to construct process reward models (PRMs). To address these problems, we propose Iterative Preference Learning (IPL), which constructs a CoaT-tree through iterative sampling, scores leaf nodes with a rule-based reward, and backpropagates the feedback to derive Thinking-level Direct Preference Optimization (T-DPO) pairs. To prevent overfitting during warm-up supervised fine-tuning, we further introduce a three-stage instruction evolution, which leverages GPT-4o to generate diverse Q&A pairs based on real mobile UI screenshots, enhancing both generality and layout understanding. Experiments on three standard mobile GUI-agent benchmarks show that our agent, MobileIPL, outperforms strong baselines, including continual-pretraining models such as OS-ATLAS and UI-TARS, achieving state-of-the-art performance and strong generalization to out-of-domain scenarios.

Kun Huang, Weikai Xu, Yuxuan Liu, Quandong Wang, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang, Bo An • 2025
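
The abstract describes the IPL loop only at a high level: build a tree of sampled thoughts, score the leaves with a rule, propagate the scores upward, and read off preference pairs. The following is a minimal Python sketch of that idea, not the authors' implementation. Everything named here is hypothetical: the `ThoughtNode` class, the exact-match `rule_reward`, the choice to average child values during backpropagation, and the rule that sibling thoughts with different values form a (chosen, rejected) pair are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class ThoughtNode:
    """One sampled thinking step in a hypothetical CoaT-tree."""
    text: str
    children: List["ThoughtNode"] = field(default_factory=list)
    reward: Optional[float] = None  # set on leaf nodes only
    value: float = 0.0              # filled in by backpropagate()


def rule_reward(predicted_action: str, gold_action: str) -> float:
    """Toy rule-based reward (assumption): 1.0 on exact action match, else 0.0."""
    return 1.0 if predicted_action == gold_action else 0.0


def backpropagate(node: ThoughtNode) -> float:
    """Fill node.value bottom-up.

    Leaves take their rule-based reward; internal nodes take the mean of
    their children's values (averaging is an assumption of this sketch).
    """
    if not node.children:
        node.value = node.reward if node.reward is not None else 0.0
    else:
        node.value = sum(backpropagate(c) for c in node.children) / len(node.children)
    return node.value


def preference_pairs(node: ThoughtNode, margin: float = 0.0) -> List[Tuple[str, str]]:
    """Collect (chosen, rejected) texts from siblings whose values differ by more than margin."""
    pairs: List[Tuple[str, str]] = []
    for a, b in combinations(node.children, 2):
        if abs(a.value - b.value) > margin:
            chosen, rejected = (a, b) if a.value > b.value else (b, a)
            pairs.append((chosen.text, rejected.text))
    for child in node.children:
        pairs.extend(preference_pairs(child, margin))
    return pairs


def build_demo_tree(gold_action: str) -> ThoughtNode:
    """Hand-built two-level tree standing in for sampled model outputs."""
    def leaf(action: str) -> ThoughtNode:
        return ThoughtNode(text=action, reward=rule_reward(action, gold_action))

    plan_a = ThoughtNode("Open the search bar first",
                         [leaf("CLICK(search)"), leaf("SCROLL(down)")])
    plan_b = ThoughtNode("Go back to the home screen",
                         [leaf("PRESS_BACK"), leaf("SCROLL(down)")])
    return ThoughtNode("root", [plan_a, plan_b])


if __name__ == "__main__":
    root = build_demo_tree(gold_action="CLICK(search)")
    backpropagate(root)
    for chosen, rejected in preference_pairs(root):
        print(f"chosen={chosen!r}  rejected={rejected!r}")
```

In this toy tree, the thought that leads to at least one correct action receives a higher value than its sibling, so a pair is produced at the thinking level as well as at the action level. Siblings with equal values yield no pair, which is one simple way to avoid training on uninformative comparisons.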

Related benchmarks

Task	Dataset	Result
GUI Agent Navigation and Action	AITZ	Success Rate (SR): 69.15	18
High-level instruction following	AndroidControl	Step Accuracy: 72.7	16
GUI Agent Planning and Execution	AMEX (test)	Success Rate (Gmail): 77.3	12
High-level instruction execution	AndroidControl IDD	Step Accuracy: 73.6	8
High-level instruction execution	AndroidControl task-UN	Step Accuracy: 72.2	8

