
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning

About

The Chain of Action-Planning Thoughts (CoaT) paradigm has been shown to improve the reasoning performance of VLM-based mobile agents in GUI tasks. However, the scarcity of diverse CoaT trajectories limits the expressiveness and generalization ability of such agents. While self-training is commonly employed to address data scarcity, existing approaches either overlook the correctness of intermediate reasoning steps or depend on expensive process-level annotations to construct process reward models (PRMs). To address these problems, we propose Iterative Preference Learning (IPL), which constructs a CoaT-tree through iterative sampling, scores leaf nodes using a rule-based reward, and backpropagates feedback to derive Thinking-level Direct Preference Optimization (T-DPO) pairs. To prevent overfitting during warm-up supervised fine-tuning, we further introduce a three-stage instruction evolution, which leverages GPT-4o to generate diverse Q&A pairs based on real mobile UI screenshots, enhancing both generality and layout understanding. Experiments on three standard mobile GUI-agent benchmarks demonstrate that our agent, MobileIPL, achieves state-of-the-art performance, outperforming strong baselines, including continual pretraining models such as OS-ATLAS and UI-TARS, and shows strong generalization to out-of-domain scenarios.
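
The tree-scoring step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the use of the mean of child values as the backpropagation rule, the `margin` threshold, and the pairing of sibling nodes are all assumptions made for illustration, since the abstract does not specify these details.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One sampled thinking step in a hypothetical CoaT-tree."""
    thought: str
    reward: Optional[float] = None  # rule-based score, set on leaves only
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0  # filled in by backpropagate()


def backpropagate(node: Node) -> float:
    """Give each node a value: its leaf reward, or the mean of its children's values.

    Averaging is an assumption; the source only says feedback is backpropagated.
    """
    if not node.children:
        node.value = float(node.reward or 0.0)
    else:
        node.value = sum(backpropagate(c) for c in node.children) / len(node.children)
    return node.value


def preference_pairs(node: Node, margin: float = 0.5) -> List[Tuple[str, str]]:
    """Collect (chosen, rejected) sibling pairs whose values differ by at least `margin`."""
    pairs = []
    for a, b in combinations(node.children, 2):
        if abs(a.value - b.value) >= margin:
            chosen, rejected = (a, b) if a.value > b.value else (b, a)
            pairs.append((chosen.thought, rejected.thought))
    for child in node.children:
        pairs.extend(preference_pairs(child, margin))
    return pairs


# Toy tree: two candidate plans, each with two candidate actions (all hypothetical).
root = Node("describe screen", children=[
    Node("plan A", children=[Node("tap Search", reward=1.0), Node("tap Back", reward=0.0)]),
    Node("plan B", children=[Node("scroll down", reward=0.0), Node("tap Home", reward=0.0)]),
])
backpropagate(root)
pairs = preference_pairs(root)
print(pairs)
```

In this toy tree, the plan whose subtree contains a rewarded leaf is preferred over its sibling, so preference pairs arise at intermediate thinking steps as well as at the final action, without any process-level annotation.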

Kun Huang, Weikai Xu, Yuxuan Liu, Quandong Wang, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang, Bo An • 2025

Related benchmarks

Task | Dataset | Result | Rank
GUI Agent Navigation and Action | AITZ | Success Rate (SR): 69.15 | 18
High-level instruction following | AndroidControl | Step Accuracy: 72.7 | 16
GUI Agent Planning and Execution | AMEX (test) | Success Rate (Gmail): 77.3 | 12
High-level instruction execution | AndroidControl IDD | Step Accuracy: 73.6 | 8
High-level instruction execution | AndroidControl task-UN | Step Accuracy: 72.2 | 8