Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization

About

Recent advances in Multimodal Large Language Models (MLLMs) have substantially driven the progress of autonomous agents for Graphical User Interfaces (GUIs). Nevertheless, in real-world applications, GUI agents often face non-stationary environments, leading to high computational costs for data curation and policy optimization. In this report, we introduce a novel MLLM-centered framework for GUI agents, which consists of two components: agentic-Q estimation and step-wise policy optimization. The former optimizes a Q-model that generates step-wise values, each evaluating the contribution of a given action to task completion. The latter takes step-wise samples from the state-action trajectory as inputs and optimizes the policy via reinforcement learning with our agentic-Q model. Note that (i) all state-action trajectories are produced by the policy itself, so data collection costs remain manageable; and (ii) the policy update is decoupled from the environment, ensuring stable and efficient optimization. Empirical evaluations show that our framework endows Ovis2.5-9B with strong GUI interaction capabilities, achieving strong performance on GUI navigation and grounding benchmarks and even surpassing larger-scale competitors.
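
To make the two components more concrete, here is a minimal toy sketch of the idea, not the authors' implementation. Everything in it is an assumption made for illustration: the string-valued states and actions, the fixed lookup table standing in for a learned Q-model (`q_model`), the tabular softmax policy, and the advantage-weighted policy-gradient update in `stepwise_update`. It only shows how step-wise samples taken from policy-generated trajectories can be scored by a Q-model and used to update the policy without querying the environment again.

```python
import math
from collections import defaultdict

# Hypothetical toy action space; real GUI agents act on screenshots and text.
ACTIONS = ["click", "type", "scroll"]

def q_model(state, action):
    """Stand-in for a learned Q-model: a fixed table of step-wise values.
    Assumption for illustration only; a real Q-model would be trained."""
    table = {("search_page", "type"): 1.0, ("results_page", "click"): 1.0}
    return table.get((state, action), 0.0)

def action_probs(logits, state):
    """Softmax over the tabular logits of one state, in ACTIONS order."""
    exps = [math.exp(logits[(state, a)]) for a in ACTIONS]
    total = sum(exps)
    return [e / total for e in exps]

def stepwise_update(logits, trajectories, lr=0.5):
    """One policy-gradient pass over step-wise samples, scored only by q_model."""
    # Flatten trajectories into independent (state, action) steps.
    steps = [step for traj in trajectories for step in traj]
    for state, action in steps:
        probs = action_probs(logits, state)
        # Advantage: Q of the taken action minus the policy's expected Q.
        baseline = sum(p * q_model(state, a) for p, a in zip(probs, ACTIONS))
        advantage = q_model(state, action) - baseline
        # Gradient of log-softmax w.r.t. each logit: indicator minus probability.
        for p, a in zip(probs, ACTIONS):
            grad = (1.0 if a == action else 0.0) - p
            logits[(state, a)] += lr * advantage * grad
    return logits

# Trajectories are assumed to have been produced earlier by the policy itself.
logits = defaultdict(float)
trajectories = [
    [("search_page", "type"), ("results_page", "click")],
    [("search_page", "scroll"), ("results_page", "click")],
]
for _ in range(20):
    stepwise_update(logits, trajectories)
```

In this sketch the update loop never touches an environment: the recorded steps and the Q-model's scores are the only inputs, which is the sense in which the policy update is decoupled from the environment. Actions that the stand-in Q-model scores highly gain probability, while low-scored ones lose it.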

Yibo Wang, Guangda Huzhang, Yuwei Hu, Yu Xia, Shiyin Lu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang • 2026

Related benchmarks

Task            Dataset                    Metric                     Result  Rank
GUI Grounding   ScreenSpot v2              Avg Accuracy               92.45   203
GUI Grounding   ScreenSpot V1              Mobile Text Accuracy       95.24   15
GUI Navigation  WebVoyager                 Success Rate (Allrecipes)  88.89   12
GUI Navigation  Mind2Web Online (Average)  Success Rate               64      10
GUI Navigation  Online-Mind2Web (Easy)     Success Rate               78.31   9
GUI Navigation  Online-Mind2Web (Medium)   Success Rate               65.73   9
GUI Navigation  Online-Mind2Web (Hard)     Success Rate               51.35   9