UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

About

The emergence of Multimodal Large Language Models (MLLMs) has driven significant advances in Graphical User Interface (GUI) agent capabilities. Nevertheless, existing GUI agent training and inference techniques still suffer from three problems: a dilemma in reasoning design, ineffective rewards, and visual noise. To address these issues, we introduce UI-AGILE, which enhances GUI agents at both training and inference. For training, we propose a suite of improvements to the Reinforcement Fine-Tuning (RFT) process: 1) a continuous reward function to incentivize high-precision grounding; 2) a "Simple Thinking" reward to balance planning with speed and grounding accuracy; and 3) a cropping-based resampling strategy to mitigate the sparse reward problem and improve learning on complex tasks. For inference, we present decomposed grounding with selection, which dramatically improves grounding accuracy on high-resolution displays by breaking the image into smaller, manageable parts. Experiments show that UI-AGILE achieves state-of-the-art grounding performance on two benchmarks, ScreenSpot-Pro and ScreenSpot-v2, while also exhibiting strong general agent capabilities. For instance, using both our training and inference enhancement methods brings a 23% grounding accuracy improvement over the best baseline on ScreenSpot-Pro. The code is available at https://github.com/KDEGroup/UI-AGILE.

Shuquan Lian, Yuhang Wu, Jia Ma, Yifan Ding, Zihan Song, Bingqi Chen, Xiawu Zheng, Hui Li, Rongrong Ji • 2025
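
The abstract describes decomposed grounding with selection only at a high level: the screenshot is broken into smaller parts, each part is grounded, and one candidate is selected. The Python sketch below illustrates that general idea and is not the authors' implementation. The tile size, the overlap, and the two callables `ground_fn` and `score_fn` are hypothetical placeholders for a grounding model and a candidate scorer; see the linked repository for the actual method.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in full-image pixels
Point = Tuple[float, float]


def split_into_tiles(width: int, height: int, tile: int, overlap: int) -> List[Box]:
    """Cover a width x height image with boxes of at most tile x tile pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap

    def starts(length: int) -> List[int]:
        # A single tile is enough when the image fits inside it.
        if length <= tile:
            return [0]
        offsets = list(range(0, length - tile, stride))
        offsets.append(length - tile)  # last tile sits flush with the far edge
        return offsets

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def decomposed_grounding(
    width: int,
    height: int,
    ground_fn: Callable[[Box], Optional[Point]],  # hypothetical per-tile grounding model
    score_fn: Callable[[Box, Point], float],      # hypothetical candidate scorer
    tile: int = 1024,                             # assumed value, not from the paper
    overlap: int = 128,                           # assumed value, not from the paper
) -> Optional[Point]:
    """Ground each tile separately, then keep the highest-scoring candidate."""
    best_point: Optional[Point] = None
    best_score = float("-inf")
    for box in split_into_tiles(width, height, tile, overlap):
        local = ground_fn(box)  # point in tile-local coordinates, or None
        if local is None:
            continue
        # Map the tile-local prediction back to full-image coordinates.
        candidate = (box[0] + local[0], box[1] + local[1])
        score = score_fn(box, candidate)
        if score > best_score:
            best_point, best_score = candidate, score
    return best_point
```

In this sketch the overlap keeps a target that straddles a tile border fully visible in at least one tile, and the selection step is reduced to taking the maximum of an externally supplied score.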

Related benchmarks

Task	Dataset	Result
GUI Grounding	ScreenSpot Pro	Average Score 44	458
GUI Grounding	ScreenSpot v2	Avg Accuracy 92.1	371
GUI Grounding	ScreenSpot Pro	--	195
Grounding	ScreenSpot Pro	Average Grounding Accuracy 48.7	82
GUI Grounding	ScreenSpot Mobile V2	Text Accuracy 100	60
GUI Grounding	ScreenSpot Web V2	Text Accuracy 94.2	60
GUI Grounding	ScreenSpot Desktop V2	Text Accuracy 95.6	60
GUI Interaction Control	GUI-Odyssey	SR 37	31
GUI Navigation	AITW	Overall Success Rate 48.66	27
GUI reasoning	AndroidControl Low	SR 77.6	24

