UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning

About

Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories but struggles with multi-step task execution because it lacks trajectory-level reward signals; online RL captures these signals through environment interaction but suffers from sparse rewards and prohibitive deployment costs. To address this dilemma, we present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. During each rollout, we preserve the model's original output within the multi-turn dialogue, while a Patch Module adaptively recovers from divergence between the rollout and expert trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation and optimizes the policy with weighted step-level and episode-level advantages. We further introduce Semi-Online Performance (SOP), a metric that aligns better with true online performance and serves as a practical and effective proxy for real-world evaluation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks, with substantial gains over the base model (e.g., +12.0% on AndroidWorld, +23.8% on AITW), demonstrating significant progress in bridging the gap between offline training efficiency and online multi-turn reasoning. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1.
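The abstract mentions discounted future returns and a weighted combination of step-level and episode-level advantages but gives no formulas. The following is a minimal sketch of what such a computation could look like, not the authors' implementation. The function names, the discount `gamma`, the weight `omega`, and the group-relative normalization (in the style commonly used by GRPO-like methods) are all assumptions made for illustration.

```python
import math


def discounted_returns(step_rewards, gamma=0.5):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, working backwards."""
    returns = []
    running = 0.0
    for reward in reversed(step_rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values):
    """Zero-mean, unit-variance normalization; all zeros if there is no spread."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    if sigma == 0.0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combined_advantages(group_step_rewards, gamma=0.5, omega=1.0):
    """Hypothetical weighting: step-level advantage + omega * episode-level advantage.

    group_step_rewards holds one list of per-step rewards for each rollout
    sampled for the same task (rollouts may differ in length).
    """
    returns = [discounted_returns(r, gamma) for r in group_step_rewards]

    # Episode-level (assumed): compare the return from the first step
    # across all rollouts in the group.
    episode_adv = normalize([g[0] for g in returns])

    # Step-level (assumed): compare returns at the same step index across
    # the rollouts that actually reached that step.
    combined = [[0.0] * len(g) for g in returns]
    max_len = max(len(g) for g in returns)
    for t in range(max_len):
        members = [i for i, g in enumerate(returns) if len(g) > t]
        step_adv = normalize([returns[i][t] for i in members])
        for i, adv in zip(members, step_adv):
            combined[i][t] = adv + omega * episode_adv[i]
    return combined


if __name__ == "__main__":
    # Two toy rollouts: one correct at both steps, one wrong at both steps.
    print(discounted_returns([1.0, 0.0, 1.0], gamma=0.5))
    print(combined_advantages([[1.0, 1.0], [0.0, 0.0]], gamma=0.5, omega=1.0))
```

In this sketch a larger `gamma` lets later steps influence the credit assigned to earlier ones, while `omega` controls how much the trajectory-level signal outweighs the per-step signal. The values used in the paper are not stated in the abstract.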

Zhengxi Lu, Jiabo Ye, Fei Tang, Yongliang Shen, Haiyang Xu, Ziwei Zheng, Weiming Lu, Ming Yan, Fei Huang, Jun Xiao, Yueting Zhuang • 2025

Related benchmarks

Task	Dataset	Result
GUI Grounding	ScreenSpot Pro	Average Score: 30.6	458
GUI Grounding	ScreenSpot v2	Avg Accuracy: 90.1	371
GUI Grounding	ScreenSpot Pro	--	195
Mobile GUI Automation	AndroidWorld	Overall Success Rate: 34	68
Mobile GUI Automation	GUI-Odyssey	Success Rate (SR): 52.8	62
GUI Interaction Control	GUI-Odyssey	SR: 59.5	31
GUI Automation	AndroidControl High	Task Match (TM): 79.9	27
UI Element Grounding	ScreenSpot Overall v2	Overall Accuracy (Avg): 90.1	26
GUI Automation	MiniWob++	Success Rate: 60.9	25
GUI reasoning	AndroidControl Low	SR: 89.2	24

Showing 10 of 21 rows
