UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
About
Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories but struggles with multi-step task execution because it lacks trajectory-level reward signals; online RL captures these signals through environment interaction but suffers from sparse rewards and prohibitive deployment costs. To address this dilemma, we present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. During each rollout, we preserve the original model output within the multi-turn dialogue, and a Patch Module adaptively recovers from divergence between the rollout and expert trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation and optimizes the policy with weighted step-level and episode-level advantages. We further introduce Semi-Online Performance (SOP), a metric that aligns better with true online performance and serves as a practical and effective proxy for real-world evaluation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks, with significant gains over the base model (e.g., +12.0% on AndroidWorld, +23.8% on AITW), demonstrating substantial progress in bridging the gap between offline training efficiency and online multi-turn reasoning. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1.
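The abstract does not give the exact formulas, so the following is only a minimal sketch of how discounted future returns and a weighted mix of step-level and episode-level advantages could be computed. The function names, the default `gamma`, the weight `omega`, and the group-wise normalization are all illustrative assumptions rather than the authors' implementation; see the linked repository for the actual method.

```python
from statistics import mean, pstdev


def discounted_returns(rewards, gamma=0.5):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one rollout."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def normalize(values, eps=1e-8):
    """Center and scale a list of values (assumed group-wise normalization)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def combined_advantages(group_rewards, gamma=0.5, omega=1.0):
    """Hypothetical weighting: A[i][t] = A_episode[i] + omega * A_step[i][t].

    group_rewards holds several rollouts of the same task, each a list of
    per-step rewards. Rollouts may have different lengths.
    """
    returns = [discounted_returns(r, gamma) for r in group_rewards]
    # Episode-level signal: total reward of each rollout, normalized in-group.
    episode_adv = normalize([sum(r) for r in group_rewards])
    advantages = [[0.0] * len(r) for r in returns]
    max_len = max(len(r) for r in returns)
    for t in range(max_len):
        # Step-level signal: compare discounted returns at the same step index,
        # using only the rollouts that reached step t.
        alive = [i for i, r in enumerate(returns) if len(r) > t]
        step_adv = normalize([returns[i][t] for i in alive])
        for i, a in zip(alive, step_adv):
            advantages[i][t] = episode_adv[i] + omega * a
    return advantages


if __name__ == "__main__":
    print(discounted_returns([1.0, 0.0, 1.0], gamma=0.5))
    print(combined_advantages([[1.0, 1.0], [0.0, 0.0]]))
```

In this sketch, a step inherits credit from later steps through `gamma`, while `omega` controls how much the per-step comparison counts relative to the whole-episode comparison.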
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| GUI Grounding | ScreenSpot Pro | Average Score | 30.6 | 307 |
| GUI Grounding | ScreenSpot v2 | Avg Accuracy | 90.1 | 283 |
| GUI Grounding | ScreenSpot Pro | -- | -- | 163 |
| Mobile GUI Automation | GUI-Odyssey | Success Rate (SR) | 52.8 | 62 |
| Mobile GUI Automation | AndroidWorld | Overall Success Rate | 34 | 41 |
| GUI Interaction Control | GUI-Odyssey | SR | 59.5 | 31 |
| GUI Automation | AndroidControl High | Task Match (TM) | 79.9 | 27 |
| UI Element Grounding | ScreenSpot Overall v2 | Overall Accuracy (Avg) | 90.1 | 26 |
| GUI Automation | MiniWob++ | Success Rate | 60.9 | 25 |
| GUI reasoning | AndroidControl Low | SR | 89.2 | 24 |