ZeroGUI: Automating Online GUI Learning at Zero Human Cost

About

The rapid advancement of large Vision-Language Models (VLMs) has propelled the development of pure-vision-based GUI Agents, capable of perceiving and operating Graphical User Interfaces (GUIs) to autonomously fulfill user instructions. However, existing approaches usually adopt an offline learning framework, which faces two core limitations: (1) heavy reliance on high-quality manual annotations for element grounding and action supervision, and (2) limited adaptability to dynamic and interactive environments. To address these limitations, we propose ZeroGUI, a scalable, online learning framework for automating GUI Agent training at zero human cost. Specifically, ZeroGUI integrates (i) VLM-based automatic task generation to produce diverse training goals from the current environment state, (ii) VLM-based automatic reward estimation to assess task success without hand-crafted evaluation functions, and (iii) two-stage online reinforcement learning to continuously interact with and learn from GUI environments. Experiments on two advanced GUI Agents (UI-TARS and Aguvis) demonstrate that ZeroGUI significantly boosts performance across OSWorld and AndroidLab environments. The code is available at https://github.com/OpenGVLab/ZeroGUI.
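
As a rough illustration of how the three components named above could fit together, the following is a minimal sketch, not the authors' implementation. Every name in it (generate_tasks, rollout, estimate_reward, online_iteration, toy_policy) is hypothetical, the two VLM calls are replaced by simple stand-ins, and the reinforcement learning update itself is omitted because the abstract does not describe the two stages.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    task: str
    actions: List[str]
    reward: float = 0.0


def generate_tasks(state: str, n: int) -> List[str]:
    # Hypothetical stand-in for VLM-based task generation:
    # a real system would prompt a VLM with the current screen state.
    return [f"task {i} for {state}" for i in range(n)]


def rollout(policy: Callable[[str, int], str], task: str, max_steps: int = 3) -> Trajectory:
    # Let the agent act in the environment for a fixed number of steps.
    actions = [policy(task, step) for step in range(max_steps)]
    return Trajectory(task=task, actions=actions)


def estimate_reward(traj: Trajectory) -> float:
    # Hypothetical stand-in for VLM-based reward estimation:
    # here a trajectory counts as successful if it ends with "done".
    return 1.0 if traj.actions and traj.actions[-1] == "done" else 0.0


def online_iteration(state: str, policy: Callable[[str, int], str], n_tasks: int = 4) -> List[Trajectory]:
    # One pass: generate tasks, roll out the policy, score each trajectory.
    # The scored trajectories would then feed an RL update (not shown).
    trajectories = []
    for task in generate_tasks(state, n_tasks):
        traj = rollout(policy, task)
        traj.reward = estimate_reward(traj)
        trajectories.append(traj)
    return trajectories


def toy_policy(task: str, step: int) -> str:
    # Toy agent: clicks twice, then declares the task done.
    return "done" if step == 2 else "click"
```

Calling online_iteration("desktop", toy_policy) returns scored trajectories with no human labels involved, which is the property the abstract emphasizes; in a real system both stand-ins would be VLM queries over screenshots.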

Chenyu Yang, Shiqian Su, Shi Liu, Xuan Dong, Yue Yu, Weijie Su, Xuehui Wang, Zhaoyang Liu, Jinguo Zhu, Hao Li, Wenhai Wang, Yu Qiao, Xizhou Zhu, Jifeng Dai • 2025

Related benchmarks

Task	Dataset	Result
Agentic Reasoning	Webshop	Success Rate 31.29	59
Agentic Reasoning	AlfWorld	Success Rate 35.76	45
Operating System GUI Agentic Reasoning	OSWorld	Success Rate 20.2	42
Mobile GUI Agent Decision Making	AndroidWorld	Success Rate 47.52	27
Trajectory completion judgment	OGRBench MacOS	Accuracy 92.2	20
Trajectory completion judgment	OGRBench Ubuntu	Accuracy 86.5	20
Trajectory completion judgment	OGRBench Web	Accuracy 85.3	20
Trajectory completion judgment	OGRBench Windows	Accuracy 82.2	20
Trajectory completion judgment	OGRBench (Overall)	Accuracy 84.5	20
Trajectory completion judgment	OGRBench Mobile	Accuracy 81.4	20

Showing 10 of 16 rows
