ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

About

Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research: https://github.com/OpenGVLab/ScaleCUA.

Zhaoyang Liu, Jingjing Xie, Zichen Ding, Zehao Li, Bowen Yang, Zhenyu Wu, Xuehui Wang, Qiushi Sun, Shi Liu, Weiyun Wang, Shenglong Ye, Qingyun Li, Xuan Dong, Yue Yu, Chenyu Lu, YunXiang Mo, Yao Yan, Zeyue Tian, Xiao Zhang, Yuan Huang, Yiqian Liu, Weijie Su, Gen Luo, Xiangyu Yue, Biqing Qi, Kai Chen, Bowen Zhou, Yu Qiao, Qifeng Chen, Wenhai Wang • 2025

Related benchmarks

Task	Dataset	Result
GUI Grounding	ScreenSpot Pro	Average Score: 40.8	458
GUI Agent Task	AndroidWorld	Success Rate: 23.7	188
Mobile Task Automation	AndroidWorld (test)	Average Success Rate: 0.237	119
GUI Automation	OSWorld Verified (test)	Overall Success Rate: 15	40
GUI Web Agent Navigation	Mind2web Online	Overall Average Score: 23.7	37
GUI Navigation	AndroidWorld latest (test)	Success Rate: 23.7	35
Windows UI Navigation	WindowsAgentArena (WAA)	Success Rate: 24.2	33
GUI Agent Task Success	AndroidWorld (online)	Task Success Rate: 32.2	25
Action Prediction	AndroidControl Low v2	--	22
Step Accuracy	AndroidControl High Level v2	Pass@1: 56.5	20
