OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
About
Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or synthetic data generation through executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions, then retrospectively derive high-quality tasks to enable trajectory-level exploration. A trajectory reward model is then employed to ensure the quality of the generated trajectories. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks. In-depth analysis further validates OS-Genesis's efficiency and its superior data quality and diversity compared to existing synthesis methods. Our code, data, and checkpoints are available at https://qiushisun.github.io/OS-Genesis-Home/.
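The reversed collection order described above (interact first, derive tasks afterwards, then score trajectories) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Step`, `Trajectory`, `explore`, `derive_task`, and `synthesize` are hypothetical, observations are plain strings standing in for screenshots, the task-derivation step is a string template where a real pipeline would presumably query a VLM, and the use of a hard score threshold is an assumption about how a reward model might be applied.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    before: str  # observation before the action (stand-in for a screenshot)
    action: str  # the interaction that was performed
    after: str   # observation after the action


@dataclass
class Trajectory:
    task: str
    steps: List[Step]
    reward: float = 0.0


def explore(env_step: Callable[[str, str], str], start: str,
            actions: List[str], n_steps: int, rng: random.Random) -> List[Step]:
    """Task-free exploration: act without a goal and record each transition."""
    triples, obs = [], start
    for _ in range(n_steps):
        action = rng.choice(actions)
        nxt = env_step(obs, action)
        triples.append(Step(obs, action, nxt))
        obs = nxt
    return triples


def derive_task(step: Step) -> str:
    """Retrospectively describe a task that the observed transition accomplishes.

    Hypothetical placeholder: a template instead of a model call.
    """
    return f"From '{step.before}', {step.action} to reach '{step.after}'"


def synthesize(triples: List[Step],
               rollout: Callable[[str], List[Step]],
               reward_model: Callable[[Trajectory], float],
               threshold: float) -> List[Trajectory]:
    """Derive a task per transition, collect a trajectory for it, and score it.

    Keeping only trajectories at or above `threshold` is an assumption of this
    sketch, not a statement about how OS-Genesis uses its reward model.
    """
    kept = []
    for step in triples:
        task = derive_task(step)
        traj = Trajectory(task, rollout(task))
        traj.reward = reward_model(traj)
        if traj.reward >= threshold:
            kept.append(traj)
    return kept
```

In use, `env_step`, `rollout`, and `reward_model` would be supplied by the caller; in this sketch they are ordinary callables so the control flow can be exercised with toy functions before any model or emulator is attached.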
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Web navigation and task completion | WebArena (test) | Average Task Completion | 14.6 | 137 |
| GUI Agent Task | AndroidWorld | Success Rate | 37.9 | 136 |
| Mobile GUI Automation | GUI-Odyssey | Success Rate (SR) | 34.5 | 62 |
| GUI Action Execution | GUI-EDA | Acoustic Score (COMSOL) | 8 | 60 |
| Mobile GUI Automation | AndroidWorld | Overall Success Rate | 17.4 | 41 |
| GUI Agent Planning and Execution | WebArena | Success Rate (Gitlab) | 15.87 | 32 |
| GUI Interaction Control | GUI-Odyssey | SR | 3.6 | 31 |
| GUI Automation | AndroidControl High | Task Match (TM) | 65.9 | 27 |
| GUI Automation | MiniWob++ | Success Rate | 19.8 | 25 |
| GUI reasoning | AndroidControl Low | SR | 74.2 | 24 |