UI-Venus-1.5 Technical Report
About
GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
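The abstract does not say which merging algorithm produces the unified checkpoint. As a rough illustration only, the sketch below assumes the simplest variant: a weighted linear average of parameters across checkpoints that share an identical architecture. The function name `merge_checkpoints`, the plain-dictionary checkpoint layout, and the equal weights are hypothetical and not taken from the report.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_checkpoints(checkpoints: List[StateDict], weights: List[float]) -> StateDict:
    """Weighted linear average of parameters (an assumed merging scheme)."""
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the merged parameters stay on the same scale as the inputs.
    norm = [w / total for w in weights]

    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name in names:
        length = len(checkpoints[0][name])
        if any(len(ckpt[name]) != length for ckpt in checkpoints):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all checkpoints.
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(length)
        ]
    return merged


# Toy parameters standing in for the grounding, web, and mobile models.
grounding = {"layer.weight": [1.0, 2.0]}
web = {"layer.weight": [3.0, 4.0]}
mobile = {"layer.weight": [5.0, 6.0]}
unified = merge_checkpoints([grounding, web, mobile], [1.0, 1.0, 1.0])
```

With equal weights, each merged value is the mean of the three inputs. Real merging pipelines often use per-domain weights or more elaborate schemes, so this should be read as a conceptual aid rather than the report's procedure.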
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| GUI Grounding | ScreenSpot Pro | Average Score | 57.7 | 458 |
| GUI Grounding | ScreenSpot v2 | Avg Accuracy | 95.9 | 371 |
| GUI Grounding | ScreenSpot Pro | Accuracy | 69.6 | 195 |
| GUI Agent Task | AndroidWorld | Success Rate | 65.9 | 188 |
| GUI Grounding | OSWorld-G | Average Score | 70.6 | 144 |
| Mobile Task Automation | AndroidWorld (test) | Average Success Rate | 0.776 | 119 |
| Grounding | ScreenSpot v2 | -- | -- | 47 |
| Web navigation | Mind2Web | Overall Success Rate | 5.88 | 41 |
| Mobile GUI Automation | AndroidLab | Success Rate | 55.1 | 25 |
| Action Prediction | AndroidControl High v2 | Pass@1 Step Accuracy | 61.06 | 22 |