GUIrilla: A Scalable Framework for Automated Desktop UI Exploration

About

The performance and generalization of foundation models for interactive systems critically depend on the availability of large-scale, realistic training data. While recent advances in large language models (LLMs) have improved GUI understanding, progress in desktop automation remains constrained by the scarcity of high-quality, publicly available desktop interaction data, particularly for macOS. We introduce GUIRILLA, a scalable data crawling framework for automated exploration of desktop GUIs. GUIRILLA is not an autonomous agent; instead, it systematically collects realistic interaction traces and accessibility metadata intended to support the training, evaluation, and stabilization of downstream foundation models and GUI agents. The framework targets macOS, a largely underrepresented platform in existing resources, and organizes explored interfaces into hierarchical MacApp Trees derived from accessibility states and user actions. As part of this work, we release these MacApp Trees as a reusable structural representation of macOS applications, enabling downstream analysis, retrieval, testing, and future agent training. We additionally release macapptree, an open-source library for reproducible accessibility-driven GUI data collection, along with the full framework implementation to support open research in desktop autonomy.
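The hierarchical organization described above, with UI states as nodes and user actions as edges, can be illustrated with a minimal sketch. This is not the macapptree schema or the authors' implementation: the `StateNode` class, its fields, the `interaction_traces` helper, and the action strings are all hypothetical names chosen for illustration, and the accessibility role values are only examples.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class StateNode:
    """One explored UI state (hypothetical structure, not the macapptree schema)."""
    name: str
    # Accessibility metadata captured for this state (illustrative keys only).
    accessibility: Dict[str, str] = field(default_factory=dict)
    # Edges: the user action taken in this state -> the resulting state.
    children: Dict[str, "StateNode"] = field(default_factory=dict)

    def add_transition(self, action: str, child: "StateNode") -> "StateNode":
        """Record that `action` in this state leads to `child`; return the child."""
        self.children[action] = child
        return child


def interaction_traces(root: StateNode) -> Iterator[List[Tuple[str, str]]]:
    """Yield each root-to-leaf path as a list of (action, resulting state name)."""
    stack: List[Tuple[StateNode, List[Tuple[str, str]]]] = [(root, [])]
    while stack:
        node, path = stack.pop()
        if not node.children:
            yield path
            continue
        # Push in reverse so children are visited in insertion order.
        for action, child in reversed(list(node.children.items())):
            stack.append((child, path + [(action, child.name)]))


# Assumed toy application: a main window with two reachable branches.
root = StateNode("main_window", {"role": "AXWindow"})
file_menu = root.add_transition("click:File", StateNode("file_menu", {"role": "AXMenu"}))
file_menu.add_transition("click:Open", StateNode("open_dialog", {"role": "AXSheet"}))
root.add_transition("click:Preferences", StateNode("prefs_window", {"role": "AXWindow"}))

traces = list(interaction_traces(root))
for trace in traces:
    print(" -> ".join(f"{action} [{state}]" for action, state in trace))
```

In this toy layout, each root-to-leaf path corresponds to one interaction trace, which is the kind of unit a downstream model or test harness could consume. A real crawler would also need to handle cycles and repeated states, which this sketch ignores.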

Sofiya Garkot, Maksym Shamrai, Ivan Synytsia, Mariya Hirna • 2025

Related benchmarks

Task	Dataset	Result
Grounding	ScreenSpot Pro	--	82
Grounding	ScreenSpot v2	Grounding Accuracy 94.73	50
GUI Action Grounding	ScreenSpot-Pro (test)	Accuracy (Development) 30.1	14
Grounding	ScreenSpot-Pro macOS	Grounding Accuracy 41.39	13
Agentic UI Interaction	GUIRILLA-TASK agentic	--	8
Element Localization	GUIrilla-Task (test)	Communication Accuracy 65.5	7
