GUIrilla: A Scalable Framework for Automated Desktop UI Exploration

About

The performance and generalization of foundation models for interactive systems critically depend on the availability of large-scale, realistic training data. While recent advances in large language models (LLMs) have improved GUI understanding, progress in desktop automation remains constrained by the scarcity of high-quality, publicly available desktop interaction data, particularly for macOS. We introduce GUIRILLA, a scalable data crawling framework for automated exploration of desktop GUIs. GUIRILLA is not an autonomous agent; instead, it systematically collects realistic interaction traces and accessibility metadata intended to support the training, evaluation, and stabilization of downstream foundation models and GUI agents. The framework targets macOS, a largely underrepresented platform in existing resources, and organizes explored interfaces into hierarchical MacApp Trees derived from accessibility states and user actions. As part of this work, we release these MacApp Trees as a reusable structural representation of macOS applications, enabling downstream analysis, retrieval, testing, and future agent training. We additionally release macapptree, an open-source library for reproducible accessibility-driven GUI data collection, along with the full framework implementation to support open research in desktop autonomy.

Sofiya Garkot, Maksym Shamrai, Ivan Synytsia, Mariya Hirna · 2025
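The abstract describes MacApp Trees as a hierarchy of interface states linked by user actions. The Python sketch below illustrates that idea under stated assumptions. `UIState`, `TreeNode`, `explore`, `traces`, and the toy `APP`/`TRANSITIONS` tables are hypothetical names invented for illustration. They are not the macapptree API. The real framework derives state from macOS accessibility metadata, not from a hard-coded dictionary. The sketch explores breadth-first and does not expand a state it has already recorded, so the result stays a tree.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class UIState:
    """Hypothetical snapshot of an app screen: a name plus actionable elements."""
    name: str
    actionable: List[str]  # labels of elements that can be clicked in this state


@dataclass
class TreeNode:
    """Hypothetical MacApp Tree node: a state and the action that reached it."""
    state: UIState
    action: Optional[str] = None  # None for the root (application launch)
    children: List["TreeNode"] = field(default_factory=list)


def explore(root: UIState,
            step: Callable[[UIState, str], UIState],
            max_depth: int = 3) -> TreeNode:
    """Breadth-first exploration; states seen before are not expanded again."""
    tree = TreeNode(root)
    seen = {root.name}
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand nodes at the depth limit
        for element in node.state.actionable:
            nxt = step(node.state, element)  # perform the action, observe the new state
            if nxt.name in seen:
                continue  # already recorded elsewhere in the tree
            seen.add(nxt.name)
            child = TreeNode(nxt, action=f"click {element}")
            node.children.append(child)
            queue.append((child, depth + 1))
    return tree


def traces(node: TreeNode, path: Tuple[str, ...] = ()) -> List[Tuple[Tuple[str, ...], str]]:
    """Depth-first list of (action sequence from root, state name) pairs."""
    out = [(path, node.state.name)]
    for child in node.children:
        out.extend(traces(child, path + (child.action,)))
    return out


# Toy stand-in for a real application (assumption: not GUIrilla data).
APP: Dict[str, UIState] = {
    "Main": UIState("Main", ["File", "Settings"]),
    "FileMenu": UIState("FileMenu", ["Open", "Back"]),
    "Settings": UIState("Settings", ["Back"]),
    "OpenDialog": UIState("OpenDialog", []),
}
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Main", "File"): "FileMenu",
    ("Main", "Settings"): "Settings",
    ("FileMenu", "Open"): "OpenDialog",
    ("FileMenu", "Back"): "Main",
    ("Settings", "Back"): "Main",
}


def toy_step(state: UIState, element: str) -> UIState:
    """Hypothetical environment step: look up the state an action leads to."""
    return APP[TRANSITIONS[(state.name, element)]]


if __name__ == "__main__":
    for actions, name in traces(explore(APP["Main"], toy_step)):
        print(" -> ".join(actions) or "(launch)", "=>", name)
```

Running it prints four traces. They are `(launch) => Main`, `click File => FileMenu`, `click File -> click Open => OpenDialog`, and `click Settings => Settings`. Because states are deduplicated by name, both "Back" actions are dropped. Identifying states by name is a simplification. A real crawler would need a sturdier notion of state identity, for example a hash of the accessibility hierarchy. That is also an assumption, not a documented GUIrilla detail.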

Related benchmarks

Task                     Dataset                 Metric                   Result  Rank
Grounding                ScreenSpot Pro          --                       --      33
Grounding                ScreenSpot v2           --                       --      32
GUI Action Grounding     ScreenSpot-Pro (test)   Accuracy (Development)   30.1    14
Grounding                ScreenSpot-Pro macOS    Grounding Accuracy       41.39   13
Agentic UI Interaction   GUIRILLA-TASK agentic   --                       --      8
Element Localization     GUIrilla-Task (test)    Communication Accuracy   65.5    7
