
GUIrilla

Benchmarks

Task Name               Dataset Name           SOTA Result                   Trend
Agentic UI Interaction  GUIRILLA-TASK agentic  Input Success Rate: 12.5      8
Element Localization    GUIrilla-Task (test)   Communication Accuracy: 65.5  7