AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents

About

AI agents have drawn increasing attention, mostly for their ability to perceive environments, understand tasks, and autonomously achieve goals. To advance research on AI agents in mobile scenarios, we introduce the Android Multi-annotation EXpo (AMEX), a comprehensive, large-scale dataset designed for generalist mobile GUI-control agents, which complete tasks by directly interacting with the graphical user interface (GUI) on mobile devices. AMEX comprises over 104K high-resolution screenshots from popular mobile applications, annotated at multiple levels. Unlike existing GUI-related datasets such as Rico and AitW, AMEX includes three levels of annotations: GUI interactive element grounding, GUI screen and element functionality descriptions, and complex natural language instructions with stepwise GUI-action chains. We develop this dataset from a more instructive and detailed perspective, complementing the general settings of existing datasets. Additionally, we fine-tune a baseline model, SPHINX Agent, and use it to illustrate the effectiveness of AMEX. The project is available at https://yxchai.com/AMEX/.

Yuxiang Chai, Siyuan Huang, Yazhe Niu, Han Xiao, Liang Liu, Dingyu Zhang, Shuai Ren, Hongsheng Li • 2024
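
As a reading aid, the three annotation levels named in the abstract can be pictured as nested records. The sketch below is a hypothetical in-memory layout, not AMEX's actual file format: every class, field, and function name (`Element`, `Screen`, `Step`, `Episode`, `tapped_element`) and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical schema for illustration only; not the dataset's real format.
BBox = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Element:
    """Level 1 (grounding) plus an optional level 2 description."""
    bbox: BBox
    description: Optional[str] = None

    def contains(self, x: int, y: int) -> bool:
        left, top, right, bottom = self.bbox
        return left <= x <= right and top <= y <= bottom


@dataclass
class Screen:
    """One screenshot with its interactive elements and a screen description."""
    image_path: str
    elements: List[Element] = field(default_factory=list)
    description: Optional[str] = None


@dataclass
class Step:
    """One link in a level 3 GUI-action chain."""
    screen: Screen
    action: str  # assumed label, e.g. "tap" or "swipe"
    tap_point: Optional[Tuple[int, int]] = None


@dataclass
class Episode:
    """Level 3: a natural language instruction with its stepwise actions."""
    instruction: str
    steps: List[Step] = field(default_factory=list)


def tapped_element(step: Step) -> Optional[Element]:
    """Return the first element whose box contains the tap point, if any."""
    if step.tap_point is None:
        return None
    x, y = step.tap_point
    for element in step.screen.elements:
        if element.contains(x, y):
            return element
    return None


# Made-up sample data, used only to exercise the sketch.
search_box = Element(bbox=(40, 100, 680, 180), description="Search field")
home = Screen("home.png", elements=[search_box], description="App home page")
episode = Episode(
    instruction="Search for running shoes",
    steps=[Step(screen=home, action="tap", tap_point=(300, 140))],
)
```

Linking a step's tap point back to a grounded element, as `tapped_element` does, is one way the lower annotation levels could support the instruction level; whether the dataset stores such a link directly is not stated here.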

Related benchmarks

Task                Dataset                       Result                               Rank
Mobile Automation   Android In The Wild (AITW)    Average Score: 76.28                 21
Element Grounding   ScreenSpot mobile             Icon/Widget Grounding Score: 72.6    6
GUI Navigation      AMEX (High)                   Action Matching Score (AMS): 70.7    3
