
AMEX

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mobile UI Reasoning | AMEX | Gmail Accuracy: 76.2 | 17 |
| GUI Agent Planning and Execution | AMEX (test) | Success Rate (Gmail): 77.3 | 12 |
| GUI Task Execution | Amex | Success Rate: 67.45 | 8 |
| Mobile GUI Automation | AMEX | Type Success Rate: 77.57 | 7 |
| Action Prediction | AMEX (test) | Action Accuracy: 72.61 | 4 |
| Grounding | AMEX (test) | Grounding Score: 79.19 | 4 |
| Optical Character Recognition | AMEX (test) | OCR Accuracy: 79.67 | 4 |
| Image Captioning | AMEX (test) | Caption Score: 54.53 | 4 |
| GUI Navigation | AMEX (High) | Action Matching Score (AMS): 80.7 | 3 |