GUI Agent Evaluation on Industrial e-commerce dataset
[Chart: best Accuracy 95.8 (GUIDE), Apr 6, 2026. Metrics shown: Accuracy, Precision, Recall, F1 Score.]
Updated 12d ago
Evaluation Results
Method            Date     Accuracy  Precision  Recall  F1 Score
GUIDE             2026.04  95.8      95         93.62   94.31
WebJudge          2026.04  90.45     84.04      91.59   87.66
Autonomous Eval   2026.04  84.44     77.93      80.87   79.37
Plan&Solve-like   2026.04  82.94     72.91      85.8    78.83
AgentTrek         2026.04  65.02     51.42      100     67.91
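The page does not say how the F1 Score column is computed. Assuming it is the standard harmonic mean of precision and recall, the sketch below recomputes it from the Precision and Recall values listed above and measures how far each result lands from the reported figure. The names `f1_score`, `rows`, and `max_f1_gap` are illustrative and do not come from the benchmark.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results listed above.
rows = {
    "GUIDE": (95.0, 93.62, 94.31),
    "WebJudge": (84.04, 91.59, 87.66),
    "Autonomous Eval": (77.93, 80.87, 79.37),
    "Plan&Solve-like": (72.91, 85.8, 78.83),
    "AgentTrek": (51.42, 100.0, 67.91),
}


def max_f1_gap(table: dict) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - f1) for p, r, f1 in table.values())


if __name__ == "__main__":
    for method, (p, r, reported) in rows.items():
        print(f"{method}: recomputed {f1_score(p, r):.2f}, reported {reported}")
```

Under that assumption, every recomputed value falls within about 0.01 of the reported one. Gaps of that size are what you would expect if the published precision and recall were rounded before F1 was derived from them. AgentTrek's recall of 100 paired with a precision of 51.42 shows why F1 is worth reading next to accuracy: perfect recall here comes with roughly half of the positive predictions being wrong.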