| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Web Navigation and Automation | WorkArena Held-out Tasks (test) | Success Rate | 70 | 16 |
| Web Navigation and Automation | WorkArena Held-out Goals (test) | Success Rate | 53.8 | 16 |
| Enterprise interface task completion | WorkArena L1 | Task Success Rate | 79.7 | 14 |
| Reward Modeling | WorkArena | Pairwise Accuracy | 84.33 | 13 |
| Web Agent Navigation | WorkArena L2 147-task (test) | Success Rate | 40 | 10 |
| Web Agent Navigation | WorkArena L1 (full) | Success Rate | 79.4 | 10 |
| Enterprise interface task completion | WorkArena++ L2 | Success Rate | 41.6 | 9 |
| Web Task Automation | WorkArena L1 | Average Reward | 68 | 8 |
| Enterprise interface interaction | WorkArena L2 full benchmark | Success Rate | 69.4 | 3 |
| Enterprise interface interaction | WorkArena L2 (test) | Success Rate | 9.7 | 2 |