Web Task Execution on AppWorld Normal (test)
[Chart: Task Goal Success Rate over time. Plotted point: Baseline Agent, 89.5, Mar 11, 2026. Available metrics: Task Goal Success Rate, Scenario Goal Success Rate.]
Updated 1mo ago
Evaluation Results
| Method | Setting | Date | Task Goal Success Rate | Scenario Goal Success Rate |
|---|---|---|---|---|
| Baseline Agent | Type=Difficulty 1, Mem... | 2026.03 | 89.5 | 79 |
| Baseline Agent | Type=Aggregate, Memory... | 2026.03 | 69.6 | 50 |
| Baseline Agent | Type=Difficulty 2, Mem... | 2026.03 | 66.7 | 56.2 |
| Baseline Agent | Type=Difficulty 3, Mem... | 2026.03 | 54 | 19.1 |