Web Agent Task Success on WebArena Shopping (187 tasks)
[Chart: Task Success Rate by date. Best result: ReasoningBank + MaTTS, 54 (Sep 29, 2025). Metrics shown: Task Success Rate, Average Steps.]
Evaluation Results
| Method | Backbone LLM | Date | Task Success Rate | Average Steps |
|---|---|---|---|---|
| ReasoningBank + MaTTS | Gemini-2.... | 2025.09 | 54 | 5.9 |
| ReasoningBank + MaTTS | Gemini-2.... | 2025.09 | 53 | 6.3 |
| ReasoningBank | Gemini-2.... | 2025.09 | 51.9 | 6 |
| ReasoningBank | Gemini-2.... | 2025.09 | 49.7 | 6.1 |
| AWM | Gemini-2.... | 2025.09 | 48.1 | 6.4 |
| ReasoningBank + MaTTS | Claude-3.... | 2025.09 | 47.1 | 5.8 |
| Synapse | Gemini-2.... | 2025.09 | 46.5 | 6.6 |
| No Memory | Gemini-2.... | 2025.09 | 45.5 | 7.6 |
| ReasoningBank | Claude-3.... | 2025.09 | 44.9 | 5.6 |
| AWM | Gemini-2.... | 2025.09 | 44.4 | 7 |
| Synapse | Gemini-2.... | 2025.09 | 40.6 | 7 |
| Synapse | Claude-3.... | 2025.09 | 39.6 | 5.8 |
| AWM | Claude-3.... | 2025.09 | 39.6 | 7.2 |
| No Memory | Gemini-2.... | 2025.09 | 39 | 8.2 |
| No Memory | Claude-3.... | 2025.09 | 38.5 | 6.1 |