Web Agent Task Success on WebArena Reddit (106 tasks)
[Chart: Success Rate (SR) by date. Top entry: ReasoningBank + MaTTS, 83 SR, Sep 29, 2025. Selectable metrics: Success Rate (SR), Average Steps. Updated 1mo ago.]
Evaluation Results
| Method | Backbone LLM | Date | Success Rate (SR) | Average Steps |
|---|---|---|---|---|
| ReasoningBank + MaTTS | Gemini-2.... | 2025.09 | 83 | 5.3 |
| ReasoningBank | Gemini-2.... | 2025.09 | 80.2 | 5.1 |
| No Memory | Gemini-2.... | 2025.09 | 71.7 | 6 |
| ReasoningBank + MaTTS | Gemini-2.... | 2025.09 | 70.8 | 5.4 |
| Synapse | Gemini-2.... | 2025.09 | 68.9 | 5.9 |
| AWM | Gemini-2.... | 2025.09 | 68.9 | 6.4 |
| ReasoningBank | Gemini-2.... | 2025.09 | 67 | 5.6 |
| AWM | Gemini-2.... | 2025.09 | 62.3 | 6.1 |
| ReasoningBank + MaTTS | Claude-3.... | 2025.09 | 60.4 | 5 |
| Synapse | Gemini-2.... | 2025.09 | 59.4 | 6.5 |
| ReasoningBank | Claude-3.... | 2025.09 | 57.5 | 5.2 |
| No Memory | Gemini-2.... | 2025.09 | 55.7 | 6.7 |
| No Memory | Claude-3.... | 2025.09 | 53.8 | 5.5 |
| Synapse | Claude-3.... | 2025.09 | 53.8 | 6.1 |
| AWM | Claude-3.... | 2025.09 | 52.8 | 7 |