Web Agent Task Success on WebArena Multi 29 tasks
[Chart: Task Success Rate (SR) over time. Top entry: ReasoningBank + MaTTS, 20.7, Sep 29, 2025. Metrics shown: Task Success Rate (SR), Steps Taken. Updated 1mo ago.]
Evaluation Results
Method                 Backbone LLM  Links    Task Success Rate (SR)  Steps Taken
ReasoningBank + MaTTS  Gemini-2....  2025.09  20.7                    7.2
ReasoningBank + MaTTS  Gemini-2....  2025.09  17.2                    8
ReasoningBank          Gemini-2....  2025.09  13.8                    8.8
ReasoningBank          Gemini-2....  2025.09  13.8                    8.2
No Memory              Gemini-2....  2025.09  10.3                    10
Synapse                Gemini-2....  2025.09  10.3                    10.5
ReasoningBank + MaTTS  Claude-3....  2025.09  10.3                    9.1
No Memory              Gemini-2....  2025.09  6.9                     8.8
Synapse                Gemini-2....  2025.09  6.9                     9
AWM                    Gemini-2....  2025.09  3.4                     7.7
AWM                    Gemini-2....  2025.09  3.4                     9.3
ReasoningBank          Claude-3....  2025.09  3.4                     10.5
No Memory              Claude-3....  2025.09  0                       11.6
Synapse                Claude-3....  2025.09  0                       11.8
AWM                    Claude-3....  2025.09  0                       12.4
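
The reported success rates are consistent with whole-task counts out of the 29 tasks named in the title (for example, 20.7 matches 6 of 29). The following is a minimal Python sketch of that conversion, assuming SR is a percentage over exactly 29 tasks; the page itself does not state the denominator, and the helper names `solved_tasks` and `success_rate` are hypothetical, not part of the benchmark.

```python
TOTAL_TASKS = 29  # taken from the benchmark title; assumed to be the SR denominator


def solved_tasks(sr_percent: float, total: int = TOTAL_TASKS) -> int:
    """Return the whole number of solved tasks closest to a reported SR (%)."""
    return round(sr_percent / 100 * total)


def success_rate(solved: int, total: int = TOTAL_TASKS) -> float:
    """Return SR as a percentage rounded to one decimal place."""
    return round(solved / total * 100, 1)


if __name__ == "__main__":
    # Distinct SR values that appear in the results listed above.
    for sr in (20.7, 17.2, 13.8, 10.3, 6.9, 3.4, 0):
        n = solved_tasks(sr)
        print(f"SR {sr} -> {n} of {TOTAL_TASKS} tasks (recomputed SR: {success_rate(n)})")
```

Under this assumption, one additional solved task moves SR by roughly 3.4 points, so small gaps between rows correspond to a difference of one or two tasks.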