Web Agent Navigation on Mind2Web (Cross-Website)
[Chart: Execution Accuracy (EA) over time. Best result: 46.1, ReasoningBank, Sep 29, 2025.]
Updated 1mo ago
Evaluation Results
| Method | Backbone | Date | Execution Accuracy (EA) | Action F1 (AF1) | Step Success Rate (SSR) | Success Rate (SR) |
|---|---|---|---|---|---|---|
| ReasoningBank | Gemini-2.5-pro | 2025.09 | 46.1 | 54.8 | 36.9 | 3.8 |
| ReasoningBank | Gemini-2.5-flash | 2025.09 | 44.3 | 52.6 | 33.9 | 2.3 |
| AWM | Gemini-2.5-pro | 2025.09 | 41.9 | 47.9 | 34.8 | 2.3 |
| Synapse | Gemini-2.5-pro | 2025.09 | 41.8 | 51.2 | 35 | 3.2 |
| No Memory | Gemini-2.5-pro | 2025.09 | 41.2 | 49.8 | 34.8 | 3.4 |
| Synapse | Gemini-2.5-flash | 2025.09 | 40.3 | 46 | 32.1 | 1.9 |
| No Memory | Gemini-2.5-flash | 2025.09 | 39.8 | 45.1 | 31.7 | 1.7 |
| AWM | Gemini-2.5-flash | 2025.09 | 39.1 | 42.2 | 31.7 | 2.1 |
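The results above are easier to compare as differences against the No Memory baseline on the same backbone. The following is a minimal sketch of that arithmetic. The scores are transcribed from the leaderboard, while `ROWS`, `METRICS`, and `gains_over_baseline` are hypothetical names introduced here for illustration and are not part of any benchmark tooling.

```python
# Rows transcribed from the leaderboard: (method, backbone, EA, AF1, SSR, SR).
ROWS = [
    ("ReasoningBank", "Gemini-2.5-pro", 46.1, 54.8, 36.9, 3.8),
    ("ReasoningBank", "Gemini-2.5-flash", 44.3, 52.6, 33.9, 2.3),
    ("AWM", "Gemini-2.5-pro", 41.9, 47.9, 34.8, 2.3),
    ("Synapse", "Gemini-2.5-pro", 41.8, 51.2, 35.0, 3.2),
    ("No Memory", "Gemini-2.5-pro", 41.2, 49.8, 34.8, 3.4),
    ("Synapse", "Gemini-2.5-flash", 40.3, 46.0, 32.1, 1.9),
    ("No Memory", "Gemini-2.5-flash", 39.8, 45.1, 31.7, 1.7),
    ("AWM", "Gemini-2.5-flash", 39.1, 42.2, 31.7, 2.1),
]
METRICS = ("EA", "AF1", "SSR", "SR")


def gains_over_baseline(rows, baseline="No Memory"):
    """Map (method, backbone) to per-metric deltas versus the baseline
    method evaluated on the same backbone (hypothetical helper)."""
    # Index the baseline scores by backbone.
    base = {bb: vals for m, bb, *vals in rows if m == baseline}
    gains = {}
    for method, backbone, *vals in rows:
        if method == baseline:
            continue
        gains[(method, backbone)] = {
            name: round(score - ref, 1)  # scores carry one decimal place
            for name, score, ref in zip(METRICS, vals, base[backbone])
        }
    return gains


if __name__ == "__main__":
    for (method, backbone), delta in gains_over_baseline(ROWS).items():
        print(f"{method:14s} {backbone:18s} {delta}")
```

Pairing each method with the baseline on the same backbone keeps the comparison like-for-like, since the Gemini-2.5-pro and Gemini-2.5-flash rows are not directly comparable to each other.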