Web Interaction on WebShop
[Chart: Pass@1 by date. Top entry: Mem2Evolve, 39.2 Pass@1, Apr 13, 2026.]
Evaluation Results
Method               Backbone     Date      Pass@1
-------------------  -----------  --------  ------
Mem2Evolve           GPT-5-Chat   2026.04   39.2
AFLOW                GPT-5-Chat   2026.04   37.9
EvoAgent             GPT-5-Chat   2026.04   37.8
DyLAN                GPT-5-Chat   2026.04   36.4
DSPy                 GPT-5-Chat   2026.04   35.5
SwarmAgentic         GPT-5-Chat   2026.04   34.12
AgentVerse           GPT-5-Chat   2026.04   32.53
AutoAgents           GPT-5-Chat   2026.04   31.4
Alita                GPT-5-Chat   2026.04   30.21
GPT-5-Chat (CoT)     GPT-5-Chat   2026.04   27.49
GPT-5-Chat (ReAct)   GPT-5-Chat   2026.04   25.1
GPT-5-Chat (Direct)  GPT-5-Chat   2026.04   22.31