Planning on TravelPlanner (Pass@1)
Best result: Mem2Evolve, 59.25 Pass@1 (Apr 13, 2026)
Updated 5d ago
Evaluation Results
| Method | Backbone | Date | Pass@1 |
| --- | --- | --- | --- |
| Mem2Evolve | GPT-5-Chat | 2026.04 | 59.25 |
| SwarmAgentic | GPT-5-Chat | 2026.04 | 59.14 |
| AFLOW | GPT-5-Chat | 2026.04 | 53.24 |
| EvoAgent | GPT-5-Chat | 2026.04 | 49.2 |
| Alita | GPT-5-Chat | 2026.04 | 48.32 |
| AgentVerse | GPT-5-Chat | 2026.04 | 47.25 |
| DSPy | GPT-5-Chat | 2026.04 | 44.9 |
| AutoAgents | GPT-5-Chat | 2026.04 | 43.52 |
| DyLAN | GPT-5-Chat | 2026.04 | 43.15 |
| GPT-5-Chat (CoT) | GPT-5-Chat | 2026.04 | 39.51 |
| GPT-5-Chat (ReAct) | GPT-5-Chat | 2026.04 | 39.13 |
| GPT-5-Chat (Direct) | GPT-5-Chat | 2026.04 | 38.68 |