Planning on TravelPlanner (test)
[Chart: Success Rate and Steps. Best result: AutoRefine, Success Rate 0.271 (Jan 30, 2026). Updated 1mo ago.]
Evaluation Results
Method             Details               Date     Success Rate  Steps
AutoRefine         Backbone=GPT-4-turbo  2026.01  0.271         21.8
ReAct              Backbone=GPT-4-turbo  2026.01  0.104         26.1
ReAct + Reflexion  Backbone=GPT-4-turbo  2026.01  0.091         80.2
Reflexion          Backbone=GPT-4-turbo  2026.01  0.086         77.9