
TravelPlanner

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Travel Planning | TravelPlanner | Average Tokens Used: 14.8 | 46 |
| Travel planning | TravelPlanner (val) | Delivery Rate: 100 | 25 |
| Planning | TravelPlanner #180 (val) | CS-Micro: 95.64 | 22 |
| Travel Planning | TravelPlanner (test) | Commonsense Constraint (Micro): 98.83 | 18 |
| Long-horizon planning | TravelPlanner | Delivery Rate: 100 | 13 |
| Travel Planning | TravelPlanner 1000 tasks (test) | Commonsense Score (Micro): 94.72 | 13 |
| Planning | TravelPlanner | Pass@1: 59.25 | 12 |
| End-to-end planning | TravelPlanner | Success Rate (CS/HD Avg): 0.225 | 12 |
| Constraint Satisfaction Plan Generation | TravelPlanner | Delivery Rate: 100 | 11 |
| Multi-agent Planning | TravelPlanner (val) | Final Pass Rate: 3.33 | 8 |
| Sole Planning | TravelPlanner (val) | Final Pass Rate: 7.22 | 8 |
| Planning | TravelPlanner Hard | Delivery Rate: 100 | 5 |
| Planning | TravelPlanner Medium | Delivery Rate: 100 | 5 |
| Planning | TravelPlanner Easy | Delivery Rate: 100 | 5 |
| Travel planning agent | TravelPlanner | Commonsense Score (CS): 0.833 | 4 |
| Planning | TravelPlanner (test) | Success Rate: 0.271 | 4 |
| Planning | TravelPlanner Avg. | Avg-pass Rate: 91.93 | 3 |
| Planning | TravelPlanner (TP-test) | Avg Pass Rate: 91.76 | 3 |
| Planning | TravelPlanner (val) | Avg-pass: 91.04 | 3 |
| Planning | TravelPlanner (train) | Avg-pass Rate: 93 | 3 |