| Task Name | Dataset Name | SOTA Metric | SOTA Value | Trend |
|---|---|---|---|---|
| Travel Planning | TravelPlanner | Average Tokens Used | 14.8 | 46 |
| Travel planning | TravelPlanner (val) | Delivery Rate | 100 | 25 |
| Planning | TravelPlanner #180 (val) | CS-Micro | 95.64 | 22 |
| Travel Planning | TravelPlanner (test) | Commonsense Constraint (Micro) | 98.83 | 18 |
| Long-horizon planning | TravelPlanner | Delivery Rate | 100 | 13 |
| Travel Planning | TravelPlanner 1000 tasks (test) | Commonsense Score (Micro) | 94.72 | 13 |
| Planning | TravelPlanner | Pass@1 | 59.25 | 12 |
| End-to-end planning | TravelPlanner | Success Rate (CS/HD Avg) | 0.225 | 12 |
| Constraint Satisfaction Plan Generation | TravelPlanner | Delivery Rate | 100 | 11 |
| Multi-agent Planning | TravelPlanner (val) | Final Pass Rate | 3.33 | 8 |
| Sole Planning | TravelPlanner (val) | Final Pass Rate | 7.22 | 8 |
| Planning | TravelPlanner Hard | Delivery Rate | 100 | 5 |
| Planning | TravelPlanner Medium | Delivery Rate | 100 | 5 |
| Planning | TravelPlanner Easy | Delivery Rate | 100 | 5 |
| Travel planning agent | TravelPlanner | Commonsense Score (CS) | 0.833 | 4 |
| Planning | TravelPlanner (test) | Success Rate | 0.271 | 4 |
| Planning | TravelPlanner Avg. | Avg-pass Rate | 91.93 | 3 |
| Planning | TravelPlanner (TP-test) | Avg Pass Rate | 91.76 | 3 |
| Planning | TravelPlanner (val) | Avg-pass | 91.04 | 3 |
| Planning | TravelPlanner (train) | Avg-pass Rate | 93 | 3 |