| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| TravelPlanner | ToRL | Average Tokens Used | 14.8 | 46 | 1mo ago |
| Travel Planning | PIVOT | Composite Score | 96.2 | 30 | 21d ago |
| TravelPlanner (val) | MIRROR | Delivery Rate | 100 | 25 | 1mo ago |
| TravelPlanner (test) | Llama-3.1-8B-Instruct | Commonsense Constraint (Micro) | 98.83 | 18 | 13d ago |
| TravelPlanner 1000 tasks (test) | HiMAP-Travel | Commonsense Score (Micro) | 94.72 | 13 | 2mo ago |
| ChinaTravel Medium | Behavior Forest | Final Pass Rate | 77.82 | 8 | 13d ago |
| ChinaTravel (Human) | Behavior Forest | Delivery Rate | 96.73 | 5 | 1mo ago |
| ChinaTravel Easy | Behavior Forest | Delivery Rate | 98.89 | 5 | 1mo ago |