PDDL

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Automated Planning | PDDL | Accuracy: 83.9 | 233 |
| Multi-turn Agentic Task | PDDL | Success Rate: 83 | 28 |
| Embodied Agentic | PDDL | Accuracy: 70.5 | 21 |
| Step-level reasoning evaluation | PDDL (test) | F1 Score: 94.5 | 20 |
| Planning | PDDL | Progress Rate: 35.07 | 14 |
| Text-based embodied task | PDDL | Success Rate: 61.7 | 13 |
| Generalized Planning | PDDL trading domain | Solution Rate: 100 | 10 |
| Generalized Planning | PDDL | Solution Percentage: 100 | 10 |
| Generalized Planning | PDDL manymiconic domain | Solution Rate: 100 | 10 |
| Generalized Planning | PDDL manygripper domain | Solution Rate: 100 | 10 |
| Generalized Planning | PDDL manyferry domain | Solution Percentage: 100 | 10 |
| Generalized Planning | PDDL heavypack | Percent Solved: 100 | 10 |