
Robust Asynchronous Planning via Auto-Formalization

About

LLMs can plan either by generating action sequences directly (as a Planner) or by translating tasks into a domain-specific language for an external solver (as a Formalizer). Most real-world tasks are asynchronous, with non-uniform durations, concurrency, and execution-time constraints, yet existing benchmarks hardly cover them. We unify these asynchronous planning challenges under a single formulation and introduce the first three benchmarks that address each at scale. We conclude that the choice of formal representation primarily determines whether planning scales: as dependency graphs grow from 5 to 100 actions, Planner collapses from 96% to 5% plan accuracy and PDDL2.1 Formalizer from 13% to 0%, while CP-SAT Formalizer averages 94% and still achieves 83% at 100 actions. Faithfulness diagnostics show that PDDL2.1's predicate-based planning representation becomes brittle relative to general constraint satisfaction programs when LLMs must keep predicates, effects, and goals consistent. Execution-time updates of planning constraints further degrade performance sharply (Planner 23.9%, PDDL2.1 0.7%, CP-SAT 46.1%), but a state-aware repair strategy that updates only event-induced constraints recovers CP-SAT Formalizer to 84.5%.
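To make the notion of makespan over a dependency graph concrete, here is a minimal Python sketch. It is not the paper's CP-SAT encoding or benchmark code. It assumes unlimited concurrency (any action may start as soon as all its prerequisites finish) and no resource or execution-time constraints. The action names, durations, and the helper `earliest_schedule` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def earliest_schedule(durations, deps):
    """Return (start_times, makespan) for a dependency graph.

    durations: action name -> duration
    deps:      action name -> list of prerequisite actions
    Assumes unlimited concurrency and no resource limits (illustrative only).
    """
    graph = {action: deps.get(action, ()) for action in durations}
    start = {}
    # static_order() yields every action after all of its prerequisites.
    for action in TopologicalSorter(graph).static_order():
        start[action] = max(
            (start[p] + durations[p] for p in graph[action]), default=0
        )
    makespan = max((start[a] + durations[a] for a in durations), default=0)
    return start, makespan


# Hypothetical task: two independent preparation steps feed one cooking step.
durations = {"boil_water": 10, "chop_vegetables": 5, "cook_soup": 15, "set_table": 3}
deps = {"cook_soup": ["boil_water", "chop_vegetables"]}

start, makespan = earliest_schedule(durations, deps)
print(start["cook_soup"], makespan)
```

Under these assumptions the makespan equals the length of the longest dependency chain (here `boil_water` followed by `cook_soup`), which is shorter than the sum of all durations because independent actions overlap. A constraint-based formalization expresses the same precedence relations as constraints handed to a solver, which also allows resource limits and execution-time updates that this sketch omits.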

Jiayi Zhang, Jianing Yin, Ben Zhou, Li Zhang • 2026

Related benchmarks

Task                    Dataset                    Result                      Rank
Makespan Accuracy       AsyncPlan-XXL              Accuracy (S5): 98           24
Plan Generation         Robo Challenge (Online)    Plan Accuracy: 85.7         16
Asynchronous planning   AsyncHow                   Makespan Accuracy: 98.44    15
Makespan Accuracy       Robotouille                Makespan Accuracy: 20       12
Plan Generation         Robo Challenge (Offline)   Plan Accuracy: 100          12
Asynchronous planning   Robotouille                Makespan Accuracy: 17.5     3