Textual Planning with Explicit Latent Transitions
About
Planning with LLMs is bottlenecked by token-by-token generation and repeated full forward passes, making multi-step lookahead and rollout-based search expensive in both latency and compute. We propose EmbedPlan, which replaces autoregressive next-state generation with a lightweight transition model operating in a frozen language embedding space. EmbedPlan encodes natural-language state and action descriptions into vectors, predicts the next-state embedding, and retrieves the next state by nearest-neighbor similarity, enabling fast planning without fine-tuning the encoder. We evaluate next-state prediction across nine classical planning domains using six evaluation protocols of increasing difficulty: interpolation, plan-variant, extrapolation, multi-domain, cross-domain, and leave-one-out. Results show near-perfect interpolation performance but sharp degradation when generalization requires transfer to unseen problems or unseen domains; the plan-variant evaluation indicates generalization to alternative plans rather than memorization of seen trajectories. Overall, frozen embeddings support within-domain dynamics learning once a domain's transitions have been observed, while transfer across domain boundaries remains a bottleneck.
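A minimal sketch of the EmbedPlan-style transition step in PyTorch, assuming a sentence-transformers encoder; the `all-MiniLM-L6-v2` checkpoint, the `TransitionMLP` architecture, and the `predict_next_state` helper are illustrative choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# Frozen text encoder: state/action descriptions -> fixed vectors.
# Checkpoint choice is illustrative, not the paper's configuration.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
DIM = encoder.get_sentence_embedding_dimension()

class TransitionMLP(nn.Module):
    """Lightweight transition model: (state emb, action emb) -> next-state emb."""
    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def predict_next_state(model, state_text, action_text, candidate_states, k=5):
    """Predict the next-state embedding, then retrieve the top-k candidate
    next states by cosine similarity (nearest-neighbor retrieval)."""
    with torch.no_grad():
        s = torch.tensor(encoder.encode(state_text))
        a = torch.tensor(encoder.encode(action_text))
        cands = torch.tensor(encoder.encode(candidate_states))
        pred = model(s, a)
        sims = torch.cosine_similarity(pred.unsqueeze(0), cands, dim=-1)
        topk = sims.topk(min(k, len(candidate_states))).indices
    return [candidate_states[int(i)] for i in topk]

# Example: model = TransitionMLP(DIM); train only the MLP on
# (state, action, next-state) triples, then call predict_next_state.
```

Because the encoder stays frozen, only the small MLP is trained, and next-state prediction reduces to a similarity search over candidate state embeddings rather than a full autoregressive decode.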
Related benchmarks
| Task | Dataset | Protocol | Result (Hit@5) | Rank |
|---|---|---|---|---|
| Textual Planning | PDDL domains | Untrained | 3.9 | 1 |
| Textual Planning | PDDL domains | Cross-Domain | 6.6 | 1 |
| Textual Planning | PDDL domains | Leave-One-Out | 9.2 | 1 |
| Textual Planning | PDDL domains | Multi-Domain Ex. | 37.2 | 1 |
| Textual Planning | PDDL domains | Single-Domain Ex. | 0.546 | 1 |
| Textual Planning | PDDL domains | Plan-Variant | 51.2 | 1 |
| Textual Planning | PDDL domains | Single-Domain In. | 99.7 | 1 |
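Hit@5 here presumably counts a prediction as correct when the ground-truth next state appears among the five nearest candidates of the predicted embedding. A sketch of that metric under this assumption (the `hit_at_k` function and its signature are hypothetical):

```python
import numpy as np

def hit_at_k(pred_embs, gold_ids, candidate_embs, k=5):
    """Fraction of predictions whose gold next state is among the
    k nearest candidates by cosine similarity (assumed Hit@k definition)."""
    # Normalize rows so the dot product equals cosine similarity.
    pred = pred_embs / np.linalg.norm(pred_embs, axis=1, keepdims=True)
    cand = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = pred @ cand.T                     # (num_queries, num_candidates)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar
    hits = [gold in row for gold, row in zip(gold_ids, topk)]
    return float(np.mean(hits))
```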