DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL
About
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in multi-task reinforcement learning (RL). However, learning policies that efficiently satisfy arbitrary specifications not observed during training remains a challenging problem. Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments of LTL, are restricted to suboptimal solutions, and do not adequately handle safety constraints. In this work, we propose a novel learning approach to address these concerns. Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae. Experiments in a variety of discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide range of finite- and infinite-horizon specifications, and outperforms existing methods in terms of both satisfaction probability and efficiency. Code available at: https://deep-ltl.github.io/
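To make the role of Büchi automata concrete, the sketch below checks whether a deterministic Büchi automaton accepts an infinite word given as a finite prefix followed by a loop that repeats forever. It is an illustration of Büchi acceptance only, not the paper's learning method; the function `accepts_lasso`, the helper `delta_gf_a`, and the two-state automaton for "infinitely often `a`" are hypothetical names and choices made for this example.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Set

# One truth assignment: the set of atomic propositions that hold at a step.
Assignment = FrozenSet[str]


def accepts_lasso(
    delta: Callable[[Hashable, Assignment], Hashable],
    initial: Hashable,
    accepting: Set[Hashable],
    prefix: Sequence[Assignment],
    loop: Sequence[Assignment],
) -> bool:
    """Return True if the deterministic Buchi automaton accepts prefix . loop^omega.

    The word is accepted iff some accepting state is visited infinitely often.
    """
    if not loop:
        raise ValueError("loop must be non-empty")

    state = initial
    for letter in prefix:
        state = delta(state, letter)

    # Repeat the loop until the state at a loop boundary recurs.
    seen: Dict[Hashable, int] = {}  # boundary state -> index of the iteration starting there
    hits: List[bool] = []           # hits[i]: iteration i visited an accepting state
    while state not in seen:
        seen[state] = len(hits)
        hit = False
        for letter in loop:
            state = delta(state, letter)
            hit = hit or state in accepting
        hits.append(hit)

    # Iterations from seen[state] onward repeat forever.
    return any(hits[seen[state]:])


def delta_gf_a(state: int, letter: Assignment) -> int:
    """Hypothetical automaton for 'G F a': state 1 means 'a' was just observed."""
    return 1 if "a" in letter else 0


EMPTY: Assignment = frozenset()
A: Assignment = frozenset({"a"})

# 'a' recurs inside the loop, so the accepting state is visited infinitely often.
recurring = accepts_lasso(delta_gf_a, 0, {1}, prefix=[EMPTY], loop=[A, EMPTY])
# 'a' appears only in the prefix, so the accepting state is visited once.
once_only = accepts_lasso(delta_gf_a, 0, {1}, prefix=[A], loop=[EMPTY])
```

Restricting the check to prefix-plus-loop words keeps it finite: the automaton has finitely many states, so the state at the loop boundary must eventually repeat, and only the iterations inside that repetition determine acceptance.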
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-Task Reinforcement Learning (LTL Instruction Following) | Warehouse Finite Horizon | Success Rate | 99 | 30 |
| Multi-Task Reinforcement Learning (LTL Instruction Following) | Warehouse Infinite Horizon | Average Visits | 823.5 | 20 |
| LTL Instruction Following | Letter Finite-horizon (full) | Success Rate (SR) | 99 | 19 |
| LTL Instruction Following | ZoneEnv Finite Horizon | Success Rate (SR) | 97 | 18 |
| Multi-Task Reinforcement Learning (LTL Instruction Following) | ZoneEnv Finite Horizon | Success Rate | 98 | 18 |
| LTL Instruction Following | Zones Infinite-horizon (full) | µacc | 914 | 14 |
| LTL Instruction Following | LetterWorld Finite-horizon | Success Rate (SR) | 100 | 12 |
| Multi-Task Reinforcement Learning (LTL Instruction Following) | ZoneEnv Infinite Horizon | Average Visits | 560.6 | 12 |
| LTL-guided Reinforcement Learning | Zones Finite-horizon (test) | Success Rate | 98 | 10 |
| LTL Instruction Following | Letter Infinite-horizon (full) | µAcc | 6.15 | 10 |