Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning
About
Partially Observable Markov Decision Processes (POMDPs) are the standard framework for decision-making under uncertainty. While sampling-based methods scale well, they lack formal correctness guarantees, making them unsuitable for safety-critical applications. Conversely, formal synthesis techniques provide correctness-by-construction but often struggle with scalability, as general POMDP synthesis is undecidable. To bridge this gap, we propose a synthesis framework that integrates sampling, automata learning, and model-checking. Inspired by Angluin's $L^*$ algorithm, our approach utilizes sampling as a membership oracle and model-checking as an equivalence oracle. This enables the synthesis of finite-state controllers with formal guarantees, provided the sampling-induced policy is regular. We establish a relative completeness result for this framework. Experimental results from our prototypical implementation demonstrate that this method successfully solves threshold-safety problems that remain challenging for existing formal synthesis tools. We believe our algorithm serves as a valuable component in a portfolio approach to tackling the inherent difficulty of POMDP synthesis problems.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Policy Synthesis | Grid World | Success Rate5 | 16 | |
| POMDP Policy Synthesis | cheese-maze | Execution Time2.83 | 4 | |
| POMDP Policy Synthesis | refuel N=6, E=8 | Execution Time1.46 | 4 | |
| POMDP Policy Synthesis | Hallway | Execution Time49.32 | 4 | |
| POMDP Policy Synthesis | cards-removed 2 | Time1.48 | 4 | |
| POMDP Policy Synthesis | cards-removed 3 | Execution Time2.1 | 4 | |
| POMDP Policy Synthesis | cards removed-4 | Time7.42 | 4 | |
| POMDP Policy Synthesis | cards-added 2 | Execution Time1.77 | 4 | |
| POMDP Policy Synthesis | cards-added-3 | Execution Time10.66 | 4 | |
| POMDP Policy Synthesis | cheese-maze-det | Execution Time1.52 | 4 |