Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning

About

Partially Observable Markov Decision Processes (POMDPs) are the standard framework for decision-making under uncertainty. While sampling-based methods scale well, they lack formal correctness guarantees, making them unsuitable for safety-critical applications. Conversely, formal synthesis techniques provide correctness-by-construction but often struggle with scalability, as general POMDP synthesis is undecidable. To bridge this gap, we propose a synthesis framework that integrates sampling, automata learning, and model-checking. Inspired by Angluin's $L^*$ algorithm, our approach utilizes sampling as a membership oracle and model-checking as an equivalence oracle. This enables the synthesis of finite-state controllers with formal guarantees, provided the sampling-induced policy is regular. We establish a relative completeness result for this framework. Experimental results from our prototypical implementation demonstrate that this method successfully solves threshold-safety problems that remain challenging for existing formal synthesis tools. We believe our algorithm serves as a valuable component in a portfolio approach to tackling the inherent difficulty of POMDP synthesis problems.

Debraj Chakraborty, Anirban Majumdar, Prince Mathew, Sayan Mukherjee, Jean-Fran\c{c}ois Raskin• 2026

Related benchmarks

Task	Dataset	Result
Policy Synthesis	Grid World	Success Rate5	16
POMDP Policy Synthesis	cheese-maze	Execution Time2.83	4
POMDP Policy Synthesis	refuel N=6, E=8	Execution Time1.46	4
POMDP Policy Synthesis	Hallway	Execution Time49.32	4
POMDP Policy Synthesis	cards-removed 2	Time1.48	4
POMDP Policy Synthesis	cards-removed 3	Execution Time2.1	4
POMDP Policy Synthesis	cards removed-4	Time7.42	4
POMDP Policy Synthesis	cards-added 2	Execution Time1.77	4
POMDP Policy Synthesis	cards-added-3	Execution Time10.66	4
POMDP Policy Synthesis	cheese-maze-det	Execution Time1.52	4

Showing 10 of 16 rows

Other info

Follow for update

@wizwand_team Discord