Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning

About

Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its worst-case POMDP. Using a set of such POMDPs, we iteratively train a robust neural policy and consequently extract a robust controller. Our experiments show that on problems with large state spaces, Lexpop outperforms state-of-the-art solvers for POMDPs as well as HM-POMDPs.
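The claim that finite-state controllers can be formally evaluated, and that each controller can be paired with its worst-case POMDP, can be illustrated with a small sketch. This is not the Lexpop implementation. It assumes a discounted-reward objective, deterministic observations, a deterministic controller that picks an action and a successor memory node from the current node and observation, and two hand-made toy POMDPs. The names `evaluate_fsc`, `worst_case`, and the dictionary layout are hypothetical.

```python
from itertools import product

def evaluate_fsc(pomdp, fsc, gamma=0.95, tol=1e-10):
    """Expected discounted reward of a controller, from the initial state and node 0.

    Assumed layout (hypothetical): pomdp["obs"][s] is the observation of state s,
    pomdp["trans"][s][a] maps successor states to probabilities, and
    pomdp["reward"][s][a] is the immediate reward.
    """
    states = range(len(pomdp["obs"]))
    nodes = range(fsc["num_nodes"])
    value = {(s, n): 0.0 for s, n in product(states, nodes)}
    while True:
        delta = 0.0
        # Iterative policy evaluation on the product of states and memory nodes.
        for s, n in product(states, nodes):
            z = pomdp["obs"][s]
            a = fsc["action"][(n, z)]       # action chosen in node n on observation z
            n_next = fsc["update"][(n, z)]  # memory update
            new = pomdp["reward"][s][a] + gamma * sum(
                p * value[(s2, n_next)] for s2, p in pomdp["trans"][s][a].items()
            )
            delta = max(delta, abs(new - value[(s, n)]))
            value[(s, n)] = new
        if delta < tol:
            return value[(pomdp["init"], 0)]

def worst_case(pomdps, fsc, gamma=0.95):
    """Index and value of the POMDP on which the controller performs worst."""
    values = [evaluate_fsc(m, fsc, gamma) for m in pomdps]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx, values[idx]

# Two toy POMDPs over the same states, actions, and observations (both states
# look identical to the agent). They differ only in one transition.
pomdp_a = {
    "init": 0,
    "obs": [0, 0],
    "reward": [[1.0, 0.0], [0.0, 0.0]],
    "trans": [[{0: 1.0}, {0: 1.0}], [{1: 1.0}, {1: 1.0}]],
}
pomdp_b = {
    "init": 0,
    "obs": [0, 0],
    "reward": [[1.0, 0.0], [0.0, 0.0]],
    "trans": [[{0: 0.5, 1: 0.5}, {0: 1.0}], [{1: 1.0}, {1: 1.0}]],
}

# A one-node controller that always plays action 0.
fsc = {"num_nodes": 1, "action": {(0, 0): 0}, "update": {(0, 0): 0}}

idx, val = worst_case([pomdp_a, pomdp_b], fsc)
print(f"worst-case POMDP index: {idx}, value: {val:.4f}")
```

In a robust training loop of the kind the abstract describes, the POMDP returned by `worst_case` would be added to the set used for the next round of policy training. Here it is only reported.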

David Hudák, Maris F. L. Galesloot, Martin Tappler, Martin Kurečka, Nils Jansen, Milan Češka • 2026

Related benchmarks

Task	Dataset	Metric	Result
POMDP Planning	maze-10 POMDP PRISM format (original enlarged)	Value (IQM)	8.86	4
POMDP Planning	rocks-16 POMDP PRISM format (original/enlarged)	IQM Value	-51.49	4
POMDP Planning	network-3-8-20 POMDP PRISM format (original enlarged)	Value (IQM)	-7.36	4
POMDP Planning	network-5-10-8 POMDP PRISM format (original enlarged)	Value (IQM)	-12.56	4
POMDP Planning	intercept-16 POMDP PRISM format (original enlarged)	Value (IQM)	1	4
POMDP Planning	evade-n17 POMDP PRISM format (original enlarged)	Value (IQM)	0.85	4
POMDP Planning	drone-2-8-1 POMDP original enlarged PRISM format	Value (IQM)	0.61	4
Robust POMDP Planning	HM-POMDP network	IQM	3.78	3
Robust POMDP Planning	HM-POMDP avoid	IQM	-161	3
Robust POMDP Planning	HM-POMDP drone-2-6-1	IQM	59	3

Showing 10 of 14 rows
