
Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning

About

Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its worst-case POMDP. Using a set of such POMDPs, we iteratively train a robust neural policy and consequently extract a robust controller. Our experiments show that on problems with large state spaces, Lexpop outperforms state-of-the-art solvers for POMDPs as well as HM-POMDPs.

David Hudák, Maris F. L. Galesloot, Martin Tappler, Martin Kurečka, Nils Jansen, Milan Češka • 2026
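The claim that controllers, unlike neural policies, can be formally evaluated rests on a standard fact: a finite-state controller running in a POMDP induces a finite Markov chain over (state, memory node) pairs, whose value can be computed exactly. The Python sketch below illustrates this and the worst-case selection over a finite set of POMDPs. It assumes a tabular POMDP with deterministic, state-based observations, a deterministic controller, a discounted-reward objective that is maximized, and POMDPs in the set that share actions and observations. The paper's actual objectives, evaluation backend, and data structures are not described here, and all names (`POMDP`, `FSC`, `evaluate_fsc`, `worst_case`, `slippery`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class POMDP:
    """Hypothetical tabular POMDP; observations are a deterministic function of the state."""
    trans: Dict[Tuple[int, str], Dict[int, float]]  # (state, action) -> {next_state: prob}
    reward: Dict[Tuple[int, str], float]            # (state, action) -> immediate reward
    obs: Dict[int, int]                             # state -> observation
    init: Dict[int, float]                          # initial state distribution


@dataclass
class FSC:
    """Hypothetical deterministic finite-state controller."""
    action: Dict[Tuple[int, int], str]  # (memory node, observation) -> action
    update: Dict[Tuple[int, int], int]  # (memory node, observation) -> next memory node
    init_node: int = 0


def evaluate_fsc(pomdp: POMDP, fsc: FSC, gamma: float = 0.95,
                 tol: float = 1e-10, max_iter: int = 100_000) -> float:
    """Expected discounted reward of `fsc` in `pomdp`, computed by iterative
    policy evaluation on the product Markov chain over (state, node) pairs."""
    states = list(pomdp.obs)
    nodes = {n for (n, _) in fsc.action}
    v = {(s, n): 0.0 for s in states for n in nodes}
    for _ in range(max_iter):
        new_v = {}
        for (s, n) in v:
            z = pomdp.obs[s]                # observation emitted in state s
            a = fsc.action[(n, z)]          # controller picks an action
            n_next = fsc.update[(n, z)]     # and moves to its next memory node
            future = sum(p * v[(s2, n_next)] for s2, p in pomdp.trans[(s, a)].items())
            new_v[(s, n)] = pomdp.reward[(s, a)] + gamma * future
        delta = max(abs(new_v[k] - v[k]) for k in v)
        v = new_v
        if delta < tol:
            break
    return sum(p * v[(s, fsc.init_node)] for s, p in pomdp.init.items())


def worst_case(pomdps: List[POMDP], fsc: FSC, gamma: float = 0.95) -> Tuple[int, float]:
    """Index and value of the POMDP on which `fsc` performs worst (lowest value)."""
    values = [evaluate_fsc(m, fsc, gamma) for m in pomdps]
    idx = min(range(len(values)), key=lambda i: values[i])
    return idx, values[idx]


def slippery(p: float) -> POMDP:
    """Toy model: "go" reaches absorbing goal state 1 (reward 1 per step) with prob p."""
    return POMDP(
        trans={(0, "go"): {1: p, 0: 1 - p}, (1, "go"): {1: 1.0}},
        reward={(0, "go"): 0.0, (1, "go"): 1.0},
        obs={0: 0, 1: 0},  # both states emit the same observation
        init={0: 1.0},
    )


always_go = FSC(action={(0, 0): "go"}, update={(0, 0): 0})
idx, val = worst_case([slippery(0.5), slippery(0.1)], always_go, gamma=0.9)
print(idx, round(val, 4))  # prints: 1 4.7368
```

In the toy set, the lower success probability `p = 0.1` is the worst case, matching the closed form γp / ((1 − γ)(1 − γ + γp)). Iterative evaluation is used here only for brevity; solving the product chain's linear system directly, or handing it to a probabilistic model checker, gives the same values.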

Related benchmarks

Task | Dataset | Metric | Result | Rank
POMDP Planning | maze-10 POMDP PRISM format (original enlarged) | Value (IQM) | 8.86 | 4
POMDP Planning | rocks-16 POMDP PRISM format (original/enlarged) | IQM Value | -51.49 | 4
POMDP Planning | network-3-8-20 POMDP PRISM format (original enlarged) | Value (IQM) | -7.36 | 4
POMDP Planning | network-5-10-8 POMDP PRISM format (original enlarged) | Value (IQM) | -12.56 | 4
POMDP Planning | intercept-16 POMDP PRISM format (original enlarged) | Value (IQM) | 1 | 4
POMDP Planning | evade-n17 POMDP PRISM format (original enlarged) | Value (IQM) | 0.85 | 4
POMDP Planning | drone-2-8-1 POMDP original enlarged PRISM format | Value (IQM) | 0.61 | 4
Robust POMDP Planning | HM-POMDP network | IQM | 3.78 | 3
Robust POMDP Planning | HM-POMDP avoid | IQM | -161 | 3
Robust POMDP Planning | HM-POMDP drone-2-6-1 | IQM | 59 | 3
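The results above are reported as interquartile means (IQM). The page does not state what the IQM aggregates over (in RL evaluations it is typically a set of independent runs or seeds). For reference, the IQM is the mean of the middle 50% of the sorted values, which makes it less sensitive to outlier runs than the plain mean. The minimal Python sketch below follows the trimming convention of SciPy's `scipy.stats.trim_mean(x, 0.25)`, dropping `int(0.25 * n)` values from each end. The function name is hypothetical, and the example scores are made up for illustration rather than taken from the paper.

```python
def interquartile_mean(values):
    """Mean of the middle 50% of values, dropping int(0.25 * n) from each end."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("IQM of an empty sequence is undefined")
    k = int(0.25 * n)       # number of values trimmed from each tail
    middle = xs[k:n - k]    # for n < 4, nothing is trimmed
    return sum(middle) / len(middle)


# Made-up scores: the outlier 100 falls in the top quartile and is discarded.
print(interquartile_mean([1, 2, 3, 4, 100, 5, 6, 7]))  # prints: 4.5
```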
