Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning
About
Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its worst-case POMDP. Using a set of such POMDPs, we iteratively train a robust neural policy and consequently extract a robust controller. Our experiments show that on problems with large state spaces, Lexpop outperforms state-of-the-art solvers for POMDPs as well as HM-POMDPs.
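The claim that finite-state controllers can be formally evaluated can be illustrated with a minimal sketch. It is not Lexpop's implementation: the function name `evaluate_fsc`, the discounted-reward objective, the deterministic state-based observations, and the two-state toy POMDP are all assumptions made for this example. The idea is that a deterministic controller turns the POMDP into a Markov chain over (state, memory node) pairs, whose value follows from a single linear solve.

```python
import numpy as np

def evaluate_fsc(T, R, obs, act, upd, gamma=0.9):
    """Exact discounted value of a deterministic FSC on a POMDP (illustrative).

    T[s, a, s2]: transition probabilities; R[s, a]: rewards;
    obs[s]: observation emitted in state s (assumed deterministic);
    act[n, z], upd[n, z]: action and next memory node chosen by the FSC.
    Returns V[s, n], the value when starting in state s with memory node n.
    """
    S, N = T.shape[0], act.shape[0]
    P = np.zeros((S * N, S * N))  # induced Markov chain over (s, n) pairs
    r = np.zeros(S * N)           # expected one-step reward per (s, n) pair
    for s in range(S):
        for n in range(N):
            z = obs[s]
            a, n2 = act[n, z], upd[n, z]
            i = s * N + n
            r[i] = R[s, a]
            for s2 in range(S):
                P[i, s2 * N + n2] += T[s, a, s2]
    # Solve V = r + gamma * P V
    V = np.linalg.solve(np.eye(S * N) - gamma * P, r)
    return V.reshape(S, N)

# Hypothetical toy POMDP: the state flips every step, both states emit the
# same observation, and action a is rewarded only in state a.
T = np.zeros((2, 2, 2))
for s in range(2):
    for a in range(2):
        T[s, a, 1 - s] = 1.0
R = np.array([[1.0, 0.0], [0.0, 1.0]])
obs = np.array([0, 0])

# Two-node controller that alternates actions 0, 1, 0, 1, ...
V2 = evaluate_fsc(T, R, obs, act=np.array([[0], [1]]), upd=np.array([[1], [0]]))
# One-node (memoryless) controller that always plays action 0
V1 = evaluate_fsc(T, R, obs, act=np.array([[0]]), upd=np.array([[0]]))
print(V2[0, 0], V1[0, 0])
```

In this toy model the two-node controller collects a reward at every step when started in state 0 with node 0, whereas the memoryless controller is rewarded only every other step, which shows why controller memory matters under partial observability.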
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| POMDP Planning | maze-10 POMDP PRISM format (original enlarged) | Value (IQM) | 8.86 | 4 |
| POMDP Planning | rocks-16 POMDP PRISM format (original/enlarged) | IQM Value | -51.49 | 4 |
| POMDP Planning | network-3-8-20 POMDP PRISM format (original enlarged) | Value (IQM) | -7.36 | 4 |
| POMDP Planning | network-5-10-8 POMDP PRISM format (original enlarged) | Value (IQM) | -12.56 | 4 |
| POMDP Planning | intercept-16 POMDP PRISM format (original enlarged) | Value (IQM) | 1 | 4 |
| POMDP Planning | evade-n17 POMDP PRISM format (original enlarged) | Value (IQM) | 0.85 | 4 |
| POMDP Planning | drone-2-8-1 POMDP original enlarged PRISM format | Value (IQM) | 0.61 | 4 |
| Robust POMDP Planning | HM-POMDP network | IQM | 3.78 | 3 |
| Robust POMDP Planning | HM-POMDP avoid | IQM | -161 | 3 |
| Robust POMDP Planning | HM-POMDP drone-2-6-1 | IQM | 59 | 3 |
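
The results above are reported as IQM, which is assumed here to mean the interquartile mean: the average of the middle 50% of scores across runs. The sketch below shows one common way to compute it. The helper name `iqm` and the sample scores are hypothetical, and the simple truncation rule (dropping `len // 4` values from each end) is an assumption that may differ from the exact procedure behind the table.

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: average of the values between the 25th and 75th
    percentiles, using simple truncation of len // 4 values per tail."""
    x = np.sort(np.asarray(scores, dtype=float))
    k = len(x) // 4  # number of values dropped from each end
    return x[k:len(x) - k].mean()

# Hypothetical per-seed returns with one outlier run
runs = [0, 1, 2, 3, 4, 5, 6, 100]
print(iqm(runs), np.mean(runs))
```

Compared with the plain mean, this statistic is much less sensitive to a single outlier run, which is why it is often preferred when aggregating reinforcement learning results over a small number of seeds.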