Multi-Environment POMDPs with Finite-Horizon Objectives

About

Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1) we establish that it is also PSPACE-complete in the more general setting of MEPOMDPs; (2) we present a practical algorithm and evaluate it on classical benchmarks, significantly outperforming the only previously known algorithm.

L\'eonard Brice, Filip Cano, Krishnendu Chatterjee, Thomas A. Henzinger, Stefanie Muroya• 2026

Related benchmarks

Task	Dataset	Result	Rank
Value computation in MEPOMDPs	RockSample	Computation Time (s)0.0581		10
Value computation	Robot Navigation and Identification	Computation Time (s)0.0223		9

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord