Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Multi-Environment POMDPs with Finite-Horizon Objectives

About

Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1) we establish that it is also PSPACE-complete in the more general setting of MEPOMDPs; (2) we present a practical algorithm and evaluate it on classical benchmarks, significantly outperforming the only previously known algorithm.

L\'eonard Brice, Filip Cano, Krishnendu Chatterjee, Thomas A. Henzinger, Stefanie Muroya• 2026

Related benchmarks

TaskDatasetResultRank
Value computation in MEPOMDPsRockSample
Computation Time (s)0.0581
10
Value computationRobot Navigation and Identification
Computation Time (s)0.0223
9
Showing 2 of 2 rows

Other info

Follow for update