
Partially Observable Monte-Carlo Graph Search

About

Large partially observable Markov decision processes (POMDPs) are currently often solved by sampling-based online methods that interleave planning and execution phases. However, a pre-computed offline policy is more desirable in POMDP applications with time or energy constraints, and previous offline algorithms cannot scale to large POMDPs. In this article, we propose a new sampling-based algorithm, partially observable Monte-Carlo graph search (POMCGS), to solve large POMDPs offline. Unlike many online POMDP methods, which progressively develop a tree while performing (Monte-Carlo) simulations, POMCGS folds this search tree on the fly into a policy graph, so that computation is drastically reduced and users can analyze and validate the policy before embedding and executing it. Moreover, combined with the action progressive widening and observation clustering methods provided in this article, POMCGS can address certain continuous POMDPs. Through experiments, we demonstrate that POMCGS can generate policies on the most challenging POMDPs, which cannot be computed by previous offline algorithms, and that these policies' values are competitive with those of state-of-the-art online POMDP algorithms.
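The core idea of folding a search tree into a policy graph can be illustrated with a minimal sketch: if two belief nodes are close enough to be treated as the same, they are merged into a single graph node instead of growing separate subtrees. The sketch below is an illustrative assumption, not the paper's implementation; the `discretize` helper, the rounding precision, and the `PolicyGraph` class are all hypothetical names chosen for this example.

```python
from collections import defaultdict

def discretize(belief, precision=1):
    """Round belief probabilities so that nearby beliefs share one key.

    `belief` is a dict mapping states to probabilities; the rounded,
    sorted tuple acts as a merge key for the policy graph.
    """
    return tuple(sorted((s, round(p, precision)) for s, p in belief.items()))

class PolicyGraph:
    """Fold belief nodes that discretize to the same key into one node."""

    def __init__(self, precision=1):
        self.precision = precision
        self.nodes = {}                  # discretized belief -> node id
        self.actions = {}                # node id -> chosen action
        self.edges = defaultdict(dict)   # node id -> {observation: node id}

    def node_for(self, belief):
        # Reusing an existing node here is what turns the search
        # tree into a (finite) graph instead of an ever-growing tree.
        key = discretize(belief, self.precision)
        if key not in self.nodes:
            self.nodes[key] = len(self.nodes)
        return self.nodes[key]

# Two nearby beliefs fold into the same node; a distinct belief does not.
g = PolicyGraph()
n1 = g.node_for({"good": 0.71, "bad": 0.29})
n2 = g.node_for({"good": 0.74, "bad": 0.26})
n3 = g.node_for({"good": 0.12, "bad": 0.88})
```

Under this scheme the number of nodes is bounded by the number of distinct discretized beliefs, which is what makes the resulting policy compact enough to inspect and validate offline.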

Yang You, Vincent Thomas, Alex Schutz, Robert Skilton, Nick Hawes, Olivier Buffet• 2025

Related benchmarks

Task            Dataset              Expected Return  Rank
POMDP Planning  RockSample (15, 15)  13.45            19
POMDP Planning  RockSample (20, 20)  3.91             10
POMDP Planning  RockSample (7, 8)    20.17            5
POMDP Planning  RockSample (11, 11)  18.53            5
POMDP Planning  Light Dark           3.74             4
POMDP Planning  Lidar Roomba         1.08             4
