Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

About

This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.
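As a rough illustration of how a VPW-style rule might balance local and global action search, the sketch below pairs a progressive widening test with Voronoi-style sampling. It is not the authors' implementation. The function names, the parameters `k`, `alpha`, and `omega`, their default values, the rejection-sampling scheme, and the box-shaped action space are all assumptions made for illustration.

```python
import math
import random


def should_widen(num_children, num_visits, k=3.0, alpha=0.25):
    """Progressive widening test (assumed form): allow a new action
    while the number of child actions is at most k * N^alpha."""
    return num_children <= k * (num_visits ** alpha)


def distance(a, b):
    """Euclidean distance between two actions given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_uniform(bounds, rng):
    """Global search step: uniform sample from a box-shaped action space."""
    return tuple(rng.uniform(lo, hi) for lo, hi in bounds)


def sample_voronoi_action(actions, values, bounds, rng, omega=0.3, max_tries=1000):
    """Propose a new action.

    With probability omega (or when no actions exist yet), sample globally.
    Otherwise sample locally from the Voronoi cell of the best-valued action
    by rejection: draw uniform candidates until one is nearest to that action.
    """
    if not actions or rng.random() < omega:
        return sample_uniform(bounds, rng)
    best = max(range(len(actions)), key=lambda i: values[i])
    for _ in range(max_tries):
        cand = sample_uniform(bounds, rng)
        nearest = min(range(len(actions)), key=lambda i: distance(cand, actions[i]))
        if nearest == best:
            return cand
    # Hypothetical fallback if rejection sampling keeps failing: go global.
    return sample_uniform(bounds, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    actions = [(0.2,), (0.8,)]   # previously tried 1-D actions
    values = [0.0, 1.0]          # estimated values; the second is best
    if should_widen(len(actions), num_visits=10):
        print(sample_voronoi_action(actions, values, [(0.0, 1.0)], rng))
```

In a tree search, a node would call `should_widen` on each visit and, when it returns true, add the proposed action as a new child. Otherwise it would choose among existing children, for example with an upper confidence bound rule.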

Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg • 2020

Related benchmarks

Task	Dataset	Result
Planning	Rotating DDI	Mean Undiscounted Return: 48.119	66
Control Task	Lunar Lander (test)	Average Reward: 60.94	31
Planning	3D-Continuous Light-Dark	Mean Return: 4.8	30
POMDP Navigation	4D-Continuous Light-Dark	Mean Return: 3.04	30
POMDP Planning	2D-Continuous Light-Dark (test)	Mean Return: 6.05	30
Reinforcement Learning	Lunar Lander POMDP	Performance Score: 56.09	30
Hill Car POMDP	Hill Car POMDP	Mean Return: 78.37	30
Two-Agent 2D-Continuous Light-Dark Navigation	Two-Agent 2D-Continuous Light-Dark	Mean Performance: 2.57	30
Continuous Control	Mountain Car POMDP	Mean Performance: 25.62	30
Mountain Car	Mountain Car	Mean Return: 24.42	20

Showing 10 of 12 rows
