Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

About

This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.

Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg • 2020
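
To make the "local versus global action search" idea in the abstract more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `sample_voronoi_action` and `should_widen`, the box-shaped action space, the rejection-sampling scheme, and the parameters `omega`, `k`, and `alpha` are all hypothetical choices made for this example. With probability `omega` the sampler draws a uniform (global) action; otherwise it draws a point from the Voronoi cell of the currently best-valued action (local). The widening test follows the common progressive-widening form, which adds a new action only while the number of children is at most `k * N**alpha`.

```python
import math
import random


def sample_voronoi_action(actions, values, low, high, omega=0.3,
                          rng=random, max_tries=1000):
    """Sample a new action from the box [low, high] (hypothetical helper).

    actions: list of tuples already tried at this node.
    values:  estimated value of each action, same order as `actions`.
    omega:   probability of a global (uniform) sample; assumed parameter.
    """
    dim = len(low)

    def uniform():
        return tuple(rng.uniform(low[i], high[i]) for i in range(dim))

    # Global search: no actions yet, or the exploration coin flip succeeds.
    if not actions or rng.random() < omega:
        return uniform()

    # Local search: rejection-sample a point whose nearest existing action
    # is the best-valued one, i.e. a point inside that action's Voronoi cell.
    best = max(range(len(actions)), key=lambda i: values[i])
    for _ in range(max_tries):
        cand = uniform()
        nearest = min(range(len(actions)),
                      key=lambda i: math.dist(cand, actions[i]))
        if nearest == best:
            return cand

    # Assumed fallback: the best action itself, which lies in its own cell.
    return tuple(actions[best])


def should_widen(num_children, num_visits, k=4.0, alpha=0.25):
    """Progressive-widening test: allow a new action while the child
    count is at most k * N**alpha (k and alpha are assumed defaults)."""
    return num_children <= k * num_visits ** alpha
```

In a tree search, a node would call `should_widen` on each visit and, when it returns true, add the action returned by `sample_voronoi_action` as a new child; otherwise it would select among existing children. Rejection sampling becomes slow when the best cell is small, so a practical solver would likely use a cheaper local sampler.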

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Control Task | Lunar Lander (test) | Average Reward | 60.94 | 31 |
| Planning | 3D-Continuous Light-Dark | Mean Return | 4.8 | 30 |
| POMDP Navigation | 4D-Continuous Light-Dark | Mean Return | 3.04 | 30 |
| POMDP Planning | 2D-Continuous Light-Dark (test) | Mean Return | 6.05 | 30 |
| Reinforcement Learning | Lunar Lander POMDP | Performance Score | 56.09 | 30 |
| Hill Car POMDP | Hill Car POMDP | Mean Return | 78.37 | 30 |
| Two-Agent 2D-Continuous Light-Dark Navigation | Two-Agent 2D-Continuous Light-Dark | Mean Performance | 2.57 | 30 |
| Continuous Control | Mountain Car POMDP | Mean Performance | 25.62 | 30 |
| Mountain Car | Mountain Car | Mean Return | 24.42 | 20 |
| Reinforcement Learning | Hill Car MDP | Performance | -59.66 | 20 |