
Online algorithms for POMDPs with continuous state, action, and observation spaces

About

Online solvers for partially observable Markov decision processes (POMDPs) have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient: the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to succeed where previous approaches fail.

Zachary Sunberg, Mykel Kochenderfer • 2017
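
The belief collapse mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's pseudocode: the 1-D Gaussian transition and observation models, the uniform choice among existing observation children, and all names (`should_widen`, `WeightedBelief`, `simulate_observation_branching`, `k`, `alpha`, `obs_std`) are assumptions made for illustration. The sketch uses the usual progressive widening test (allow a new child while the child count is at most `k * N**alpha`) and, loosely in the spirit of weighted particle filtering, inserts each simulated next state into the chosen observation node with a weight equal to the observation likelihood.

```python
import math
import random


def should_widen(num_children, num_visits, k, alpha):
    """Progressive widening test: allow a new child while |C| <= k * N**alpha."""
    return num_children <= k * num_visits ** alpha


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as a hypothetical observation model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


class WeightedBelief:
    """Weighted particle collection attached to one observation node."""

    def __init__(self):
        self.states = []
        self.weights = []

    def add(self, state, weight):
        self.states.append(state)
        self.weights.append(weight)

    def sample(self, rng):
        # Draw one state with probability proportional to its weight.
        return rng.choices(self.states, weights=self.weights, k=1)[0]


def simulate_observation_branching(num_sims, k, alpha, obs_std, rng):
    """Run num_sims simulations through one action node of a toy 1-D problem.

    Returns a dict mapping each observation child to its WeightedBelief.
    """
    children = {}
    visits = 0
    for _ in range(num_sims):
        # Hypothetical generative model: next state drawn from N(0, 1).
        next_state = rng.gauss(0.0, 1.0)
        if should_widen(len(children), visits, k, alpha):
            # Fresh continuous observation: almost surely a brand-new child.
            obs = rng.gauss(next_state, obs_std)
            children.setdefault(obs, WeightedBelief())
        else:
            # Simplification: reuse an existing child chosen uniformly.
            obs = rng.choice(list(children))
        # Weight the particle by the likelihood of obs given next_state.
        children[obs].add(next_state, gaussian_pdf(obs, next_state, obs_std))
        visits += 1
    return children


if __name__ == "__main__":
    limited = simulate_observation_branching(100, 1.0, 0.5, 0.5, random.Random(0))
    unlimited = simulate_observation_branching(100, 1000.0, 1.0, 0.5, random.Random(1))
    print("children with widening limit:", len(limited))
    print("largest belief with limit:", max(len(b.states) for b in limited.values()))
    print("largest belief without limit:", max(len(b.states) for b in unlimited.values()))
```

With a loose limit (large `k`, `alpha = 1`) every simulation creates its own observation child, so each belief holds one particle. With a tighter limit, simulations revisit existing children and the beliefs accumulate several weighted particles.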

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Hill Car POMDP | Hill Car POMDP | Mean Return: 83.22 | 30 |
| Reinforcement Learning | Lunar Lander POMDP | Performance Score: 54.89 | 30 |
| POMDP Planning | 2D-Continuous Light-Dark (test) | Mean Return: 5.73 | 30 |
| Planning | 3D-Continuous Light-Dark | Mean Return: 3.52 | 30 |
| POMDP Navigation | 4D-Continuous Light-Dark | Mean Return: 1.98 | 30 |
| Two-Agent 2D-Continuous Light-Dark Navigation | Two-Agent 2D-Continuous Light-Dark | Mean Performance: 2.28 | 30 |
| Continuous Control | Mountain Car POMDP | Mean Performance: 25.39 | 30 |
| POMDP Planning | RockSample (15, 15) | Expected Return: 11.01 | 19 |
| POMDP Planning | LightDark 10 | Return: 1.08 | 15 |
| POMDP Planning | RockSample (20, 20) | Expected Return: 9.92 | 10 |
Showing 10 of 26 rows
