Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs

About

Online planning in continuous state, action, and observation spaces remains challenging for autonomous systems. While Monte Carlo Tree Search (MCTS) scales effectively via sampling, most continuous (PO)MDP solvers do not exploit gradient-based action optimization. We propose Action-Gradient MCTS (AGMCTS), a framework that combines global tree search with local gradient-based action refinement, while maintaining consistent value estimates. We provide three key theoretical contributions: (1) an action score gradient theorem for particle belief states; (2) the Multiple Importance Sampling (MIS) Tree that supports frequent action-branch updates by reusing prior samples without introducing estimator drift; and (3) tractable action score gradients for smooth generative models using the Area Formula. Empirical results demonstrate that AGMCTS outperforms state-of-the-art sample-based solvers in multiple challenging continuous MDP and POMDP benchmarks.

Idan Lev-Yehudi, Michael Novitsky, Moran Barenboim, Ron Benchetrit, Vadim Indelman • 2025
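
The abstract's idea of reusing prior samples after an action-branch update can be illustrated with a generic multiple importance sampling estimator. The sketch below is not the paper's MIS Tree: it is the standard balance-heuristic estimator applied to a toy one-dimensional problem. The Gaussian transition model, the quadratic reward, and the names `gaussian_pdf` and `mis_estimate` are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mis_estimate(samples_by_action, target_action, state, std, reward):
    """Balance-heuristic MIS estimate of E[reward(s')] under target_action.

    samples_by_action maps each previously tried action to the next states
    sampled under it. Assumed toy transition model (hypothetical):
    s' ~ N(state + action, std^2).
    """
    total = sum(len(xs) for xs in samples_by_action.values())
    estimate = 0.0
    for xs in samples_by_action.values():
        for x in xs:
            # Mixture density of all proposals, weighted by sample counts.
            mixture = sum(
                len(ys) / total * gaussian_pdf(x, state + a, std)
                for a, ys in samples_by_action.items()
            )
            target = gaussian_pdf(x, state + target_action, std)
            estimate += reward(x) * target / mixture
    return estimate / total


# Toy setup (all values are illustrative assumptions).
rng = random.Random(0)
state, std, goal = 0.0, 1.0, 1.0


def reward(s_next):
    """Quadratic penalty on distance to the goal."""
    return -(s_next - goal) ** 2


# Next states that were sampled earlier under two older actions.
old_actions = [0.4, 0.5]
samples = {
    a: [rng.gauss(state + a, std) for _ in range(10000)] for a in old_actions
}

# The action after a hypothetical gradient step; no new samples are drawn.
new_action = 0.6
estimate = mis_estimate(samples, new_action, state, std, reward)

# Closed-form expectation for this toy model, used as a sanity check.
exact = -((state + new_action - goal) ** 2 + std ** 2)
print(estimate, exact)
```

Because every stored sample is reweighted by the ratio of the target density to the pooled proposal density, the estimate for the shifted action stays unbiased without drawing fresh samples, which is the general property that sample reuse relies on.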

Related benchmarks

Task | Dataset | Result | Rank
Control Task | Lunar Lander (test) | Average Reward 61.28 | 31
Continuous Control | Mountain Car POMDP | Mean Performance 26.96 | 30
Hill Car POMDP | Hill Car POMDP | Mean Return 87.58 | 30
Two-Agent 2D-Continuous Light-Dark Navigation | Two-Agent 2D-Continuous Light-Dark | Mean Performance 2.84 | 30
POMDP Navigation | 4D-Continuous Light-Dark | Mean Return 2.97 | 30
Planning | 3D-Continuous Light-Dark | Mean Return 4.17 | 30
Reinforcement Learning | Lunar Lander POMDP | Performance Score 52 | 30
POMDP Planning | 2D-Continuous Light-Dark (test) | Mean Return 5.07 | 30
Mountain Car | Mountain Car | Mean Return 29.97 | 20
Reinforcement Learning | Hill Car MDP | Performance 56.68 | 20