Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs

About

Online planning in continuous state, action, and observation spaces remains challenging for autonomous systems. While Monte Carlo Tree Search (MCTS) scales effectively via sampling, most continuous (PO)MDP solvers do not exploit gradient-based action optimization. We propose Action-Gradient MCTS (AGMCTS), a framework that combines global tree search with local gradient-based action refinement, while maintaining consistent value estimates. We provide three key theoretical contributions: (1) an action score gradient theorem for particle belief states; (2) the Multiple Importance Sampling (MIS) Tree that supports frequent action-branch updates by reusing prior samples without introducing estimator drift; and (3) tractable action score gradients for smooth generative models using the Area Formula. Empirical results demonstrate that AGMCTS outperforms state-of-the-art sample-based solvers in multiple challenging continuous MDP and POMDP benchmarks.
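The sample-reuse idea behind contribution (2) can be illustrated with a generic multiple importance sampling estimator. The following is a minimal sketch using the standard balance heuristic, not the paper's MIS Tree: the 1-D Gaussian transition model, the reward `-x**2`, and the names `gaussian_pdf` and `mis_estimate` are all assumptions made for illustration. It re-estimates the expected reward of a refined action from next-state samples that were drawn under two earlier actions, without drawing new samples.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mis_estimate(f, target_mean, std, batches):
    """Balance-heuristic MIS estimate of E[f(X)] for X ~ N(target_mean, std^2).

    batches: list of (proposal_mean, samples), where each sample list was
    drawn from N(proposal_mean, std^2). Hypothetical toy model, see lead-in.
    """
    total = sum(len(samples) for _, samples in batches)
    acc = 0.0
    for _, samples in batches:
        for x in samples:
            # Mixture of all proposal densities, weighted by sample counts.
            mixture = sum(
                len(s) * gaussian_pdf(x, m, std) for m, s in batches
            ) / total
            acc += f(x) * gaussian_pdf(x, target_mean, std) / mixture
    return acc / total


# Toy setup (assumed): next state x ~ N(action, std^2), reward(x) = -x^2.
rng = random.Random(0)
std = 0.5
old_actions = [0.0, 0.2]
batches = [(a, [rng.gauss(a, std) for _ in range(5000)]) for a in old_actions]

# The action has been refined to 0.1; reuse the old samples to score it.
estimate = mis_estimate(lambda x: -x * x, 0.1, std, batches)

# Analytic value for comparison: -(mean^2 + std^2) = -(0.01 + 0.25) = -0.26.
print(estimate)
```

Because each sample is weighted by the target density divided by the mixture of all proposal densities, the estimate stays unbiased for the new action as long as the old proposals cover its support.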

Idan Lev-Yehudi, Michael Novitsky, Moran Barenboim, Ron Benchetrit, Vadim Indelman • 2025

Related benchmarks

Task	Dataset	Result
Control Task	Lunar Lander (test)	Average Reward 61.28	31
Continuous Control	Mountain Car POMDP	Mean Performance 26.96	30
Hill Car POMDP	Hill Car POMDP	Mean Return 87.58	30
Two-Agent 2D-Continuous Light-Dark Navigation	Two-Agent 2D-Continuous Light-Dark	Mean Performance 2.84	30
POMDP Navigation	4D-Continuous Light-Dark	Mean Return 2.97	30
Planning	3D-Continuous Light-Dark	Mean Return 4.17	30
Reinforcement Learning	Lunar Lander POMDP	Performance Score 52	30
POMDP Planning	2D-Continuous Light-Dark (test)	Mean Return 5.07	30
Mountain Car	Mountain Car	Mean Return 29.97	20
Reinforcement Learning	Hill Car MDP	Performance 56.68	20

