
Extreme Value Monte Carlo Tree Search for Classical Planning

About

Despite being successful in board games and reinforcement learning (RL), Monte Carlo Tree Search (MCTS) combined with Multi-Armed Bandits (MABs) has seen limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai 2024) showed that UCB1, designed for bounded rewards, does not perform well when applied to cost-to-go estimates in classical planning, which are unbounded in $\mathbb{R}$, and showed improved performance using a Gaussian reward MAB instead. This paper further sharpens our understanding of ideal bandits for planning tasks. Existing work has two issues: first, Gaussian MABs under-specify the support of cost-to-go estimates as $(-\infty,\infty)$, which we can narrow down. Second, Full Bellman backup (Schulte and Keller 2014), which backpropagates sample max/min, lacks theoretical justification. We use \emph{Peaks-Over-Threshold Extreme Value Theory} to resolve both issues at once, and propose a new bandit algorithm (UCB1-Uniform). We formally prove its regret bound and empirically demonstrate its performance in classical planning.
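For context, below is a minimal sketch of the standard UCB1 selection rule that the abstract critiques. This is the classic formulation for bounded rewards (Auer et al. 2002), not the paper's proposed UCB1-Uniform, whose exact form is not given on this page; the function name and parameters are illustrative.

```python
import math

def ucb1_select(means, counts, c=math.sqrt(2)):
    """Standard UCB1 arm selection (illustrative, not the paper's UCB1-Uniform).

    means[i]  : empirical mean reward of arm i
    counts[i] : number of times arm i has been pulled
    c         : exploration constant
    Returns the index of the arm maximizing mean + exploration bonus.
    """
    # Pull any never-tried arm first, so the bonus term is well-defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [m + c * math.sqrt(math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The bonus term assumes rewards lie in a bounded interval such as [0, 1]; applied to unbounded cost-to-go estimates, as the abstract notes, this assumption breaks down, motivating Gaussian-reward and extreme-value variants.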

Masataro Asai, Stephen Wissow · 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Classical Planning | 24 Planning Domains | Instances Solved: 635.6 | 50 |
| Classical Planning | IPC satisficing 2018 (test) | Agricola Performance: 11.6 | 10 |
| Planning (IPC Score) | IPC 2018 | Agricola Score: 6.1 | 4 |
