Single-Agent Policy Tree Search With Guarantees

About

We introduce two novel tree search algorithms that use a policy to guide search. The first is a best-first enumeration whose cost function lets us prove an upper bound on the number of nodes expanded before a goal state is reached. We show that this best-first algorithm is particularly well suited to "needle-in-a-haystack" problems. The second algorithm is based on sampling, and we prove an upper bound on the expected number of nodes it expands before reaching a set of goal states. We show that this algorithm is better suited to problems where many paths lead to a goal. We validate both algorithms on 1,000 computer-generated levels of Sokoban, where the policy that guides the search comes from a neural network trained with asynchronous advantage actor-critic (A3C). Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search.
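The abstract does not spell out either procedure, so the following is only a minimal sketch in the spirit of the two algorithms, not the authors' code. It rests on two assumptions. First, the best-first search orders nodes by depth divided by path probability, d(n)/π(n), where π(n) is the product of policy probabilities along the path to n. This is our reading of the cost function, so treat it as an assumption. Second, the sampling search draws trajectories from the policy, with the i-th trajectory cut off at a depth taken from Luby's restart sequence. The toy domain and the names `policy`, `TARGET`, `best_first_search`, `sampling_search` and `luby` are all hypothetical.

```python
import heapq
import itertools
import random

# Hypothetical "needle-in-a-haystack" domain, for illustration only: a state is
# the tuple of actions taken so far, and the single goal state is TARGET.
ACTIONS = (0, 1, 2)
TARGET = (2, 0, 1, 2)


def is_goal(state):
    return state == TARGET


def policy(state):
    """Hypothetical policy pi(a | s); a trained network would go here."""
    return {0: 0.25, 1: 0.25, 2: 0.5}


def children(state):
    """Yield (child_state, pi(action | state)) pairs; the tree is unbounded."""
    probs = policy(state)
    for a in ACTIONS:
        yield state + (a,), probs[a]


def best_first_search(root=()):
    """Expand nodes in increasing order of cost(n) = d(n) / pi(n).

    pi(n) is the product of policy probabilities along the path to n.
    Returns (goal_state, nodes_expanded, goal_cost)."""
    tie = itertools.count()  # tie-breaker so heapq never compares states
    frontier = [(0.0, next(tie), 0, 1.0, root)]  # (cost, tie, depth, pi, state)
    expanded = 0
    while frontier:
        cost, _, depth, prob, state = heapq.heappop(frontier)
        expanded += 1  # the goal node itself is counted as expanded
        if is_goal(state):
            return state, expanded, cost
        for child, p in children(state):
            if p > 0.0:  # zero-probability children would have infinite cost
                child_prob = prob * p
                child_cost = (depth + 1) / child_prob
                heapq.heappush(
                    frontier, (child_cost, next(tie), depth + 1, child_prob, child)
                )
    return None, expanded, float("inf")


def luby(i):
    """i-th term (1-indexed) of Luby's restart sequence 1, 1, 2, 1, 1, 2, 4, ..."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)


def sampling_search(root=(), max_rounds=10_000, seed=0):
    """Sample trajectories from the policy; trajectory i is cut off at depth luby(i).

    Returns (goal_state, nodes_visited), or (None, nodes_visited) if all
    max_rounds trajectories miss the goal."""
    rng = random.Random(seed)
    visited = 0
    for i in range(1, max_rounds + 1):
        state = root
        visited += 1
        if is_goal(state):
            return state, visited
        for _ in range(luby(i)):
            kids = list(children(state))
            states = [c for c, _ in kids]
            weights = [p for _, p in kids]
            state = rng.choices(states, weights=weights)[0]  # sample a ~ pi(. | s)
            visited += 1
            if is_goal(state):
                return state, visited
    return None, visited
```

On this toy domain, `best_first_search()` returns `TARGET` with cost 4 / (0.5 · 0.25 · 0.25 · 0.5) = 256, along with the number of nodes it expanded. `sampling_search()` finds the same state after a seed-dependent number of visited nodes. Two details here are our own choices and may differ from the paper. The best-first search counts the goal node as expanded, and the sampler counts every node on every trajectory. The contrast mirrors the abstract. The best-first ordering spends its budget on the single most probable paths, which suits a lone "needle". Restarted sampling is cheap per trajectory and pays off when many paths lead to a goal.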

Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Search-based planning | BoulderDash hard problems (test) | Solved Rate | 100 | 7
Search-based planning | CraftWorld hard problems (test) | Success Rate | 100 | 7
Search-based planning | Sokoban Boxoban 1,000 problems (test) | Solved Count | 1.00e+3 | 7
Search-based planning | TSP GridWorld modified (test) | Solved Rate | 100 | 7
Combinatorial Search | CraftWorld (train) | Search Expansions | 4.23e+8 | 7
Combinatorial Search | TSP GridWorld (train) | Search Expansions | 1.23e+8 | 7
Combinatorial Search | BoulderDash (train) | Expansions | 2.60e+8 | 7
Combinatorial Search | Sokoban (train) | Search Expansions | 1.73e+8 | 7