Structure-Induced Information for Rerooting Levin Tree Search
About
Subgoal-based policy tree search, which uses a policy to guide search, is effective for complex single-agent deterministic problems, but it often relies on explicit subgoal generation, which can incur substantial overhead and hinder scalability. In this paper, we overcome these limitations by using a learned "rerooter" within the recently introduced $\sqrt{\text{LTS}}$ algorithm. A rerooter implicitly decomposes the problem into soft subtasks. While previous work focused on formal guarantees for given or handcrafted rerooters, in this work we propose three rerooter designs: (i) a clustering-based rerooter that exploits global state-space structure, (ii) a heuristic-based rerooter that leverages learned cost-to-go estimates, and (iii) a hybrid that combines both signals. Our framework avoids having to explicitly reconstruct and reason over generated subgoals, thereby enabling scalable allocation of search effort with significantly lower computational overhead. Empirically, our rerooting-based methods scale to complex environments where subgoal-based policy tree search fails, and they achieve state-of-the-art online training efficiency on the domains tested.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Combinatorial Search | BoulderDash (train) | Expansions | 2.42e+7 | 7 |
| Combinatorial Search | CraftWorld (train) | Search Expansions | 1.56e+8 | 7 |
| Combinatorial Search | Sokoban (train) | Search Expansions | 9.52e+7 | 7 |
| Combinatorial Search | TSP GridWorld (train) | Search Expansions | 1.35e+7 | 7 |
| Search-based planning | BoulderDash hard problems (test) | Solved Rate | 100 | 7 |
| Search-based planning | CraftWorld hard problems (test) | Success Rate | 100 | 7 |
| Search-based planning | Sokoban Boxoban 1,000 problems (test) | Solved Count | 1.00e+3 | 7 |
| Search-based planning | TSP GridWorld modified (test) | Solved Rate | 100 | 7 |