Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents
About
Humans learn abstractions and use them to plan efficiently and generalize quickly across tasks, an ability that remains challenging for state-of-the-art large language model (LLM) agents and deep reinforcement learning (RL) systems. Inspired by cognitive-science accounts of how people form abstractions and intuitive theories of the world, Theory-Based RL (TBRL) systems such as TheoryCoder generalize strongly through effective use of abstractions. However, they rely heavily on human-provided abstractions and sidestep the abstraction-learning problem. We introduce TheoryCoder-2, a new TBRL agent that leverages LLMs' in-context learning ability to actively learn reusable abstractions rather than relying on hand-specified ones: it synthesizes abstractions from experience and integrates them into a hierarchical planning process. We conduct experiments on diverse environments, including BabyAI, Minihack, and VGDL games such as Sokoban. We find that TheoryCoder-2 is significantly more sample-efficient than baseline LLM agents augmented with classical planning domain construction or reasoning-based planning, and than prior program-synthesis agents such as WorldCoder. TheoryCoder-2 solves complex tasks on which the baselines fail while requiring only minimal human prompts, unlike prior TBRL systems.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Hierarchical Planning | BabyAI Combined Skills 1 | Token Cost | 1.96e+3 | 6 |
| Hierarchical Planning | BabyAI Combined Skills 2 | Token Cost | 2.53e+3 | 6 |
| Hierarchical Planning | BabyAI Combined Skills 3 | Token Cost | 2.45e+3 | 6 |
| Hierarchical Planning | Minihack 15x15 | Token Cost | 0.00e+0 | 6 |
| Hierarchical Planning | Minihack-Traps | Token Cost | 0.00e+0 | 6 |
| Hierarchical Planning | Minihack Monster | Token Cost | 0.00e+0 | 6 |
| Hierarchical Planning | Labyrinth | Token Cost | 2.14e+4 | 6 |
| Hierarchical Planning | Maze | Token Cost | 1.97e+4 | 6 |
| Hierarchical Planning | Sokoban | Token Cost | 7.17e+3 | 6 |
| Hierarchical Planning | BabyAI Pickup | Token Cost | 6.66e+3 | 6 |