Open-World Reinforcement Learning over Long Short-Term Imagination
About
Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a $\textit{long short-term world model}$. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.
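The claim that jumpy transitions extend the imagination horizon "within a limited number of state transition steps" can be illustrated with a small sketch. The code below is not the LS-Imagine implementation: the function name `discounted_return`, the `intervals` argument, and the discounting convention (each reward is discounted by the environment time elapsed before its step begins) are assumptions made purely for illustration. It computes a discounted return over an imagined rollout in which each transition may cover one environment step (short-term) or many (jumpy).

```python
from typing import Sequence


def discounted_return(
    rewards: Sequence[float],
    intervals: Sequence[int],
    gamma: float = 0.99,
    bootstrap: float = 0.0,
) -> float:
    """Discounted return over a rollout with variable-length transitions.

    Hypothetical helper for illustration only.
    rewards[i]   : reward attached to imagined transition i
    intervals[i] : environment steps covered by transition i
                   (1 for a short-term step, >1 for a jumpy step)
    bootstrap    : value estimate at the final imagined state
    """
    if len(rewards) != len(intervals):
        raise ValueError("rewards and intervals must have the same length")

    total = 0.0
    elapsed = 0  # environment steps elapsed before the current transition
    for reward, dt in zip(rewards, intervals):
        if dt < 1:
            raise ValueError("each interval must cover at least one step")
        total += (gamma ** elapsed) * reward
        elapsed += dt
    # The bootstrap value is discounted by the full horizon covered.
    total += (gamma ** elapsed) * bootstrap
    return total


# Three imagined transitions, one of which is a 20-step jump:
intervals = [1, 20, 1]
horizon = sum(intervals)  # 22 environment steps from 3 transitions
value = discounted_return([0.0, 0.0, 1.0], intervals, gamma=0.99)
```

With every interval set to 1, this reduces to the usual n-step discounted return. Allowing larger intervals lets the same number of imagined transitions reach rewards much further in the future, which is the intuition behind feeding long-term values directly into behavior learning.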
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Long-horizon tasks | Minecraft Stone | Success Rate (SR) | 91.5 | 7 |
| Long-horizon tasks | Minecraft Wood | Success Rate (SR) | 95.87 | 7 |
| Long-horizon tasks | Minecraft Iron | Success Rate (SR) | 35.82 | 7 |
| Long-horizon tasks | Minecraft Gold | Success Rate (SR) | 6.61 | 7 |
| Long-horizon tasks | Minecraft Overall | Success Rate | 15.6 | 7 |
| Long-horizon tasks | Minecraft Diamond | Success Rate (SR) | 4.36 | 7 |
| Harvest log in plains | MineDojo | Success Rate (%) | 80.63 | 6 |
| Harvest sand | MineDojo | Success Rate (%) | 62.68 | 6 |
| Harvest water with bucket | MineDojo | Success Rate (%) | 77.31 | 6 |
| Mine iron ore | MineDojo | Success Rate | 20.28 | 6 |