Open-World Reinforcement Learning over Long Short-Term Imagination

About

Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a $\textit{long short-term world model}$. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.

Jiajian Li, Qi Wang, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, Xiaokang Yang• 2024

Related benchmarks

Task	Dataset	Result
Shear sheep	MineDojo	Success Rate (%)54.28	12
Long-horizon tasks	Minecraft Stone	Success Rate (SR)91.5	7
Long-horizon tasks	Minecraft Wood	Success Rate (SR)95.87	7
Long-horizon tasks	Minecraft Iron	Success Rate (SR)35.82	7
Long-horizon tasks	Minecraft Gold	Success Rate (SR)6.61	7
Long-horizon tasks	Minecraft Overall	Success Rate15.6	7
Long-horizon tasks	Minecraft Diamond	Success Rate (SR)4.36	7
Harvest log in plains	MineDojo	Success Rate (%)80.63	6
Harvest sand	MineDojo	Success Rate (%)62.68	6
Harvest water with bucket	MineDojo	Success Rate (%)77.31	6

Showing 10 of 11 rows

Other info

Follow for update

@wizwand_team Discord