
Open-World Reinforcement Learning over Long Short-Term Imagination

About

Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a long short-term world model. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.

Jiajian Li, Qi Wang, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, Xiaokang Yang • 2024

Related benchmarks

Task                       Dataset            Metric             Result  Rank
Long-horizon tasks         Minecraft Stone    Success Rate (SR)  91.5    7
Long-horizon tasks         Minecraft Wood     Success Rate (SR)  95.87   7
Long-horizon tasks         Minecraft Iron     Success Rate (SR)  35.82   7
Long-horizon tasks         Minecraft Gold     Success Rate (SR)  6.61    7
Long-horizon tasks         Minecraft Overall  Success Rate       15.6    7
Long-horizon tasks         Minecraft Diamond  Success Rate (SR)  4.36    7
Harvest log in plains      MineDojo           Success Rate (%)   80.63   6
Harvest sand               MineDojo           Success Rate (%)   62.68   6
Harvest water with bucket  MineDojo           Success Rate (%)   77.31   6
Mine iron ore              MineDojo           Success Rate       20.28   6
Showing 10 of 11 rows
