Discovering and Achieving Goals via World Models

About

How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to these that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans to discover unseen surprising states through foresight, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning across four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/
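The explore-then-achieve structure described above can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions, not the LEXA implementation. The environment is a hypothetical 10-state chain rather than images, the "world model" is a lookup table of observed transitions, the explorer uses visit counts as a crude stand-in for surprise, and the achiever plans with breadth-first search in the learned model instead of training a policy from imagined rollouts. All names (`env_step`, `explorer_action`, `steps_to_goal`, and so on) are invented for this illustration.

```python
from collections import deque

N_STATES = 10        # toy 1-D chain with states 0..9 (assumption, not from the paper)
ACTIONS = (-1, +1)   # step left or step right


def env_step(state, action):
    """True environment dynamics: move along the chain, clipped at both ends."""
    return max(0, min(N_STATES - 1, state + action))


def imagine(model, state, action):
    """Predict the next state with the learned model; unknown transitions stay put."""
    return model.get((state, action), state)


def explorer_action(model, visits, state):
    """Prefer untried transitions, then the imagined next state with the fewest visits."""
    def novelty(action):
        if (state, action) not in model:
            return float("inf")  # never tried: treat as maximally novel
        return -visits[imagine(model, state, action)]
    return max(ACTIONS, key=novelty)


def unsupervised_phase(n_steps=100):
    """Explore without any task reward, fitting the tabular model from experience."""
    model, visits = {}, [0] * N_STATES
    state = 0
    visits[state] += 1
    for _ in range(n_steps):
        action = explorer_action(model, visits, state)
        nxt = env_step(state, action)
        model[(state, action)] = nxt  # "train" the world model on the new transition
        visits[nxt] += 1
        state = nxt
    return model, visits


def steps_to_goal(model, goal):
    """Shortest imagined step count from each state to `goal` (backward BFS over the model)."""
    preds = {}
    for (s, _a), nxt in model.items():
        preds.setdefault(nxt, []).append(s)
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cur = queue.popleft()
        for p in preds.get(cur, []):
            if p not in dist:
                dist[p] = dist[cur] + 1
                queue.append(p)
    return dist


def reach_goal(model, start, goal, max_steps=50):
    """Zero-shot goal reaching: act greedily on imagined distance, no further learning."""
    dist = steps_to_goal(model, goal)
    state = start
    for _ in range(max_steps):
        if state == goal:
            return True
        action = min(ACTIONS, key=lambda a: dist.get(imagine(model, state, a), float("inf")))
        state = env_step(state, action)
    return state == goal
```

Calling `unsupervised_phase()` and then `reach_goal(model, 0, 7)` runs the two phases in order. The goal is only supplied after exploration has finished, which mirrors the zero-shot evaluation setting in the abstract.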

Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak (2021)

Related benchmarks

Task | Dataset | Metric | Result | Rank
Goal Reaching | RoboKitchen (test) | Success Rate | 37.5 | 16
Visual Pickup | SkewFit | Goal Reaching Error (m) | 0.014 | 10
Visual Pusher | SkewFit | Goal Reaching Error | 0.023 | 10
Goal Reaching | RoboBins (test) | Goal Success Rate | 69.44 | 6
Goal Reaching | RoboYoga Quadruped (test) | Goal Success Rate | 56.11 | 6
Goal Reaching | RoboYoga Walker (test) | Goal Success Rate | 73.06 | 6
Pickup | SkewFit | Goal Distance (cm) | 1.4 | 4
Pusher | SkewFit | Goal Distance (cm) | 2.3 | 4
Control | DeepMind Control v1 (test) | Walker Stand | 957 | 3
Goal-conditioned Reinforcement Learning | DeepMind Control Suite P2E tasks standard (test) | Walker Stand Score | 957 | 3
