Mastering Diverse Domains through World Models

About

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algorithms can be readily applied to tasks similar to what they have been developed for, configuring them for new application domains requires significant human expertise and experimentation. We present DreamerV3, a general algorithm that outperforms specialized methods across over 150 diverse tasks, with a single configuration. Dreamer learns a model of the environment and improves its behavior by imagining future scenarios. Robustness techniques based on normalization, balancing, and transformations enable stable learning across domains. Applied out of the box, Dreamer is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula. This achievement has been posed as a significant challenge in artificial intelligence that requires exploring farsighted strategies from pixels and sparse rewards in an open world. Our work allows solving challenging control problems without extensive experimentation, making reinforcement learning broadly applicable.
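The abstract does not say which transformations it means. One transformation described in the DreamerV3 paper is the symlog function, symlog(x) = sign(x) * ln(|x| + 1), with inverse symexp(x) = sign(x) * (exp(|x|) - 1). It compresses targets of very different scales while preserving their sign. The sketch below is only an illustration of that formula in plain Python, not the authors' implementation. The function names are chosen here for readability.

```python
import math


def symlog(x: float) -> float:
    """Symmetric log: sign(x) * ln(|x| + 1). Compresses large magnitudes."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(x: float) -> float:
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)


if __name__ == "__main__":
    # Illustrative reward values at very different scales (made up for this demo).
    for reward in (-1000.0, -1.0, 0.0, 0.5, 1000.0):
        squashed = symlog(reward)
        restored = symexp(squashed)
        print(f"{reward:>8} -> symlog {squashed:+.4f} -> symexp {restored:+.4f}")
```

Because symlog is roughly the identity near zero and logarithmic for large magnitudes, a model trained on symlog-transformed targets can plausibly cope with both small and large rewards without per-domain rescaling.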

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Locomotion | Dog & Humanoid suite | IQM | 0.01 | 32 |
| Dexterous Manipulation | MyoSuite | IQM | 0.466 | 28 |
| Humanoid Locomotion and Manipulation | HumanoidBench | IQM | 0.007 | 28 |
| 3D Dynamics Prediction | MuJoCo Fall-and-rebound scenario | Translation Error (m) | 0.056 | 20 |
| Motion forecasting | Push-slide-settle scenario (test) | Translation Error (m) | 0.079 | 20 |
| Reinforcement Learning | Atari 100k | Alien Score | 959 | 18 |
| Physical state prediction | DeepMind Control Suite Cheetah Easy tasks (random policy) | MSE | 0.1925 | 12 |
| State Prediction | TD-MPC2 policy dataset Cheetah | MSE | 4.5623 | 12 |
| Physical state prediction | DeepMind Control Suite Reacher Easy tasks (random policy) | MSE | 0.0972 | 12 |
| Physical state prediction | DeepMind Control Suite Humanoid Easy tasks (random policy) | MSE | 1.3947 | 12 |
Showing 10 of 57 rows
