Mastering Diverse Domains through World Models

About

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algorithms can be readily applied to tasks similar to what they have been developed for, configuring them for new application domains requires significant human expertise and experimentation. We present DreamerV3, a general algorithm that outperforms specialized methods across over 150 diverse tasks, with a single configuration. Dreamer learns a model of the environment and improves its behavior by imagining future scenarios. Robustness techniques based on normalization, balancing, and transformations enable stable learning across domains. Applied out of the box, Dreamer is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula. This achievement has been posed as a significant challenge in artificial intelligence that requires exploring farsighted strategies from pixels and sparse rewards in an open world. Our work allows solving challenging control problems without extensive experimentation, making reinforcement learning broadly applicable.

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap • 2023
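The abstract mentions "transformations" as one of the robustness techniques without spelling them out. As a rough illustration, the sketch below assumes the symlog/symexp pair described in the DreamerV3 paper, which compresses large magnitudes while keeping the sign, so that targets on very different scales (for example, returns in the thousands versus fractions of one) become comparable. The function names and the scalar, pure-Python form are choices made here for illustration and are not taken from the authors' code.

```python
import math


def symlog(x: float) -> float:
    """Symmetric log: sign(x) * ln(|x| + 1). Assumed form, for illustration."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(x: float) -> float:
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)


if __name__ == "__main__":
    # Values spanning several orders of magnitude, positive and negative.
    for value in (-5500.0, -1.0, 0.0, 0.298, 1950.0):
        squashed = symlog(value)
        restored = symexp(squashed)
        print(f"{value:>10.3f} -> symlog {squashed:>8.4f} -> symexp {restored:>10.3f}")
```

Under this assumption, a model would be trained to predict the squashed value and its output would be passed through `symexp` to recover the original scale. Because the function is close to the identity near zero, small values pass through almost unchanged.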

Related benchmarks

Task	Dataset	Result
Continuous Control	MuJoCo Ant v4	Average Return: 1.95e+3	46
Reinforcement Learning	Atari 100k	Alien Score: 1.08e+3	41
Continuous Control	MuJoCo Walker2d v4	--	39
Continuous Control	MuJoCo HalfCheetah v4	Average Return: 5.50e+3	36
Locomotion	Dog & Humanoid suite	IQM: 0.01	32
General Competence	G2U Overall	Average Rank (Overall): 10.7	30
Curiosity	G2U Curiosity	Mean: 1.161	30
Survival	G2U Survival	Mean: 0.097	30
Utility	G2U Utility	Mean Utility: 0.298	30
Dexterous Manipulation	MyoSuite	IQM: 0.466	28

