TD-MPC2: Scalable, Robust World Models for Continuous Control

About

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M-parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com.

Nicklas Hansen, Hao Su, Xiaolong Wang · 2023
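The abstract describes TD-MPC as local trajectory optimization in the latent space of a learned world model. The sketch below illustrates that idea with an MPPI-style (Model Predictive Path Integral) sampling planner. This is one common optimizer for this kind of planning, and it is not presented as the authors' exact procedure. The planner works in four steps:

1. Sample action sequences from a Gaussian.
2. Roll each sequence out through latent dynamics.
3. Score each sequence by discounted predicted reward plus a bootstrapped terminal value.
4. Refit the Gaussian to the exponentially weighted best sequences.

`encode`, `dynamics`, `reward`, and `terminal_value` are hypothetical hand-written stand-ins for learned networks on a toy 1-D problem. All hyperparameters are illustrative assumptions, not TD-MPC2's settings.

```python
import math
import random

# --- Hypothetical stand-ins for learned networks (toy 1-D problem, NOT TD-MPC2's models) ---
def encode(obs):
    """Map an observation to a latent state; identity here for illustration."""
    return [float(x) for x in obs]

def dynamics(z, a):
    """Toy latent dynamics: the action shifts a 1-D position."""
    return [z[0] + a]

def reward(z, a):
    """Toy reward: stay near the origin, with a small action penalty."""
    return -z[0] ** 2 - 0.1 * a ** 2

def terminal_value(z):
    """Toy bootstrapped value estimate for the latent state reached at the horizon."""
    return -10.0 * z[0] ** 2

def rollout_return(z, actions, gamma):
    """Discounted predicted reward over the horizon plus a discounted terminal value."""
    total, discount = 0.0, 1.0
    for a in actions:
        total += discount * reward(z, a)
        z = dynamics(z, a)
        discount *= gamma
    return total + discount * terminal_value(z)

def plan(obs, horizon=5, num_samples=256, num_elites=32, iterations=6,
         temperature=0.5, gamma=0.99, seed=0):
    """MPPI-style planner in latent space; returns the first action of the refined plan."""
    rng = random.Random(seed)
    z0 = encode(obs)
    mean, std = [0.0] * horizon, [1.0] * horizon
    for _ in range(iterations):
        # Sample action sequences from the current Gaussian, clipped to [-1, 1].
        samples = []
        for _ in range(num_samples):
            actions = [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            samples.append((rollout_return(z0, actions, gamma), actions))
        # Keep the highest-scoring sequences and weight them by exponentiated return.
        samples.sort(key=lambda item: item[0], reverse=True)
        elites = samples[:num_elites]
        best = elites[0][0]
        weights = [math.exp((ret - best) / temperature) for ret, _ in elites]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Refit the sampling distribution to the weighted elites.
        mean = [sum(w * acts[t] for w, (_, acts) in zip(weights, elites))
                for t in range(horizon)]
        std = [max(0.05, math.sqrt(sum(w * (acts[t] - mean[t]) ** 2
                                       for w, (_, acts) in zip(weights, elites))))
               for t in range(horizon)]
    return mean[0]
```

Calling `plan([2.0])` should return a negative action that pushes the toy state toward the origin. In a receding-horizon loop, only this first action is executed, and planning is repeated from the next observation. The terminal value term lets a short horizon account for returns beyond it, which is the role temporal-difference learning plays in the TD-MPC family.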

Related benchmarks

Task	Dataset	Metric	Result	
Continuous Control	MuJoCo Ant v4	Average Return	4.75e+3	46
Continuous Control	MuJoCo Walker2d v4	--	--	39
Continuous Control	MuJoCo HalfCheetah v4	Average Return	1.51e+4	36
Locomotion	Dog & Humanoid suite	IQM	0.527	32
Humanoid Locomotion and Manipulation	HumanoidBench	IQM	0.734	28
Dexterous Manipulation	MyoSuite	IQM	0.775	28
Continuous Control	Gym MuJoCo Humanoid v4	Average Return	6.07e+3	15
Continuous Control	Gym MuJoCo Suite Aggregate	IQM	1.05	15
Continuous Control	Gym MuJoCo Hopper v4	Average Return	2.08e+3	15
Robotic Manipulation	Meta-World v2	Success Rate	60	14

Showing 10 of 101 rows
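Several rows report IQM (interquartile mean). IQM aggregates scores robustly by averaging only the middle 50%, after discarding the lowest and highest 25%. The sketch below assumes a flat list of already-normalized scores. It follows the trimming convention of `scipy.stats.trim_mean(scores, proportiontocut=0.25)`, which drops `int(0.25 * n)` scores from each end. How scores were normalized and pooled across tasks and seeds for these benchmarks is not stated here.

```python
def interquartile_mean(scores):
    """Mean of the middle 50% of scores (a 25% trimmed mean)."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    n = len(ordered)
    k = int(0.25 * n)  # number of scores dropped from each end
    middle = ordered[k:n - k]
    return sum(middle) / len(middle)
```

For example, with `[0.1, 0.9, 0.5, 0.7, 0.3, 0.2, 0.8, 100.0]` the outlier `100.0` is discarded. So are `0.1`, `0.2`, and `0.9`, leaving the mean of `0.3, 0.5, 0.7, 0.8`. This robustness to outliers is why IQM is preferred over a plain mean when aggregating many runs.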

...
