
TD-MPC2: Scalable, Robust World Models for Continuous Control

About

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com
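
To make "local trajectory optimization in the latent space" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The functions encode, dynamics, reward, and value are hypothetical hand-written stand-ins for learned networks, and the planner is a simplified cross-entropy-style sampler chosen for brevity, which may differ from the planner used in TD-MPC2. The point it illustrates is that candidate action sequences are scored entirely in latent space, with no observation ever decoded.

```python
import math
import random

# Hypothetical stand-ins for learned components (assumptions for
# illustration only; a real agent would use trained neural networks).
def encode(obs):
    """Map an observation to a latent state (here: identity)."""
    return list(obs)

def dynamics(z, a):
    """Predict the next latent state from latent z and action a."""
    return [zi + 0.1 * ai for zi, ai in zip(z, a)]

def reward(z, a):
    """Predict reward: prefer latents near the origin and small actions."""
    return -sum(zi * zi for zi in z) - 0.01 * sum(ai * ai for ai in a)

def value(z):
    """Terminal value estimate used to bootstrap beyond the horizon."""
    return -sum(zi * zi for zi in z)

def plan(obs, horizon=5, samples=256, iters=4, elites=32, gamma=0.99, seed=0):
    """Sampling-based trajectory optimization carried out in latent space."""
    rng = random.Random(seed)
    z0 = encode(obs)
    dim = len(z0)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[1.0] * dim for _ in range(horizon)]
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            # Sample an action sequence, clipped to the range [-1, 1].
            seq = [[max(-1.0, min(1.0, rng.gauss(mean[t][d], std[t][d])))
                    for d in range(dim)] for t in range(horizon)]
            # Roll out in latent space only; nothing is decoded.
            z, ret, disc = z0, 0.0, 1.0
            for a in seq:
                ret += disc * reward(z, a)
                z = dynamics(z, a)
                disc *= gamma
            ret += disc * value(z)  # bootstrap with the terminal value
            scored.append((ret, seq))
        # Refit the sampling distribution to the top-scoring sequences.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = [seq for _, seq in scored[:elites]]
        for t in range(horizon):
            for d in range(dim):
                vals = [seq[t][d] for seq in top]
                m = sum(vals) / len(vals)
                var = sum((v - m) ** 2 for v in vals) / len(vals)
                mean[t][d] = m
                std[t][d] = max(0.05, math.sqrt(var))
    return mean[0]  # execute only the first action, then replan

action = plan([1.0, -1.0])
```

With these toy functions the planner pushes the latent state toward the origin, so the first component of the returned action is negative and the second is positive. In a receding-horizon loop, plan would be called again at every environment step.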

Nicklas Hansen, Hao Su, Xiaolong Wang • 2023

Related benchmarks

Task                                  Dataset                                     Result            Rank
Locomotion                            Dog & Humanoid suite                        IQM 0.527         32
Humanoid Locomotion and Manipulation  HumanoidBench                               IQM 0.734         28
Dexterous Manipulation                MyoSuite                                    IQM 0.775         28
Robotic Manipulation                  Meta-World v2                               Success Rate 60   14
Continuous Control                    DeepMind Control (DMC) Suite (1M steps)     IQM 69.6          14
Locomotion                            Humanoid-Bench Stand (test)                 Return 749.8      11
Continuous Control                    HumanoidBench No Hand                       Total Reward 580  8
Continuous Control                    DeepMind Control (DMC) Suite 200k steps     IQM 37.4          8
Continuous Control                    DeepMind Control (DMC) Suite 500k steps     IQM 56.6          8
Continuous Control                    DeepMind Control (DMC) Suite (100k steps)   IQM 0.152         8

Showing 10 of 75 rows
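
Several results in the table are reported as IQM. The page does not define the term; assuming it means the interquartile mean commonly used to aggregate RL benchmark scores, it is the mean of the middle 50% of scores after the lowest and highest 25% are discarded. The helper below is a minimal sketch under that assumption (the function name is illustrative, not taken from any benchmark toolkit).

```python
import math

def interquartile_mean(scores):
    """Mean of the middle 50% of scores (lowest and highest 25% dropped)."""
    xs = sorted(scores)
    n = len(xs)
    if n == 0:
        raise ValueError("scores must be non-empty")
    cut = math.floor(0.25 * n)  # number of values trimmed from each end
    kept = xs[cut:n - cut]
    return sum(kept) / len(kept)

# One outlier run barely moves the IQM, unlike the plain mean.
runs = [1, 2, 3, 4, 5, 6, 7, 100]
iqm = interquartile_mean(runs)    # 4.5
plain_mean = sum(runs) / len(runs)  # 16.0
```

The trimming is why IQM is often preferred over the plain mean when a few runs or tasks produce extreme scores.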
