
Temporal Difference Learning for Model Predictive Control

About

Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and video results are available at https://nicklashansen.github.io/td-mpc.
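The core idea above — plan over a short horizon with a learned latent dynamics model and cap the rollout with a learned terminal value function — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy `dynamics`, `reward`, and `value` functions stand in for TD-MPC's learned networks, and simple random-shooting replaces the sampling-based planner the method actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the learned components (assumptions, not the real models).
def dynamics(z, a):   # latent dynamics model: z' = d(z, a)
    return 0.9 * z + 0.1 * a

def reward(z, a):     # reward model: r = R(z, a)
    return -float(z @ z)

def value(z):         # terminal value function estimating long-term return
    return -10.0 * float(z @ z)

def plan(z0, horizon=5, n_samples=256, gamma=0.99):
    """Score each sampled action sequence by the sum of predicted rewards
    over a short horizon plus the discounted terminal value at the end,
    then return the first action of the best sequence (MPC-style)."""
    best_score, best_action = -np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, z0.shape[0]))
        z, score = z0.copy(), 0.0
        for t in range(horizon):
            score += gamma**t * reward(z, actions[t])
            z = dynamics(z, actions[t])
        score += gamma**horizon * value(z)  # return beyond the horizon
        if score > best_score:
            best_score, best_action = score, actions[0]
    return best_action

a0 = plan(np.array([1.0, -0.5]))  # first action to execute, then replan
```

In the actual method both the dynamics model and the value function are trained jointly with temporal-difference learning, so the value term makes short-horizon planning account for long-term return without rolling the model out far.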

Nicklas Hansen, Xiaolong Wang, Hao Su · 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | DMControl 100k | DMControl: Finger Spin Score | 943 | 29 |
| Continuous Control | DMControl Novel view | Episode Reward | 527.5 | 8 |
| Continuous Control | DMControl Shaking view | Episode Reward | 441.8 | 4 |
| Continuous Control | DMControl Moving view | Episode Reward | 606 | 4 |
| Continuous Control | DMControl All settings | Episode Reward | 492.7 | 4 |
| Robotic Manipulation | xArm Moving view | Success Rate | 20 | 4 |
| Robotic Manipulation | Adroit Moving view | Success Rate | 15 | 4 |
| Robotic Manipulation | Adroit Shaking view | Success Rate | 30 | 4 |
| Robotic Manipulation | Adroit Novel FOV | Success Rate | 31 | 4 |
| Robotic Manipulation | Adroit All settings | Success Rate | 0.21 | 4 |

Showing 10 of 15 rows.
