Temporal Difference Learning for Model Predictive Control

About

Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and video results are available at https://nicklashansen.github.io/td-mpc.
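The combination described above, a short-horizon rollout through a learned model plus a learned terminal value for the long-term return, can be illustrated with a small sketch. This is not the authors' implementation. The toy `dynamics`, `reward`, and `value` functions, the horizon, the discount factor, and the simple random-shooting planner are all assumptions made for illustration, since the abstract does not specify the planner.

```python
import numpy as np

def dynamics(z, a):
    # Hypothetical stand-in for a learned latent dynamics model.
    return 0.9 * z + 0.1 * a

def reward(z, a):
    # Hypothetical stand-in for a learned reward model:
    # stay near the origin and prefer small actions.
    return -float(np.sum(z ** 2)) - 0.01 * float(np.sum(a ** 2))

def value(z):
    # Hypothetical stand-in for a learned terminal value function.
    return -10.0 * float(np.sum(z ** 2))

def estimate_return(z0, actions, gamma=0.99):
    """Discounted model rewards over the horizon, plus a bootstrapped terminal value."""
    z, total, discount = z0, 0.0, 1.0
    for a in actions:
        total += discount * reward(z, a)
        z = dynamics(z, a)
        discount *= gamma
    # After H steps, discount == gamma**H; the value function covers the rest.
    return total + discount * value(z)

def plan(z0, horizon=5, num_samples=256, gamma=0.99, seed=0):
    """Random-shooting planner (an assumed, simplified choice of optimizer)."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(num_samples, horizon, z0.shape[0]))
    returns = np.array([estimate_return(z0, seq, gamma) for seq in candidates])
    best = candidates[int(np.argmax(returns))]
    # MPC executes only the first action, then replans at the next step.
    return best[0], float(returns.max())

z0 = np.array([1.0, -1.0])
first_action, best_return = plan(z0)
```

Because the terminal value stands in for everything beyond the horizon, the planner only needs to roll the model out for a few steps, which is where the savings in planning cost would come from in this sketch.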

Nicklas Hansen, Xiaolong Wang, Hao Su • 2022

Related benchmarks

Task	Dataset	Metric	Value	
Continuous Control	DMControl 100k	DMControl: Finger Spin Score	943	38
Complex Control	complex-control target suite (test)	AntMaze AUC	60	18
Continuous Control	DMControl Novel view	Episode Reward	527.5	8
Trajectory tracking	Excavator Simulation Closed-loop	Angular Error (oSMT) 1st	0.94	6
Trajectory tracking	Excavator Simulation Open-loop	Total Steps/Count	2.41e+5	5
Continuous Control	DMControl Shaking view	Episode Reward	441.8	4
Continuous Control	DMControl Moving view	Episode Reward	606	4
Continuous Control	DMControl All settings	Episode Reward	492.7	4
Robotic Manipulation	xArm Moving view	Success Rate	20	4
Robotic Manipulation	Adroit Moving view	Success Rate	15	4

Showing 10 of 18 rows
