Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
About
Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf
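The core idea above — plan with a learned dynamics model using model predictive control — can be sketched with a simple random-shooting planner. The sketch below is illustrative, not the paper's implementation: `dynamics_model` stands in for a trained neural network that predicts the next state from `(state, action)`, and `reward` is a toy cost; both names and the environment dimensions are assumptions for the example.

```python
import numpy as np

def dynamics_model(state, action):
    # Placeholder for a learned neural-network dynamics model f(s, a) -> s'.
    # In the paper this network is trained on (state, action, next-state)
    # transitions to predict state deltas; here a toy linear model suffices.
    return state + 0.1 * action

def reward(state, action):
    # Toy reward (assumed for this sketch): progress along the first
    # state dimension, minus a small control-effort penalty.
    return state[0] - 0.01 * np.sum(action ** 2)

def mpc_random_shooting(state, horizon=10, n_candidates=100,
                        action_dim=2, seed=None):
    """Random-shooting MPC: sample candidate action sequences, roll each
    one out through the learned dynamics model, and return the FIRST
    action of the highest-return sequence (replanning happens every step)."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0,
                             size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        s = state.copy()
        for a in seq:
            s = dynamics_model(s, a)       # simulate one step in the model
            returns[i] += reward(s, a)     # accumulate predicted reward
    best = np.argmax(returns)
    return candidates[best, 0]             # execute only the first action

s0 = np.zeros(2)
a0 = mpc_random_shooting(s0, seed=0)
print(a0)
```

At each control step only the first action of the best sequence is executed and the optimization is rerun from the new state, which is what makes this MPC rather than open-loop planning. The hybrid Mb-Mf algorithm then distills such an MPC controller into a policy that initializes a model-free learner.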
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | BipedalWalker v3 | Episodic Cumulative Reward | 219.6 | 15 |
| Continuous Control | HalfCheetah v4 | Max Average Return | 4.13e+3 | 12 |
| Continuous Control | Pendulum v1 | Average Cumulative Reward | -188.3 | 7 |
| Continuous Control | Humanoid v4 | Average Cumulative Reward | 2.78e+3 | 7 |
| Robotic Control | Pendulum v1 | Local Optima Escape Rate | 42.5 | 7 |
| Robotic Control | BipedalWalker v3 | Local Optima Escape Rate | 38.3 | 7 |
| Robotic Control | HalfCheetah v4 | Local Optima Escape Rate | 31.4 | 7 |
| Robotic Control | Humanoid v4 | Local Optima Escape Rate | 24.9 | 7 |
| Power System Control | IEEE 39-bus New England test system, critical disturbances simulation | Constraint Violations (%) | 6.2 | 6 |
| UAV Obstacle Avoidance | UAV Obstacle Avoidance environment, 100 trials (test) | Success Rate | 68.2 | 6 |