Generative Adversarial Imitation Learning
About
Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.
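The adversarial idea described above — a discriminator learns to tell expert behavior from policy behavior, while the policy is rewarded for fooling it — can be sketched in a deliberately tiny setting. Everything below is an illustrative assumption, not the paper's implementation: a one-state environment with two actions, an "expert" that always takes action 1, a per-action logistic discriminator, and a plain REINFORCE policy update in place of the trust-region policy optimizer the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy setup (assumed for illustration): a single-state environment with two
# discrete actions, where the "expert" demonstrations always pick action 1.
n_actions = 2
expert_actions = np.ones(64, dtype=int)

policy_logits = np.zeros(n_actions)   # softmax policy parameters
disc_w = np.zeros(n_actions)          # per-action discriminator logits
lr_d, lr_pi = 0.1, 0.1

for step in range(300):
    # Sample a batch of actions from the current policy.
    probs = softmax(policy_logits)
    policy_actions = rng.choice(n_actions, size=64, p=probs)

    # Discriminator update: gradient ascent on
    # log D(expert) + log(1 - D(policy)), pushing D toward 1 on expert
    # data and toward 0 on policy data.
    grad_d = np.zeros(n_actions)
    for a in expert_actions:
        grad_d[a] += 1.0 - sigmoid(disc_w[a])   # d/dw log D(a)
    for a in policy_actions:
        grad_d[a] -= sigmoid(disc_w[a])         # d/dw log(1 - D(a))
    disc_w += lr_d * grad_d / 64

    # Policy update (REINFORCE) with surrogate reward log D(a):
    # actions the discriminator scores as expert-like get reinforced.
    rewards = np.log(sigmoid(disc_w[policy_actions]) + 1e-8)
    baseline = rewards.mean()
    grad_pi = np.zeros(n_actions)
    for a, r in zip(policy_actions, rewards):
        one_hot = np.eye(n_actions)[a]
        grad_pi += (r - baseline) * (one_hot - probs)  # d/dtheta log pi(a)
    policy_logits += lr_pi * grad_pi / 64

# After training, the policy should concentrate on the expert's action 1.
print(softmax(policy_logits))
```

The cost signal never comes from the environment: the policy only ever sees the discriminator's score, which is the sense in which the method imitates the expert without a reinforcement signal.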
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | MuJoCo Ant | Average Reward | 4.00e+3 | 12 |
| Continuous Control | MuJoCo HalfCheetah | Average Reward | 4.28e+3 | 12 |
| Imputation and Heartbeat Detection | mHealth ECG | MSE | 0.0571 | 10 |
| Imputation and Heartbeat Detection | mHealth PPG | MSE | 0.1102 | 10 |
| Inverse Optimal Control | Dense Tabular MDP (Nominal) | Return | 0.253 | 8 |
| Inverse Optimal Control | Dense Tabular MDP (Windy) | Return | 0.105 | 8 |
| Inverse Optimal Control | Sparse Tabular MDP (Nominal) | Return | 1.237 | 8 |
| Inverse Optimal Control | Sparse Tabular MDP (Windy) | Return | 0.043 | 8 |
| Imitation Learning | Ant | Mean Score | 1.36e+3 | 6 |
| Imitation Learning | Swimmer | Mean Score | 140.2 | 6 |