
Generative Adversarial Imitation Learning

About

Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

Jonathan Ho, Stefano Ermon · 2016
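The adversarial recipe described in the abstract alternates between two updates: a discriminator is trained to tell expert state-action pairs from the policy's, and the policy is rewarded for producing pairs the discriminator mistakes for expert ones. The sketch below illustrates that loop in PyTorch. It is not the authors' implementation: it substitutes a plain REINFORCE step for the TRPO step used in the paper, uses synthetic placeholder batches in place of real demonstrations and rollouts, and picks one of several common label/reward conventions.

```python
# Minimal GAIL-style training loop (illustrative sketch, not the paper's code).
# D(s, a) learns to separate expert pairs (label 0) from policy pairs (label 1);
# the policy is rewarded with -log D_policy(s, a), i.e. for looking expert-like.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2

discriminator = nn.Sequential(              # logit: high => "policy", low => "expert"
    nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
policy_mean = nn.Sequential(                # Gaussian policy: state -> action mean
    nn.Linear(obs_dim, 64), nn.Tanh(),
    nn.Linear(64, act_dim),
)
log_std = nn.Parameter(torch.zeros(act_dim))

d_opt = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
pi_opt = torch.optim.Adam(list(policy_mean.parameters()) + [log_std], lr=3e-4)
bce = nn.BCEWithLogitsLoss()

# Placeholder batches; in practice these come from recorded expert demonstrations
# and from rolling out the current policy in the environment.
expert_obs, expert_act = torch.randn(256, obs_dim), torch.randn(256, act_dim)
policy_obs = torch.randn(256, obs_dim)

for step in range(1000):
    # Sample actions from the current Gaussian policy.
    mean = policy_mean(policy_obs)
    dist = torch.distributions.Normal(mean, log_std.exp())
    policy_act = dist.sample()

    # 1) Discriminator update: expert pairs labeled 0, policy pairs labeled 1.
    d_expert = discriminator(torch.cat([expert_obs, expert_act], dim=-1))
    d_policy = discriminator(torch.cat([policy_obs, policy_act], dim=-1))
    d_loss = bce(d_expert, torch.zeros_like(d_expert)) + \
             bce(d_policy, torch.ones_like(d_policy))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Policy update: surrogate reward is high where D believes the pair is expert-like.
    with torch.no_grad():
        logits = discriminator(torch.cat([policy_obs, policy_act], dim=-1))
        reward = -torch.log(torch.sigmoid(logits) + 1e-8).squeeze(-1)
    log_prob = dist.log_prob(policy_act).sum(-1)
    pi_loss = -(log_prob * (reward - reward.mean())).mean()   # REINFORCE with a mean baseline
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
```

In a full implementation the policy batch would be refreshed each iteration by rolling the current policy out in the environment, and the discriminator-derived reward would feed a trust-region policy optimizer such as TRPO, as in the paper.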

Related benchmarks

Task | Dataset | Metric | Result | Rank
Continuous Control | MuJoCo Ant | Average Reward | 4.00e+3 | 12
Continuous Control | MuJoCo HalfCheetah | Average Reward | 4.28e+3 | 12
Imputation and Heartbeat Detection | mHealth ECG | MSE | 0.0571 | 10
Imputation and Heartbeat Detection | mHealth PPG | MSE | 0.1102 | 10
Inverse Optimal Control | Dense Tabular MDP (Nominal) | Return | 0.253 | 8
Inverse Optimal Control | Dense Tabular MDP (Windy) | Return | 0.105 | 8
Inverse Optimal Control | Sparse Tabular MDP (Nominal) | Return | 1.237 | 8
Inverse Optimal Control | Sparse Tabular MDP (Windy) | Return | 0.043 | 8
Imitation Learning | Ant | Mean Score | 1.36e+3 | 6
Imitation Learning | Swimmer | Mean Score | 140.2 | 6

Showing 10 of 22 rows.
