Generative Adversarial Imitation Learning
About
Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.
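The adversarial idea described above — a discriminator learns to tell expert behavior from policy behavior, while the policy is rewarded for fooling it — can be sketched in a deliberately tiny setting. Everything below is an illustrative assumption, not the paper's implementation: a one-state environment with two actions, an "expert" that always takes action 1, a per-action logistic discriminator, and a plain REINFORCE policy update in place of the trust-region policy optimizer the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy setup (assumed for illustration): a single-state environment with two
# discrete actions, where the "expert" demonstrations always pick action 1.
n_actions = 2
expert_actions = np.ones(64, dtype=int)

policy_logits = np.zeros(n_actions)   # softmax policy parameters
disc_w = np.zeros(n_actions)          # per-action discriminator logits
lr_d, lr_pi = 0.1, 0.1

for step in range(300):
    # Sample a batch of actions from the current policy.
    probs = softmax(policy_logits)
    policy_actions = rng.choice(n_actions, size=64, p=probs)

    # Discriminator update: gradient ascent on
    # log D(expert) + log(1 - D(policy)), pushing D toward 1 on expert
    # data and toward 0 on policy data.
    grad_d = np.zeros(n_actions)
    for a in expert_actions:
        grad_d[a] += 1.0 - sigmoid(disc_w[a])   # d/dw log D(a)
    for a in policy_actions:
        grad_d[a] -= sigmoid(disc_w[a])         # d/dw log(1 - D(a))
    disc_w += lr_d * grad_d / 64

    # Policy update (REINFORCE) with surrogate reward log D(a):
    # actions the discriminator scores as expert-like get reinforced.
    rewards = np.log(sigmoid(disc_w[policy_actions]) + 1e-8)
    baseline = rewards.mean()
    grad_pi = np.zeros(n_actions)
    for a, r in zip(policy_actions, rewards):
        one_hot = np.eye(n_actions)[a]
        grad_pi += (r - baseline) * (one_hot - probs)  # d/dtheta log pi(a)
    policy_logits += lr_pi * grad_pi / 64

# After training, the policy should concentrate on the expert's action 1.
print(softmax(policy_logits))
```

The cost signal never comes from the environment: the policy only ever sees the discriminator's score, which is the sense in which the method imitates the expert without a reinforcement signal.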
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | MuJoCo Ant | Average Reward | 4.00e+3 | 12 |
| Continuous Control | MuJoCo HalfCheetah | Average Reward | 4.28e+3 | 12 |
| Imputation and Heartbeat Detection | mHealth ECG | MSE | 0.0571 | 10 |
| Imputation and Heartbeat Detection | mHealth PPG | MSE | 0.1102 | 10 |
| Inverse Optimal Control | Dense Tabular MDP (Nominal) | Return | 0.253 | 8 |
| Inverse Optimal Control | Dense Tabular MDP (Windy) | Return | 0.105 | 8 |
| Inverse Optimal Control | Sparse Tabular MDP (Nominal) | Return | 1.237 | 8 |
| Inverse Optimal Control | Sparse Tabular MDP (Windy) | Return | 0.043 | 8 |
| Imitation Learning | Ant | Mean Score | 1.36e+3 | 6 |
| Imitation Learning | Swimmer | Mean Score | 140.2 | 6 |