
Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

About

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for some environments, they can also lead to sub-optimal behavior in others. Secondly, even though these algorithms can learn from few expert demonstrations, they require a prohibitively large number of interactions with the environment in order to imitate the expert for many real-world applications. In order to address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by an average factor of 10. Furthermore, since our reward function is designed to be unbiased, we can apply our algorithm to many problems without making any task-specific adjustments.
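The reward-bias issue the abstract mentions can be made concrete. A minimal sketch, assuming the discriminator outputs a probability D(s, a) of a transition being expert-like: the common adversarial-imitation reward -log(1 - D) is strictly positive, so it implicitly rewards surviving longer regardless of the task, while a log D - log(1 - D) form can take either sign and avoids that particular bias. The function names below are illustrative, not from the paper's code.

```python
import numpy as np

def positive_biased_reward(d):
    """-log(1 - D): strictly positive for any D in (0, 1),
    so longer episodes always accumulate more reward (survival bias)."""
    return -np.log(1.0 - d)

def unbiased_form_reward(d):
    """log D - log(1 - D): negative when D < 0.5, zero at D = 0.5,
    positive when D > 0.5, so it does not reward survival per se."""
    return np.log(d) - np.log(1.0 - d)

# Discriminator outputs for three transitions: clearly non-expert,
# indistinguishable, and clearly expert-like.
d = np.array([0.1, 0.5, 0.9])
print(positive_biased_reward(d))  # all values > 0
print(unbiased_form_reward(d))    # negative, zero, positive
```

The second contribution, reduced sample complexity, comes from training the policy with an off-policy actor-critic on replayed transitions (relabeled with the current discriminator reward) rather than with fresh on-policy rollouts.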

Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson • 2018

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Imitation Learning | DeepMind Control Suite (image-based) | Cartpole Score: 0.08 | 6 |
| Single-life task completion | Cheetah | Average Steps: 9.92e+4 | 5 |
| Single-life task completion | Pointmass | Average Steps: 1.01e+5 | 5 |
| Single-life task completion | Kitchen | Average Steps: 1.11e+5 | 5 |
| Single-life task completion | Tabletop | Average Steps: 8.32e+4 | 5 |
| Imitation Learning | MuJoCo (test) | Hopper Score: 3.31e+3 | 5 |
| Imitation Learning | DeepMind Control Suite (state-based) | Cartpole Score: 13 | 5 |
