Wasserstein Adversarial Imitation Learning

About

Imitation learning describes the problem of recovering an expert policy from demonstrations. While inverse reinforcement learning approaches are known to be very sample-efficient in terms of expert demonstrations, they usually require problem-dependent reward functions or a task-specific reward-function regularization. In this paper, we show a natural connection between inverse reinforcement learning approaches and optimal transport that enables more general reward functions with desirable properties (e.g., smoothness). Based on this observation, we propose a novel approach called Wasserstein Adversarial Imitation Learning. Our approach uses the Kantorovich potentials as a reward function and further leverages regularized optimal transport to enable large-scale applications. In several robotic experiments, our approach outperforms the baselines in terms of average cumulative reward and shows a significant improvement in sample efficiency, requiring just one expert demonstration.
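To make the idea of "Kantorovich potentials as a reward function" concrete, the following is a minimal sketch and not the authors' implementation. It assumes two finite sets of state-action samples (expert and policy), a squared Euclidean ground cost, and entropic regularization solved with log-domain Sinkhorn iterations; the abstract only says "regularized optimal transport", so the specific regularizer, the function names, the parameter `eps`, and the sign convention for the reward are all illustrative assumptions.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_potentials(x_expert, x_policy, eps=0.5, n_iters=200):
    """Entropic-OT dual potentials between two uniform empirical measures.

    Hypothetical helper for illustration only. Returns (f, g, plan), where
    f lives on the expert samples, g on the policy samples, and plan is the
    regularized transport plan implied by the potentials.
    """
    n, m = len(x_expert), len(x_policy)
    # Assumed ground cost: squared Euclidean distance between samples.
    diff = x_expert[:, None, :] - x_policy[None, :, :]
    cost = np.sum(diff ** 2, axis=-1)
    log_a = np.full(n, -np.log(n))  # uniform weights on expert samples
    log_b = np.full(m, -np.log(m))  # uniform weights on policy samples
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        # Alternating soft-min updates of the two dual potentials.
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    return f, g, np.exp(log_plan)


def potential_reward(g):
    """Assumed sign convention: a policy sample that is cheap to transport
    onto the expert samples (small g) receives a high reward."""
    return -g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expert = rng.normal(0.0, 1.0, size=(20, 2))  # stand-in for one demonstration
    policy = rng.normal(1.0, 1.0, size=(30, 2))  # stand-in for policy rollouts
    f, g, plan = sinkhorn_potentials(expert, policy)
    rewards = potential_reward(g)
    print("mean reward over policy samples:", rewards.mean())
```

In a full imitation learning loop, the per-sample values returned by `potential_reward` would be passed to a standard reinforcement learning algorithm as the reward signal. A large-scale variant would presumably replace the per-sample vectors `f` and `g` with parametric functions trained on mini-batches, which this sketch does not attempt.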

Huang Xiao, Michael Herman, Joerg Wagner, Sebastian Ziesche, Jalal Etesami, Thai Hong Linh • 2019

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Locomotion	Hopper (test)	Average Return	2.61e+3	8
Locomotion	Walker2d (test)	Average Return	1.73e+3	8
Manipulation	Fetch-pick (test)	Average Success Rate	0.00e+0	8
Navigation	Ant-goal (test)	Average Success Rate	61.27	8
Manipulation	Hand-rotate (test)	Average Success Rate	23.7	8
Navigation	Maze2D (test)	Average Success Rate	29.78	8
