
Offline Imitation Learning with Variational Counterfactual Reasoning

About

In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotic manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Because expert data is scarce, agents tend to simply memorize poor trajectories and are vulnerable to variations in the environment, so they fail to generalize to new environments. To automatically generate high-quality expert data and improve the agent's generalization ability, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA), which performs counterfactual inference. In particular, we leverage an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement in generalization. Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization. Our code is available at https://github.com/ZexuSun/OILCA-NeurIPS23.
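The abstract names counterfactual inference without spelling out the procedure. As a rough illustration only, the generic abduction, action, and prediction steps can be sketched on a toy transition model. The linear dynamics, the coefficients `A` and `B`, and every function name below are assumptions made for this sketch. They are not taken from the OILCA code, which uses an identifiable variational autoencoder rather than known dynamics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    state: float
    action: float
    next_state: float


# Hypothetical toy dynamics (an assumption, not the paper's model):
# next_state = A * state + B * action + noise
A = 0.9
B = 0.5


def dynamics(state: float, action: float, noise: float) -> float:
    """Structural equation of the toy environment."""
    return A * state + B * action + noise


def abduct_noise(t: Transition) -> float:
    """Abduction: recover the exogenous noise that explains the observed transition."""
    return t.next_state - (A * t.state + B * t.action)


def counterfactual(t: Transition, new_action: float) -> Transition:
    """Answer: what would the next state have been under a different action?"""
    noise = abduct_noise(t)                             # 1. abduction
    # 2. action: swap the logged action for new_action
    next_state = dynamics(t.state, new_action, noise)   # 3. prediction
    return Transition(t.state, new_action, next_state)


if __name__ == "__main__":
    observed = Transition(1.0, 0.0, dynamics(1.0, 0.0, 0.2))
    print(counterfactual(observed, new_action=1.0))
```

The counterfactual transition keeps the noise inferred from the logged sample and changes only the action, which is what distinguishes it from simply resampling the environment.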

Bowei He, Zexu Sun, Jinxin Liu, Shuai Zhang, Xu Chen, Chen Ma • 2023

Related benchmarks

Task | Dataset | Result (Average Return) | Rank
--- | --- | --- | ---
Stacking2 | CAUSALWORLD (in-distribution (Space A to Space A)) | 3.21e+3 | 14
Cartpole Swingup | DeepMind Control Suite (in-distribution) | 608.4 | 7
Cheetah Run | DeepMind Control Suite (in-distribution) | 116 | 7
Creative Stacked Blocks | CAUSALWORLD (in-distribution (Space A to Space A)) | 1.48e+3 | 7
Creative Stacked Blocks | CAUSALWORLD space B | 1.35e+3 | 7
Finger Turn hard | DeepMind Control Suite (in-distribution) | 298.7 | 7
Fish Swim | DeepMind Control Suite (in-distribution) | 290.3 | 7
General | CAUSALWORLD space B | 891.1 | 7
Humanoid Run | DeepMind Control Suite (in-distribution) | 461 | 7
Manipulator Insert Ball | DeepMind Control Suite (in-distribution) | 296.8 | 7
Showing 10 of 26 rows
