CEIL: Generalized Contextual Imitation Learning

About

In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight information matching, we derive CEIL by explicitly learning a hindsight embedding function together with a contextual policy using the hindsight embeddings. To achieve the expert matching objective for IL, we advocate for optimizing a contextual variable such that it biases the contextual policy towards mimicking expert behaviors. Beyond the typical learning from demonstrations (LfD) setting, CEIL is a generalist that can be effectively applied to multiple settings including: 1) learning from observations (LfO), 2) offline IL, 3) cross-domain IL (mismatched experts), and 4) one-shot IL settings. Empirically, we evaluate CEIL on the popular MuJoCo tasks (online) and the D4RL dataset (offline). Compared to prior state-of-the-art baselines, we show that CEIL is more sample-efficient in most online IL tasks and achieves better or competitive performance in offline tasks.
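The context-optimization idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the linear policy, the weight matrices `W` and `V`, the squared action error used as the matching loss, and all function names are assumptions made for illustration. The sketch takes a frozen contextual policy as given (standing in for one trained jointly with a hindsight embedding function) and fits a single context vector by gradient descent so that the policy reproduces expert actions.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, CONTEXT_DIM, ACTION_DIM = 4, 2, 3

# Hypothetical frozen contextual policy pi(s, z) = W s + V z.
# These weights are stand-ins for a policy that would normally be
# learned together with a hindsight embedding function.
W = rng.normal(size=(ACTION_DIM, STATE_DIM))
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, -0.5]])


def contextual_policy(states, z):
    """Return actions for a batch of states under context vector z."""
    return states @ W.T + z @ V.T


# Toy expert demonstrations produced by a context unknown to the learner.
z_expert = np.array([1.5, -0.5])
expert_states = rng.normal(size=(64, STATE_DIM))
expert_actions = contextual_policy(expert_states, z_expert)


def optimize_context(states, actions, steps=200, lr=0.1):
    """Fit z by gradient descent on the mean squared action error.

    The squared error is an assumed stand-in for an expert-matching
    objective; only z is updated, the policy weights stay fixed.
    """
    z = np.zeros(CONTEXT_DIM)
    for _ in range(steps):
        residual = contextual_policy(states, z) - actions
        # Gradient of mean_n ||pi(s_n, z) - a_n||^2 with respect to z.
        grad = 2.0 * residual.mean(axis=0) @ V
        z -= lr * grad
    return z


z_star = optimize_context(expert_states, expert_actions)
print("recovered context:", z_star)
```

In this toy setting the expert is exactly representable by the policy, so the optimized context recovers the expert's behavior; with mismatched experts or observation-only data the loss would need to be replaced by something that does not rely on expert actions.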

Jinxin Liu, Li He, Yachen Kang, Zifeng Zhuang, Donglin Wang, Huazhe Xu • 2023

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	D4RL hopper-expert v2	Normalized Score: 113	66
Offline Reinforcement Learning	D4RL walker2d-expert v2	Normalized Score: 115.6	66
Offline Reinforcement Learning	D4RL halfcheetah-expert v2	Normalized Score: 97.1	66
Offline Imitation Learning	D4RL Ant v2 (expert)	Normalized Score: 126.4	20
Imitation Learning	Hopper one-shot v2	Normalized Score: 85.6	11
Imitation Learning	HalfCheetah one-shot v2	Normalized Score: 5.6	11
Imitation Learning	Walker2d one-shot v2	Normalized Score: 70	11
Imitation Learning	Ant one-shot v2	Normalized Score: 29.7	11
Cross-domain Offline Imitation Learning from Demonstrations (C-off-LfD)	D4RL MuJoCo reward-free v2 (medium, medium-replay, medium-expert)	Hopper-v2 Return (medium): 58.4	7
Single-domain Offline Imitation Learning from Demonstrations (S-off-LfD)	D4RL MuJoCo reward-free v2 (medium, medium-replay, medium-expert)	Hopper-v2 (m) Score: 110.4	7

Showing 10 of 12 rows
