
CEIL: Generalized Contextual Imitation Learning

About

In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight information matching, we derive CEIL by explicitly learning a hindsight embedding function together with a contextual policy that conditions on the hindsight embeddings. To achieve the expert-matching objective of IL, we advocate optimizing a contextual variable so that it biases the contextual policy toward mimicking expert behaviors. Beyond the typical learning-from-demonstrations (LfD) setting, CEIL is a generalist that can be effectively applied to multiple settings, including 1) learning from observations (LfO), 2) offline IL, 3) cross-domain IL (mismatched experts), and 4) one-shot IL. Empirically, we evaluate CEIL on the popular MuJoCo tasks (online) and the D4RL datasets (offline). Compared to prior state-of-the-art baselines, CEIL is more sample-efficient in most online IL tasks and achieves better or competitive performance in offline tasks.
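The two ingredients named above (a hindsight embedding with a contextual policy trained on relabeled trajectories, then a context variable tuned toward the expert) can be illustrated with a deliberately tiny toy. This is a hypothetical sketch, not the authors' implementation: the embedding `embed` is hand-crafted (mean action) instead of learned, the contextual policy is a one-parameter linear model that ignores the state, the 1-D dynamics `s' = s + a` are invented, and the expert-matching step uses grid search over candidate contexts rather than gradient-based optimization.

```python
import random
from statistics import mean

# Hypothetical toy setting: 1-D states, dynamics s' = s + a.
def rollout(policy, z, s0=0.0, horizon=10):
    """Roll out policy(s, z) for `horizon` steps; return [(state, action), ...]."""
    traj, s = [], s0
    for _ in range(horizon):
        a = policy(s, z)
        traj.append((s, a))
        s = s + a
    return traj

def embed(traj):
    """Hand-crafted stand-in for a learned hindsight embedding f(tau):
    here simply the trajectory's mean action."""
    return mean(a for _, a in traj)

def fit_contextual_policy(trajs):
    """Hindsight relabeling: tag every step of a trajectory with z = f(tau),
    then fit the one-parameter policy a = w * z by least squares."""
    num = den = 0.0
    for traj in trajs:
        z = embed(traj)
        for _, a in traj:
            num += a * z
            den += z * z
    w = num / den
    return lambda s, z: w * z  # state is ignored in this toy policy

def match_expert(policy, expert_traj, candidates):
    """Expert matching: pick the context z whose rollout embedding is
    closest to the expert's embedding (grid search, not gradients)."""
    target = embed(expert_traj)
    return min(candidates, key=lambda z: abs(embed(rollout(policy, z)) - target))

# Non-expert data: noisy trajectories around random constant actions.
rng = random.Random(0)
data = [rollout(lambda s, z: z + rng.gauss(0.0, 0.1), rng.uniform(-1.0, 1.0))
        for _ in range(50)]
policy = fit_contextual_policy(data)

# A single expert trajectory that always takes action 0.7.
expert = rollout(lambda s, z: 0.7, None)
z_star = match_expert(policy, expert, [i / 100 for i in range(-100, 101)])
print(z_star)
```

The point of the split is that the policy is trained only on relabeled non-expert data, and the expert enters solely through the choice of context `z_star`.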

Jinxin Liu, Li He, Yachen Kang, Zifeng Zhuang, Donglin Wang, Huazhe Xu • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Offline Reinforcement Learning | D4RL hopper-expert v2 | Normalized Score | 113 | 56
Offline Reinforcement Learning | D4RL walker2d-expert v2 | Normalized Score | 115.6 | 56
Offline Reinforcement Learning | D4RL halfcheetah-expert v2 | Normalized Score | 97.1 | 56
Offline Imitation Learning | D4RL Ant v2 (expert) | Normalized Score | 126.4 | 20
Imitation Learning | Hopper one-shot v2 | Normalized Score | 85.6 | 11
Imitation Learning | HalfCheetah one-shot v2 | Normalized Score | 5.6 | 11
Imitation Learning | Walker2d one-shot v2 | Normalized Score | 70 | 11
Imitation Learning | Ant one-shot v2 | Normalized Score | 29.7 | 11
Cross-domain Offline Imitation Learning from Demonstrations (C-off-LfD) | D4RL MuJoCo reward-free v2 (medium, medium-replay, medium-expert) | Hopper-v2 Return (medium) | 58.4 | 7
Single-domain Offline Imitation Learning from Demonstrations (S-off-LfD) | D4RL MuJoCo reward-free v2 (medium, medium-replay, medium-expert) | Hopper-v2 (m) Score | 110.4 | 7

Showing 10 of 12 rows.
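Assuming the "Normalized Score" entries follow the usual D4RL convention, a raw episode return is rescaled so that a random policy maps to 0 and the per-environment expert reference return maps to 100. A value above 100, such as 113 or 126.4 in the table, then means the agent's return exceeded the expert reference. The sketch below shows only the arithmetic; the reference returns are placeholders, not the official D4RL constants for any task.

```python
def normalized_score(raw_return, random_return, expert_return):
    """D4RL-style normalized score: 0 = random policy, 100 = expert reference."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# Placeholder reference returns (NOT the official D4RL values for any environment).
RANDOM_RETURN = 0.0
EXPERT_RETURN = 1000.0

score = normalized_score(1130.0, RANDOM_RETURN, EXPERT_RETURN)
print(score)  # > 100 means the return beat the expert reference
```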

Other info

Code
