BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

About

There has recently been a surge in research on batch Deep Reinforcement Learning (DRL), which aims to learn a high-performing policy from a given dataset without additional interactions with the environment. We propose a new algorithm, Best-Action Imitation Learning (BAIL), which strives for both simplicity and performance. BAIL learns a V function, uses the V function to select actions it believes to be high-performing, and then uses those actions to train a policy network using imitation learning. For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to four other batch Q-learning and imitation-learning schemes for a large variety of batch datasets. Our experiments show that BAIL's performance is much higher than that of the other schemes, and that BAIL is also computationally much faster than the batch Q-learning schemes.

Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, Keith Ross• 2019
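The abstract describes three steps: learn a V function from the batch, use it to pick actions believed to be high-performing, and train a policy on those actions by imitation learning. The pure-Python sketch below walks through that pipeline on toy one-dimensional data. It is not the authors' implementation, and everything concrete in it is an assumption: the function names (`fit_value`, `select_best`, `bail_sketch`) and parameters (`k`, `keep_frac`) are hypothetical; V and the policy are straight lines rather than neural networks; V is fit with an asymmetric squared loss (underestimating a return costs `k` times more than overestimating it) so that it leans toward the best returns seen from each state; and pairs are ranked by the gap G - V(s) between the Monte Carlo return and the value estimate.

```python
# Hypothetical sketch of a BAIL-style pipeline. The names, the linear models,
# the asymmetric loss and the G - V(s) ranking are assumptions, not the paper's code.
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept) of y = slope * x + intercept


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def weighted_line_fit(xs: Sequence[float], ys: Sequence[float],
                      ws: Sequence[float]) -> Line:
    """Closed-form weighted least squares for y ~ slope * x + intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def fit_value(states: Sequence[float], returns: Sequence[float],
              k: float = 100.0, max_iters: int = 50) -> Line:
    """Linear V(s) under an asymmetric squared loss (assumed): residuals where
    V(s) < G get weight k, others weight 1, pushing V toward the top returns.
    Solved by iteratively reweighted least squares."""
    ws = [1.0] * len(states)
    slope, intercept = weighted_line_fit(states, returns, ws)
    for _ in range(max_iters):
        new_ws = [k if g > slope * s + intercept else 1.0
                  for s, g in zip(states, returns)]
        if new_ws == ws:  # weights agree with the current fit: converged
            break
        ws = new_ws
        slope, intercept = weighted_line_fit(states, returns, ws)
    return slope, intercept


def select_best(states, actions, returns, value: Line, keep_frac: float = 0.25):
    """Keep the (state, action) pairs whose return most exceeds V(state)."""
    slope, intercept = value
    scored = sorted(zip(states, actions, returns),
                    key=lambda sag: sag[2] - (slope * sag[0] + intercept),
                    reverse=True)
    n_keep = max(1, int(len(scored) * keep_frac))
    return [(s, a) for s, a, _ in scored[:n_keep]]


def bail_sketch(episodes, gamma: float = 0.99, keep_frac: float = 0.25) -> Line:
    """episodes: list of episodes, each a list of (state, action, reward)."""
    states, actions, returns = [], [], []
    for ep in episodes:
        states += [s for s, _, _ in ep]
        actions += [a for _, a, _ in ep]
        returns += discounted_returns([r for _, _, r in ep], gamma)
    value = fit_value(states, returns)
    best = select_best(states, actions, returns, value, keep_frac)
    # Imitation step: behavior cloning as plain least squares on the kept pairs.
    xs, ys = [s for s, _ in best], [a for _, a in best]
    return weighted_line_fit(xs, ys, [1.0] * len(best))


if __name__ == "__main__":
    # Four one-step episodes: odd states earned reward 2, even states 0.
    data = [[(0.0, 10.0, 0.0)], [(1.0, 11.0, 2.0)],
            [(2.0, 12.0, 0.0)], [(3.0, 13.0, 2.0)]]
    print(bail_sketch(data, keep_frac=0.5))  # prints (1.0, 10.0)
```

Ranking by G - V(s) favors actions that did better than the value estimate expected from their state, rather than actions that merely started from good states. In the paper's setting the policy is a network, so the final line fit only stands in for that imitation-learning step.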

Related benchmarks

Task | Dataset | Metric | Result | Rank
Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 92.6 | 117
Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 106.4 | 115
Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 75.7 | 86
Offline Reinforcement Learning | D4RL walker2d-random | Normalized Score | 2.4 | 77
Offline Reinforcement Learning | D4RL halfcheetah-random | Normalized Score | 2.2 | 70
Offline Reinforcement Learning | D4RL hopper-random | Normalized Score | 8 | 62
Offline Reinforcement Learning | D4RL Gym walker2d (medium-replay) | Normalized Return | 51.4 | 52
Offline Reinforcement Learning | D4RL Gym walker2d medium | Normalized Return | 68.8 | 42
Offline Reinforcement Learning | D4RL antmaze-umaze (diverse) | Normalized Score | 52 | 40
Offline Reinforcement Learning | D4RL antmaze-large (play) | Normalized Score | 2.2 | 26

Showing 10 of 26 rows.
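D4RL normalized scores rescale an average episode return so that 0 corresponds to a random policy and 100 to an expert policy on the same task. The short sketch below shows that rescaling. The per-task reference returns are not listed on this page, so `random_return` and `expert_return` are placeholders the caller must supply, and the function name is hypothetical.

```python
# Hypothetical helper illustrating the D4RL normalization convention.
def d4rl_normalized_score(avg_return: float, random_return: float,
                          expert_return: float) -> float:
    """Map a raw average return onto a 0 (random) .. 100 (expert) scale."""
    if expert_return == random_return:
        raise ValueError("reference returns must differ")
    return 100.0 * (avg_return - random_return) / (expert_return - random_return)


# Placeholder reference returns, not real D4RL values.
print(d4rl_normalized_score(550.0, random_return=100.0, expert_return=1000.0))
```

Under this convention, a score above 100, such as the 106.4 on hopper-medium-expert, means the policy's average return exceeded the expert reference return.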
