Benchmarking Batch Deep Reinforcement Learning Algorithms

About

Widely-used deep reinforcement learning algorithms have been shown to fail in the batch setting--learning from a fixed data set without interaction with the environment. Following this result, there have been several papers showing reasonable performances under a variety of environments and batch settings. In this paper, we benchmark the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy. We find that under these conditions, many of these algorithms underperform DQN trained online with the same amount of data, as well as the partially-trained behavioral policy. To introduce a strong baseline, we adapt the Batch-Constrained Q-learning algorithm to a discrete-action setting, and show it outperforms all existing algorithms at this task.
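The abstract does not spell out how the discrete-action adaptation of Batch-Constrained Q-learning selects actions. As a rough illustration only, the sketch below assumes a threshold-based rule: an action is eligible when its estimated probability under the behavioral policy, relative to the most likely action, exceeds a threshold, and the agent then acts greedily on Q-values among eligible actions. The function name `batch_constrained_action`, the default threshold of 0.3, and the example numbers are assumptions made for illustration and are not taken from this page.

```python
from typing import Sequence


def batch_constrained_action(
    q_values: Sequence[float],
    behavior_probs: Sequence[float],
    threshold: float = 0.3,  # assumed default, for illustration only
) -> int:
    """Return the greedy action among those the behavioral policy plausibly takes.

    q_values[a]       -- estimated Q(s, a) for each discrete action a
    behavior_probs[a] -- estimated probability that the behavioral policy
                         picks action a in state s (e.g. from a classifier
                         fit to the batch)
    """
    if not q_values or len(q_values) != len(behavior_probs):
        raise ValueError("q_values and behavior_probs must be non-empty and equal length")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0, 1)")

    max_prob = max(behavior_probs)
    if max_prob <= 0.0:
        raise ValueError("behavior_probs must contain a positive entry")

    # Keep only actions whose relative probability clears the threshold.
    # The most likely action has ratio 1.0, so this list is never empty.
    eligible = [a for a, p in enumerate(behavior_probs) if p / max_prob > threshold]

    # Act greedily with respect to Q, restricted to the eligible actions.
    return max(eligible, key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [1.0, 5.0, 2.0]
    probs = [0.6, 0.02, 0.38]
    # Action 1 has the highest Q-value but is rarely seen in the batch,
    # so the constrained rule falls back to the best well-supported action.
    print(batch_constrained_action(q, probs))
    # With threshold 0.0 every action with nonzero probability is eligible,
    # which recovers ordinary greedy Q-learning action selection.
    print(batch_constrained_action(q, probs, threshold=0.0))
```

The design intent in this sketch is that Q-value estimates for actions rarely present in the fixed data set are unreliable, so they are excluded before the greedy step. Lowering the threshold toward 0 moves the rule toward standard Q-learning, while raising it toward 1 moves it toward imitating the behavioral policy.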

Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau • 2019

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Sudoku Solving | Sudoku 2x2 | Final Reward: 1.3 | 14 |
| PointNav | MetaUrban 12K (Unseen) | Success Rate (SR): 60 | 9 |
| PointNav | MetaUrban 12K (test) | Success Rate (SR): 60 | 9 |
| SocialNav | MetaUrban 12K (test) | Success Rate (SR): 17 | 9 |
| SocialNav | MetaUrban 12K (Unseen) | Success Rate (SR): 8 | 9 |
| Constrained Reinforcement Learning | GRID | Episodic Reward: 276.3 | 8 |
| human-robot task planning and allocation | HRTPA H1 R2 (test) | Makespan: 1.45e+3 | 8 |
| human-robot task planning and allocation | HRTPA H1,R3 (test) | Makespan: 1.47e+3 | 8 |
| human-robot task planning and allocation | HRTPA H2,R3 (test) | Makespan: 1.17e+3 | 8 |
| human-robot task planning and allocation | HRTPA H3,R2 (test) | Makespan: 1.12e+3 | 8 |
Showing 10 of 91 rows
...
