
D4RL: Datasets for Deep Data-Driven Reinforcement Learning

About

The offline reinforcement learning (RL) setting (also known as full batch RL), where a policy is learned from a static dataset, is compelling as progress enables RL methods to take advantage of large, previously-collected datasets, much like how the rise of large datasets has fueled results in supervised learning. However, existing online RL benchmarks are not tailored towards the offline setting and existing offline RL benchmarks are restricted to data generated by partially-trained agents, making progress in offline RL difficult to measure. In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL. With a focus on dataset collection, examples of such properties include: datasets generated via hand-designed controllers and human demonstrators, multitask datasets where an agent performs different tasks in the same environment, and datasets collected with mixtures of policies. By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms. To facilitate research, we have released our benchmark tasks and datasets with a comprehensive evaluation of existing algorithms, an evaluation protocol, and open-source examples. This serves as a common starting point for the community to identify shortcomings in existing offline RL methods and a collaborative route for progress in this emerging area.

Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine • 2020

Related benchmarks

Task | Dataset | Result (Normalized Score) | Rank
Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | 64.7 | 117
Offline Reinforcement Learning | D4RL hopper-medium-expert | 111.9 | 115
Offline Reinforcement Learning | D4RL walker2d-medium-expert | 111 | 86
Offline Reinforcement Learning | D4RL walker2d-random | 7.3 | 77
Offline Reinforcement Learning | D4RL halfcheetah-random | 35.4 | 70
Offline Reinforcement Learning | D4RL hopper-random | 12.2 | 62
hopper locomotion | D4RL hopper medium-replay | 33.7 | 56
walker2d locomotion | D4RL walker2d medium-replay | 19.2 | 53
Offline Reinforcement Learning | walker2d medium-replay | 26.7 | 50
Offline Reinforcement Learning | hopper medium-replay | 48.6 | 44
Showing 10 of 53 rows
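
For readers unfamiliar with the Normalized Score column: D4RL reports returns rescaled so that a random policy scores roughly 0 and an expert policy roughly 100. The snippet below is a minimal sketch of that rescaling. The function name and the reference returns are hypothetical, chosen only for illustration. They are not the benchmark's actual per-environment reference values.

```python
def normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw episode return to the 0-100 convention.

    0 corresponds to the random-policy reference return and
    100 to the expert-policy reference return.
    """
    span = expert_return - random_return
    if span == 0:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / span


# Hypothetical reference returns (assumptions, not real D4RL values).
RANDOM_RETURN = 0.0
EXPERT_RETURN = 200.0

# A policy that reaches half the expert's return lands halfway up the scale.
print(normalized_score(100.0, RANDOM_RETURN, EXPERT_RETURN))  # 50.0
```

Because the scale is anchored to reference policies rather than capped, a policy that outperforms the expert reference gets a score above 100, which is consistent with entries such as 111.9 in the table.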
