
The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning

About

Self-supervised methods have become crucial for advancing deep learning by leveraging data itself to reduce the need for expensive annotations. However, the question of how to conduct self-supervised offline reinforcement learning (RL) in a principled way remains unclear. In this paper, we address this issue by investigating the theoretical benefits of utilizing reward-free data in linear Markov Decision Processes (MDPs) within a semi-supervised setting. Further, we propose a novel, Provable Data Sharing algorithm (PDS) to utilize such reward-free data for offline RL. PDS uses additional penalties on the reward function learned from labeled data to prevent overestimation, ensuring a conservative algorithm. Our results on various offline RL tasks demonstrate that PDS significantly improves the performance of offline RL algorithms with reward-free data. Overall, our work provides a promising approach to leveraging the benefits of unlabeled data in offline RL while maintaining theoretical guarantees. We believe our findings will contribute to developing more robust self-supervised RL methods.
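The penalty idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear reward model fit by ridge regression on the labeled data and an elliptical uncertainty term as the penalty, a common choice in linear MDP analyses. The names `fit_reward_model`, `pessimistic_rewards`, `ridge`, and `penalty_scale` are hypothetical, and the data is synthetic.

```python
import numpy as np


def fit_reward_model(phi_labeled, rewards, ridge=1.0):
    """Ridge regression of rewards on features; returns (theta_hat, cov)."""
    d = phi_labeled.shape[1]
    # Regularized feature covariance: sum_i phi_i phi_i^T + ridge * I
    cov = phi_labeled.T @ phi_labeled + ridge * np.eye(d)
    theta_hat = np.linalg.solve(cov, phi_labeled.T @ rewards)
    return theta_hat, cov


def pessimistic_rewards(phi_unlabeled, theta_hat, cov, penalty_scale=1.0):
    """Label reward-free transitions with a penalized (conservative) reward."""
    mean = phi_unlabeled @ theta_hat
    # Per-row uncertainty sqrt(phi^T cov^{-1} phi); assumed penalty form.
    solved = np.linalg.solve(cov, phi_unlabeled.T)  # shape (d, n)
    uncertainty = np.sqrt(np.sum(phi_unlabeled * solved.T, axis=1))
    return mean - penalty_scale * uncertainty


# Synthetic example: 50 labeled and 200 reward-free feature vectors.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -0.5, 0.25])
phi_l = rng.normal(size=(50, 3))
r_l = phi_l @ true_theta + 0.1 * rng.normal(size=50)
phi_u = rng.normal(size=(200, 3))

theta_hat, cov = fit_reward_model(phi_l, r_l)
r_mean = phi_u @ theta_hat
r_pess = pessimistic_rewards(phi_u, theta_hat, cov, penalty_scale=1.0)
```

In this sketch the penalized labels never exceed the plain regression estimate, and the gap grows for feature vectors that the labeled data covers poorly. An offline RL algorithm would then train on the combined dataset using `r_pess` as the reward for the unlabeled transitions.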

Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Goal Reaching | AntMaze large-play v2 | Success Rate: 50.6 | 10 |
| Goal Reaching | AntMaze medium-play v2 | Success Rate: 66.8 | 10 |
| Goal Reaching | AntMaze umaze v2 | Success Rate: 93 | 6 |
| Goal Reaching | AntMaze large-diverse v2 | Success Rate: 30 | 6 |
| Goal Reaching | AntMaze Medium-Diverse v2 | Success Rate: 22.8 | 6 |
| Goal Reaching | AntMaze umaze-diverse v2 | Success Rate: 50.6 | 6 |
| Goal Reaching | AntMaze Random Cells v2 (large-diverse) | Success Rate: 58.3 | 4 |
| Offline Context-conditioned Goal-oriented (CGO) Reinforcement Learning | Random Cells (large-play) | Success Rate: 48.1 | 4 |
| Goal Reaching | AntMaze Random Cells medium-diverse v2 | Success Rate: 60.9 | 4 |
| Goal-reaching Navigation | Four Rooms medium-play v1 (test) | Average Success Rate: 0.46 | 4 |
Showing 10 of 16 rows
