The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning
About
Self-supervised methods have become crucial for advancing deep learning by leveraging data itself to reduce the need for expensive annotations. However, the question of how to conduct self-supervised offline reinforcement learning (RL) in a principled way remains open. In this paper, we address this issue by investigating the theoretical benefits of utilizing reward-free data in linear Markov Decision Processes (MDPs) within a semi-supervised setting. Further, we propose Provable Data Sharing (PDS), a novel algorithm that utilizes such reward-free data for offline RL. PDS applies additional penalties to the reward function learned from labeled data to prevent overestimation, yielding a conservative algorithm. Our results on various offline RL tasks demonstrate that PDS significantly improves the performance of offline RL algorithms with reward-free data. Overall, our work provides a promising approach to leveraging the benefits of unlabeled data in offline RL while maintaining theoretical guarantees. We believe our findings will contribute to developing more robust self-supervised RL methods.
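To make the reward-penalty idea concrete, below is a minimal NumPy sketch of pessimistic reward labeling in a linear MDP. It assumes a ridge-regression reward estimate and an elliptical-confidence penalty of the form β·√(φᵀΛ⁻¹φ), a standard pessimism bonus in linear-MDP analyses; the function name, argument layout, and hyperparameters are illustrative and not taken from the paper.

```python
import numpy as np

def pds_reward_labels(Phi_lab, r_lab, Phi_unlab, beta=1.0, lam=1.0):
    """Illustrative PDS-style pessimistic reward labeling (linear MDP).

    Fits r_hat(s, a) = phi(s, a)^T theta by ridge regression on the
    labeled data, then labels the reward-free transitions with the
    predicted reward minus an uncertainty penalty, so the relabeled
    rewards stay conservative where the labeled data is scarce.

    Phi_lab:   (n, d) feature matrix of labeled transitions
    r_lab:     (n,)   observed rewards
    Phi_unlab: (m, d) feature matrix of reward-free transitions
    beta, lam: penalty scale and ridge coefficient (hyperparameters)
    """
    d = Phi_lab.shape[1]
    # Regularized covariance: Lambda = sum_i phi_i phi_i^T + lam * I
    Lambda = Phi_lab.T @ Phi_lab + lam * np.eye(d)
    # Ridge estimate of the reward parameter theta
    theta = np.linalg.solve(Lambda, Phi_lab.T @ r_lab)

    # Elliptical uncertainty: b(s, a) = sqrt(phi^T Lambda^{-1} phi)
    Lambda_inv = np.linalg.inv(Lambda)
    bonus = np.sqrt(np.einsum("md,dk,mk->m", Phi_unlab, Lambda_inv, Phi_unlab))

    # Pessimistic labels: predicted reward minus the scaled penalty
    return Phi_unlab @ theta - beta * bonus
```

The relabeled transitions can then be merged with the labeled dataset and passed to any off-the-shelf conservative offline RL algorithm; the penalty shrinks as the labeled data covers more of the feature space.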
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reinforcement Learning | Atari Breakout | Mean Return | 86.3 | 23 |
| Reinforcement Learning | Atari 2600 Qbert | Score | 6.84e+3 | 20 |
| Reinforcement Learning | Atari Pong | Mean Episode Return | 8.5 | 19 |
| HalfCheetah | D4RL Medium v0 | Normalized Score | 41.5 | 19 |
| Robotic Manipulation | D4RL Kitchen-Partial | Normalized Score | 51.1 | 14 |
| Robotic Manipulation | D4RL Kitchen-Mixed | Normalized Score | 44.9 | 14 |
| Reinforcement Learning | Atari 2600 Seaquest | Average Score | 2.46e+3 | 12 |
| Goal Reaching | AntMaze large-play v2 | Success Rate | 50.6 | 10 |
| Goal Reaching | AntMaze medium-play v2 | Success Rate | 66.8 | 10 |
| Goal Reaching | AntMaze umaze v2 | Success Rate | 93 | 6 |