The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning
About
Self-supervised methods have become crucial for advancing deep learning by leveraging the data itself to reduce the need for expensive annotations. However, how to conduct self-supervised offline reinforcement learning (RL) in a principled way remains unclear. In this paper, we address this issue by investigating the theoretical benefits of utilizing reward-free data in linear Markov Decision Processes (MDPs) within a semi-supervised setting. Further, we propose a novel Provable Data Sharing (PDS) algorithm to utilize such reward-free data for offline RL. PDS applies additional penalties to the reward function learned from labeled data to prevent overestimation, which keeps the algorithm conservative. Our results on various offline RL tasks demonstrate that PDS significantly improves the performance of offline RL algorithms with reward-free data. Overall, our work provides a promising approach to leveraging the benefits of unlabeled data in offline RL while maintaining theoretical guarantees. We believe our findings will contribute to developing more robust self-supervised RL methods.
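To make the idea of a penalized reward function concrete, here is a minimal sketch in a linear setting. It is not the paper's implementation: the abstract does not give the form of the penalty, so the elliptical uncertainty term, the function names `fit_reward_model` and `pessimistic_rewards`, and the parameters `lam` and `beta` are all assumptions chosen for illustration. The sketch fits a ridge-regression reward model on labeled features, then labels reward-free transitions with a prediction lowered by an uncertainty penalty.

```python
import numpy as np

def fit_reward_model(phi, r, lam=1.0):
    """Ridge regression of rewards on features (hypothetical helper).

    phi: (n, d) feature matrix of labeled transitions
    r:   (n,) observed rewards
    lam: ridge regularization strength (assumed hyperparameter)
    """
    d = phi.shape[1]
    cov = phi.T @ phi + lam * np.eye(d)   # regularized covariance
    cov_inv = np.linalg.inv(cov)
    theta = cov_inv @ (phi.T @ r)         # ridge solution
    return theta, cov_inv

def pessimistic_rewards(phi, theta, cov_inv, beta=1.0):
    """Predicted reward minus an uncertainty penalty (assumed form).

    The penalty beta * sqrt(phi^T cov_inv phi) grows for features that
    the labeled data covers poorly, which discourages overestimation.
    """
    mean = phi @ theta
    uncertainty = np.sqrt(np.einsum("nd,dk,nk->n", phi, cov_inv, phi))
    return mean - beta * uncertainty

# Labeled data only covers the first feature direction.
phi_labeled = np.tile([1.0, 0.0], (50, 1))
r_labeled = np.ones(50)
theta, cov_inv = fit_reward_model(phi_labeled, r_labeled)

# Reward-free transitions: one well covered, one in an unseen direction.
phi_unlabeled = np.array([[1.0, 0.0], [0.0, 1.0]])
r_pess = pessimistic_rewards(phi_unlabeled, theta, cov_inv)
```

In this toy example the transition in the unseen direction receives a much lower label than the well-covered one, which is the conservative behavior the abstract describes. The relabeled reward-free data could then be passed to an offline RL algorithm alongside the labeled data.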
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Goal Reaching | AntMaze large-play v2 | Success Rate | 50.6 | 10 |
| Goal Reaching | AntMaze medium-play v2 | Success Rate | 66.8 | 10 |
| Goal Reaching | AntMaze umaze v2 | Success Rate | 93 | 6 |
| Goal Reaching | AntMaze large-diverse v2 | Success Rate | 30 | 6 |
| Goal Reaching | AntMaze Medium-Diverse v2 | Success Rate | 22.8 | 6 |
| Goal Reaching | AntMaze umaze-diverse v2 | Success Rate | 50.6 | 6 |
| Goal Reaching | AntMaze Random Cells v2 (large-diverse) | Success Rate | 58.3 | 4 |
| Offline Context-conditioned Goal-oriented (CGO) Reinforcement Learning | Random Cells (large-play) | Success Rate | 48.1 | 4 |
| Goal Reaching | AntMaze Random Cells medium-diverse v2 | Success Rate | 60.9 | 4 |
| Goal-reaching Navigation | Four Rooms medium-play v1 (test) | Average Success Rate | 0.46 | 4 |