Self-Supervised Reinforcement Learning that Transfers using Random Features

About

Model-free reinforcement learning algorithms have shown great potential for solving single-task sequential decision-making problems with high-dimensional observations and long horizons, but they are known to generalize poorly across tasks. Model-based RL, on the other hand, learns task-agnostic models of the world that naturally enable transfer across different reward functions, but it struggles to scale to complex environments because of compounding error. To get the best of both worlds, we propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards while circumventing the challenges of model-based RL. In particular, we show that self-supervised pre-training of model-free reinforcement learning, with a number of random features as rewards, allows implicit modeling of long-horizon environment dynamics. Planning techniques such as model-predictive control can then use these implicit models to adapt quickly to problems with new reward functions. Our method is self-supervised in that it can be trained on offline datasets without reward labels and then quickly deployed on new tasks. We validate that the proposed method enables transfer across tasks on a variety of manipulation and locomotion domains in simulation, opening the door to generalist decision-making agents.

Boyuan Chen, Chuning Zhu, Pulkit Agrawal, Kaiqing Zhang, Abhishek Gupta • 2023
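
The idea in the abstract (pre-train on random features as rewards, then adapt to a new reward by planning) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes, beyond what the abstract states, that a new reward is approximated as a linear combination of the random features, so that by linearity of expected return a candidate's value under the new reward is the same linear combination of its per-feature value estimates. The cosine feature map and the names `make_random_features`, `fit_reward_weights`, `pick_best_candidate`, and `feature_returns` are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_random_features(input_dim, num_features, seed=0):
    """Return a fixed random feature map phi(x) = cos(x @ W + b).

    The cosine form is an illustrative assumption, not taken from the source.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(input_dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(x):
        return np.cos(np.asarray(x) @ W + b)

    return phi


def fit_reward_weights(features, rewards):
    """Least-squares fit of w such that rewards ~= features @ w."""
    w, *_ = np.linalg.lstsq(features, rewards, rcond=None)
    return w


def pick_best_candidate(feature_returns, w):
    """Score candidate action sequences under the new reward.

    feature_returns[i, k] stands in for a pre-trained estimate of the return
    of candidate i when random feature k is used as the reward (hypothetical
    input; a real system would produce it with learned value functions).
    By linearity, the estimated return under the new reward is
    feature_returns[i] @ w.
    """
    scores = np.asarray(feature_returns) @ w
    return int(np.argmax(scores)), scores


# Usage with synthetic stand-in data (no real environment involved).
phi = make_random_features(input_dim=4, num_features=32)
inputs = np.random.default_rng(1).normal(size=(200, 4))  # fake (state, action) vectors
rewards = np.sin(inputs[:, 0])                           # fake new-task reward labels
w = fit_reward_weights(phi(inputs), rewards)

candidates = np.random.default_rng(2).normal(size=(5, 32))  # fake per-feature returns
best_index, scores = pick_best_candidate(candidates, w)
```

In a model-predictive control loop, the candidate scoring step would be repeated at every time step, executing only the first action of the best-scoring sequence before replanning.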

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Offline multitask Reinforcement Learning | Franka Kitchen kitchen-mixed | Average Episodic Return: 0.00e+0 | 23
Offline multitask Reinforcement Learning | Franka Kitchen kitchen-partial | Average Episodic Return: 0.00e+0 | 13
Reinforcement Learning | Hopper (forward) | Average Episodic Return: 652 | 12
Offline multitask Reinforcement Learning | Hopper backward | Average Episodic Return: 220 | 12
Reinforcement Learning | AntMaze medium-play D4RL | Average Episodic Return: 271 | 8
Reinforcement Learning | AntMaze umaze D4RL | Average Episodic Return: 459 | 8
Reinforcement Learning | AntMaze umaze-diverse D4RL | Average Episodic Return: 460 | 8
Reinforcement Learning | AntMaze medium-diverse D4RL | Average Episodic Return: 266 | 8
Reinforcement Learning | AntMaze large-diverse D4RL | Average Episodic Return: 132 | 8
Reinforcement Learning | AntMaze large-play D4RL | Average Episodic Return: 134 | 8
Showing 10 of 20 rows
