
The Laplacian in RL: Learning Representations with Efficient Approximations

About

The smallest eigenvectors of the graph Laplacian are well-known to provide a succinct representation of the geometry of a weighted graph. In reinforcement learning (RL), where the weighted graph may be interpreted as the state transition process induced by a behavior policy acting on the environment, approximating the eigenvectors of the Laplacian provides a promising approach to state representation learning. However, existing methods for performing this approximation are ill-suited in general RL settings for two main reasons: First, they are computationally expensive, often requiring operations on large matrices. Second, these methods lack adequate justification beyond simple, tabular, finite-state settings. In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. Even in tabular, finite-state settings, its ability to approximate the eigenvectors outperforms previous proposals. Finally, we show the potential benefits of using a Laplacian representation learned using our method in goal-achieving RL tasks, providing evidence that our technique can be used to significantly improve the performance of an RL agent.
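As a rough illustration of the abstract's opening claim, the sketch below computes the smallest non-trivial Laplacian eigenvector of a toy six-state chain graph by direct matrix iteration. This is the kind of tabular, matrix-based computation the abstract describes as expensive, not the paper's model-free method. The graph, the helper names (`laplacian`, `fiedler_vector`, `rayleigh`), and the choice of power iteration are all assumptions made for illustration.

```python
import math

def laplacian(weights):
    """Unnormalized Laplacian L = D - W of a symmetric weight matrix."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def fiedler_vector(lap, iters=500):
    """Power iteration on (c*I - L), kept orthogonal to the constant vector.

    For a connected graph this converges to the eigenvector of the smallest
    nonzero eigenvalue of L, provided the start vector overlaps with it.
    """
    n = len(lap)
    c = 2.0 * max(lap[i][i] for i in range(n))  # upper bound on L's eigenvalues
    v = normalize([i - (n - 1) / 2.0 for i in range(n)])  # zero-mean start (assumed)
    for _ in range(iters):
        lv = matvec(lap, v)
        v = [c * x - y for x, y in zip(v, lv)]
        mean = sum(v) / n
        v = normalize([x - mean for x in v])  # project out the trivial eigenvector
    return v

def rayleigh(lap, v):
    """v^T L v for a unit vector v, i.e. its eigenvalue estimate."""
    return sum(x * y for x, y in zip(v, matvec(lap, v)))

# Toy environment: six states in a chain, unit-weight edges between neighbours.
n = 6
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
L = laplacian(W)
v = fiedler_vector(L)
eigenvalue = rayleigh(L, v)

print([round(x, 3) for x in v])  # one coordinate per state, ordered along the chain
print(round(eigenvalue, 4))      # close to 2 - 2*cos(pi/6), about 0.2679
```

The resulting vector assigns each state a single coordinate that increases monotonically along the chain, which is the sense in which a small eigenvector summarizes the graph's geometry. In practice one would use a linear algebra library rather than hand-written iteration, and the cost of such matrix operations on large state spaces is the limitation the abstract highlights.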

Yifan Wu, George Tucker, Ofir Nachum • 2018

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Goal-conditioned Reinforcement Learning | OGBench scene play (5 tasks) zero-shot | Average Return: 4 | 10 |
| Goal-conditioned Reinforcement Learning | OGBench cube single play (5 tasks) zero-shot | Average Return: 6 | 6 |
| Unsupervised Reinforcement Learning | ExORL quadruped zero-shot | Average Return: 462 | 6 |
| Unsupervised Reinforcement Learning | ExORL walker (4 tasks) zero-shot | Average Return: 228 | 6 |
| Unsupervised Reinforcement Learning | ExORL cheetah (4 tasks) zero-shot | Average Return: 125 | 6 |
| Unsupervised Reinforcement Learning | ExORL jaco (4 tasks) zero-shot | Average Return: 3 | 6 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze large navigate (5 tasks) zero-shot | Avg Return: 9 | 6 |
| Goal-conditioned Reinforcement Learning | OGBench antmaze teleport navigate (5 tasks) zero-shot | Average Return: 3 | 6 |
