The Laplacian in RL: Learning Representations with Efficient Approximations
About
The smallest eigenvectors of the graph Laplacian are well-known to provide a succinct representation of the geometry of a weighted graph. In reinforcement learning (RL), where the weighted graph may be interpreted as the state transition process induced by a behavior policy acting on the environment, approximating the eigenvectors of the Laplacian provides a promising approach to state representation learning. However, existing methods for performing this approximation are ill-suited in general RL settings for two main reasons: First, they are computationally expensive, often requiring operations on large matrices. Second, these methods lack adequate justification beyond simple, tabular, finite-state settings. In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. Even in tabular, finite-state settings, its ability to approximate the eigenvectors outperforms previous proposals. Finally, we show the potential benefits of using a Laplacian representation learned using our method in goal-achieving RL tasks, providing evidence that our technique can be used to significantly improve the performance of an RL agent.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Goal-conditioned Reinforcement Learning | OGBench scene play (5 tasks) zero-shot | Average Return4 | 10 | |
| Goal-conditioned Reinforcement Learning | OGBench cube single play (5 tasks) zero-shot | Average Return6 | 6 | |
| Unsupervised Reinforcement Learning | ExORL quadruped zero-shot | Average Return462 | 6 | |
| Unsupervised Reinforcement Learning | ExORL walker (4 tasks) zero-shot | Average Return228 | 6 | |
| Unsupervised Reinforcement Learning | ExORL cheetah (4 tasks) zero-shot | Average Return125 | 6 | |
| Unsupervised Reinforcement Learning | ExORL jaco (4 tasks) zero-shot | Average Return3 | 6 | |
| Goal-conditioned Reinforcement Learning | OGBench antmaze large navigate (5 tasks) zero-shot | Avg Return9 | 6 | |
| Goal-conditioned Reinforcement Learning | OGBench antmaze teleport navigate (5 tasks) zero-shot | Average Return3 | 6 |