The Laplacian in RL: Learning Representations with Efficient Approximations

About

The smallest eigenvectors of the graph Laplacian are well-known to provide a succinct representation of the geometry of a weighted graph. In reinforcement learning (RL), where the weighted graph may be interpreted as the state transition process induced by a behavior policy acting on the environment, approximating the eigenvectors of the Laplacian provides a promising approach to state representation learning. However, existing methods for performing this approximation are ill-suited in general RL settings for two main reasons: First, they are computationally expensive, often requiring operations on large matrices. Second, these methods lack adequate justification beyond simple, tabular, finite-state settings. In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. Even in tabular, finite-state settings, its ability to approximate the eigenvectors outperforms previous proposals. Finally, we show the potential benefits of using a Laplacian representation learned using our method in goal-achieving RL tasks, providing evidence that our technique can be used to significantly improve the performance of an RL agent.

Yifan Wu, George Tucker, Ofir Nachum • 2018
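
As a rough illustration of the abstract's opening claim, the sketch below computes the smallest eigenvectors of a graph Laplacian for a tiny weighted graph and uses them as a state representation. It relies on an exact eigendecomposition of the full matrix, which is the kind of large-matrix operation the abstract describes as too expensive in general RL settings, so it is not the paper's method. The toy graph (two three-state "rooms" joined by one weak edge), the unnormalized Laplacian L = D - W, and the function name `laplacian_embedding` are all assumptions made for this example.

```python
import numpy as np


def laplacian_embedding(weights, dim):
    """Return the `dim` smallest eigenvalues and eigenvectors of L = D - W.

    `weights` is a symmetric, non-negative adjacency matrix (hypothetical input).
    """
    degrees = weights.sum(axis=1)
    laplacian = np.diag(degrees) - weights
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues[:dim], eigenvectors[:, :dim]


# Toy graph: states 0-2 form one room, states 3-5 another, with a weak 2-3 link.
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
    (2, 3, 0.1),
]
W = np.zeros((6, 6))
for i, j, w in edges:
    W[i, j] = W[j, i] = w

eigenvalues, embedding = laplacian_embedding(W, dim=2)

# Row s of `embedding` is the 2-dimensional representation of state s.
print(eigenvalues)
print(embedding)
```

In this toy graph the first eigenvector is constant across states and carries no information, while the second takes opposite signs in the two rooms, which is the sense in which the smallest eigenvectors summarize the graph's geometry.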

Related benchmarks

Task	Dataset	Metric	Value
Reinforcement Learning	DMC PointMass	Top Left Score	713.5	13
Reinforcement Learning	DMC Quadruped	Run Score	413	13
Reinforcement Learning	DMC Cheetah	Run Score	96.3	13
Reinforcement Learning	DMC Walker	Walk Score	190.5	13
Zero-shot Reinforcement Learning	ExORL RND (Quadruped environment) v1 (test)	Jump Success	272	12
Zero-shot Reinforcement Learning	ExORL RND Walker environment v1 (test)	Flip	71	12
Goal-conditioned Reinforcement Learning	OGBench scene play (5 tasks) zero-shot	Average Return	4	10
Visual Control	ExORL Cheetah Zero-shot RND	Walk Score	294	8
Visual Control	ExORL Jaco Zero-shot RND	Reach Top Left	25	8
Goal-conditioned Reinforcement Learning	OGBench cube single play (5 tasks) zero-shot	Average Return	6	6

Showing 10 of 20 rows
