The Laplacian in RL: Learning Representations with Efficient Approximations

About

The smallest eigenvectors of the graph Laplacian are well-known to provide a succinct representation of the geometry of a weighted graph. In reinforcement learning (RL), where the weighted graph may be interpreted as the state transition process induced by a behavior policy acting on the environment, approximating the eigenvectors of the Laplacian provides a promising approach to state representation learning. However, existing methods for performing this approximation are ill-suited in general RL settings for two main reasons: First, they are computationally expensive, often requiring operations on large matrices. Second, these methods lack adequate justification beyond simple, tabular, finite-state settings. In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. Even in tabular, finite-state settings, its ability to approximate the eigenvectors outperforms previous proposals. Finally, we show the potential benefits of using a Laplacian representation learned using our method in goal-achieving RL tasks, providing evidence that our technique can be used to significantly improve the performance of an RL agent.

Yifan Wu, George Tucker, Ofir Nachum • 2018
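To make the abstract's starting point concrete, the sketch below computes the smallest eigenvectors of a graph Laplacian exactly for a tiny graph. It is not the method proposed in the paper: it is the dense eigendecomposition that the abstract describes as too expensive for general RL settings. The 5-state chain, the function name `laplacian_embedding`, and the choice of the unnormalized Laplacian L = D - W are assumptions made for illustration, and the paper's own definition of the Laplacian may differ.

```python
import numpy as np


def laplacian_embedding(W, d):
    """Return the d smallest eigenvalues and eigenvectors of L = D - W.

    W is a symmetric, non-negative weight matrix of shape (n, n).
    Assumption: the unnormalized (combinatorial) Laplacian is used.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))            # degree matrix
    L = D - W                             # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh returns ascending eigenvalues
    return eigvals[:d], eigvecs[:, :d]


# Hypothetical example: a 5-state chain with unit weights between neighbours.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

eigvals, phi = laplacian_embedding(W, d=2)

# Row i of phi is the 2-dimensional representation of state i.
for state, row in enumerate(phi):
    print(state, np.round(row, 3))
```

On this chain the first eigenvector is constant and the second varies monotonically along the chain, so it orders the states by position. This is the sense in which a few small eigenvectors summarize the graph's geometry. The dense `eigh` call costs on the order of n^3 operations, which is why it does not scale to large or continuous state spaces.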

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reinforcement Learning | DMC PointMass | Top Left Score 713.5 | 13 |
| Reinforcement Learning | DMC Quadruped | Run Score 413 | 13 |
| Reinforcement Learning | DMC Cheetah | Run Score 96.3 | 13 |
| Reinforcement Learning | DMC Walker | Walk Score 190.5 | 13 |
| Zero-shot Reinforcement Learning | ExORL RND (Quadruped environment) v1 (test) | Jump Success 272 | 12 |
| Zero-shot Reinforcement Learning | ExORL RND Walker environment v1 (test) | Flip 71 | 12 |
| Goal-conditioned Reinforcement Learning | OGBench scene play (5 tasks) zero-shot | Average Return 4 | 10 |
| Visual Control | ExORL Cheetah Zero-shot RND | Walk Score 294 | 8 |
| Visual Control | ExORL Jaco Zero-shot RND | Reach Top Left 25 | 8 |
| Goal-conditioned Reinforcement Learning | OGBench cube single play (5 tasks) zero-shot | Average Return 6 | 6 |
Showing 10 of 20 rows