The Laplacian Keyboard: Beyond the Linear Span
About
Across scientific disciplines, Laplacian eigenvectors serve as a fundamental basis for simplifying complex systems, from signal processing to quantum mechanics. In reinforcement learning (RL), these eigenvectors provide a natural basis for approximating reward functions; however, their use is typically limited to their linear span, which restricts expressivity in complex environments. We introduce the Laplacian Keyboard (LK), a hierarchical framework that goes beyond the linear span. LK constructs a task-agnostic library of options from these eigenvectors, forming a behavior basis guaranteed to contain the optimal policy for any reward within the linear span. A meta-policy learns to stitch these options dynamically, enabling efficient learning of policies outside the original linear constraints. We establish theoretical bounds on zero-shot approximation error and demonstrate empirically that LK surpasses zero-shot solutions while achieving improved sample efficiency compared to standard RL methods.
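The "linear span" the abstract refers to can be made concrete with a small sketch. The code below (a minimal illustration under assumed details, not the paper's implementation) builds the graph Laplacian of a toy chain-MDP state graph, takes its first `k` eigenvectors as a basis, and projects a sparse goal reward onto that span; the residual is exactly the kind of approximation error that motivates going beyond the linear span.

```python
import numpy as np

# Hypothetical toy example: an 8-state chain graph standing in for an MDP's
# state-transition structure. None of the names below come from the paper.
n = 8
A = np.zeros((n, n))
for i in range(n - 1):          # chain adjacency: state i <-> state i+1
    A[i, i + 1] = A[i + 1, i] = 1.0
D = np.diag(A.sum(axis=1))      # degree matrix
L = D - A                       # combinatorial graph Laplacian

# Eigenvectors sorted by eigenvalue: the smoothest functions on the graph.
eigvals, eigvecs = np.linalg.eigh(L)

k = 4                           # basis size (assumed hyperparameter)
Phi = eigvecs[:, :k]            # first k Laplacian eigenvectors (orthonormal)

r = np.zeros(n)
r[-1] = 1.0                     # sparse goal reward at the last state
w = Phi.T @ r                   # least-squares coefficients
r_hat = Phi @ w                 # best approximation within the linear span

approx_error = np.linalg.norm(r - r_hat)  # nonzero: r lies outside the span
```

A sparse reward like `r` is poorly captured by a few smooth eigenvectors, so `approx_error` is strictly positive; LK's meta-policy over eigenvector-derived options is aimed at precisely these rewards that linear zero-shot methods cannot represent.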
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Flip | DMC Walker (average of APS, Proto, RND datasets) | Mean Return | 507 | 3 |
| Jump | DMC Quadruped (average of APS, Proto, RND datasets) | Mean Return | 554 | 3 |
| Run | DMC Quadruped (average of APS, Proto, RND datasets) | Mean Return | 366 | 3 |
| Stand | DMC Quadruped (average of APS, Proto, RND datasets) | Mean Return | 705 | 3 |
| Walk | DMC Walker (average of APS, Proto, RND datasets) | Mean Return | 890 | 3 |
| Run | DMC Cheetah (average of APS, Proto, RND datasets) | Mean Return | 196 | 3 |
| Run | DMC Walker (average of APS, Proto, RND datasets) | Mean Return | 294 | 3 |
| Run-B | DMC Cheetah (average of APS, Proto, RND datasets) | Mean Return | 188 | 3 |
| Stand | DMC Walker (average of APS, Proto, RND datasets) | Mean Return | 635 | 3 |
| Walk | DMC Cheetah (average of APS, Proto, RND datasets) | Mean Return | 709 | 3 |