The Laplacian Keyboard: Beyond the Linear Span
About
Across scientific disciplines, Laplacian eigenvectors serve as a fundamental basis for simplifying complex systems, from signal processing to quantum mechanics. In reinforcement learning (RL), they similarly form a basis over the state space, enabling reward functions to be approximated by projection onto a small set of eigenvectors. This projection makes zero-shot control possible, but it also imposes a fundamental limitation: the induced policies are only as expressive as the linear span of the chosen eigenvectors. We introduce the Laplacian Keyboard (LK), a hierarchical framework that goes beyond this linear span. LK constructs a task-agnostic library of behaviors from these eigenvectors, forming a behavior basis guaranteed to contain the optimal policy for any reward within the linear span. A meta-policy learns to stitch these behaviors dynamically, enabling efficient learning of policies outside the original linear constraints. We establish theoretical bounds on zero-shot approximation error and demonstrate empirically that LK improves over the zero-shot solution while achieving better sample efficiency compared to standard RL methods.
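As a rough illustration of the projection step described above, the following sketch builds the graph Laplacian of a small chain of states, takes its eigenvectors, and projects a reward vector onto the first k of them. It is not the paper's implementation: the chain-graph environment, the sparse reward, and the helper names `chain_laplacian` and `project_reward` are all assumptions made for this example.

```python
import numpy as np

def chain_laplacian(n):
    """Graph Laplacian L = D - A of a path graph with n states (toy assumption)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def project_reward(reward, eigvecs, k):
    """Project a reward vector onto the k eigenvectors with smallest eigenvalues."""
    basis = eigvecs[:, :k]        # columns are orthonormal
    coeffs = basis.T @ reward     # least-squares coefficients for an orthonormal basis
    return coeffs, basis @ coeffs

n_states = 20
L = chain_laplacian(n_states)
# eigh returns eigenvalues in ascending order with orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(L)

# Hypothetical sparse reward: only the last state is rewarding
reward = np.zeros(n_states)
reward[-1] = 1.0

for k in (2, 5, n_states):
    _, approx = project_reward(reward, eigvecs, k)
    err = np.linalg.norm(reward - approx)
    print(f"k={k:2d}  projection error={err:.4f}")
```

In this toy setting the projection error shrinks as k grows and vanishes once all eigenvectors are used. With a small k, the part of the reward outside the span is lost, which is the limitation the abstract attributes to zero-shot policies.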
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Flip | DMC Walker (average of APS, Proto, RND datasets) | Mean Return | 507 | 3 |
| Jump | DMC Quadruped (average of APS, Proto, RND datasets) | Mean Return | 554 | 3 |
| Run | DMC Quadruped (average of APS, Proto, RND datasets) | Mean Return | 366 | 3 |
| Stand | DMC Quadruped (average of APS, Proto, RND datasets) | Mean Return | 705 | 3 |
| Walk | DMC Walker (average of APS, Proto, RND datasets) | Mean Return | 890 | 3 |
| Run | DMC Cheetah (average of APS, Proto, RND datasets) | Mean Return | 196 | 3 |
| Run | DMC Walker (average of APS, Proto, RND datasets) | Mean Return | 294 | 3 |
| Run-B | DMC Cheetah (average of APS, Proto, RND datasets) | Mean Return | 188 | 3 |
| Stand | DMC Walker (average of APS, Proto, RND datasets) | Mean Return | 635 | 3 |
| Walk | DMC Cheetah (average of APS, Proto, RND datasets) | Mean Return | 709 | 3 |