
The Laplacian Keyboard: Beyond the Linear Span

About

Across scientific disciplines, Laplacian eigenvectors serve as a fundamental basis for simplifying complex systems, from signal processing to quantum mechanics. In reinforcement learning (RL), these eigenvectors provide a natural basis for approximating reward functions; however, their use is typically limited to their linear span, which restricts expressivity in complex environments. We introduce the Laplacian Keyboard (LK), a hierarchical framework that goes beyond the linear span. LK constructs a task-agnostic library of options from these eigenvectors, forming a behavior basis guaranteed to contain the optimal policy for any reward within the linear span. A meta-policy learns to stitch these options dynamically, enabling efficient learning of policies outside the original linear constraints. We establish theoretical bounds on zero-shot approximation error and demonstrate empirically that LK surpasses zero-shot solutions while achieving improved sample efficiency compared to standard RL methods.
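To make the "linear span" idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation): it builds the graph Laplacian of a toy ring-shaped state space, keeps the smoothest eigenvectors, and projects an arbitrary reward function onto their span. The toy MDP, variable names, and dimensions are all illustrative assumptions.

```python
# Hypothetical illustration of approximating a reward function within the
# linear span of graph Laplacian eigenvectors (toy example, not the paper's code).
import numpy as np

# Toy state space: a ring of n states where adjacent states are connected.
n, d = 32, 8                      # number of states, number of eigenvectors kept
A = np.zeros((n, n))
for s in range(n):
    A[s, (s + 1) % n] = A[(s + 1) % n, s] = 1.0

D = np.diag(A.sum(axis=1))        # degree matrix
L = D - A                         # combinatorial graph Laplacian

# Eigenvectors with the smallest eigenvalues are the smoothest basis functions.
eigvals, eigvecs = np.linalg.eigh(L)
Phi = eigvecs[:, :d]              # (n, d) basis of Laplacian eigenvectors

# An arbitrary reward function over states, and its projection onto span(Phi).
r = np.random.randn(n)
w = Phi.T @ r                     # coefficients (Phi has orthonormal columns)
r_hat = Phi @ w                   # best approximation of r within the linear span

print("reward approximation error:", np.linalg.norm(r - r_hat))
```

Rewards that fall outside this span are exactly the cases the abstract points to: the Laplacian Keyboard's meta-policy stitches options built from these eigenvectors to handle them, rather than relying on the projection alone.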

Siddarth Chandrasekar, Marlos C. Machado • 2026

Related benchmarks

All results are mean returns, averaged over the APS, Proto, and RND datasets.

Task     Domain           Mean Return   Rank
Flip     DMC Walker       507           3
Jump     DMC Quadruped    554           3
Run      DMC Quadruped    366           3
Stand    DMC Quadruped    705           3
Walk     DMC Walker       890           3
Run      DMC Cheetah      196           3
Run      DMC Walker       294           3
Run-B    DMC Cheetah      188           3
Stand    DMC Walker       635           3
Walk     DMC Cheetah      709           3
Showing 10 of 12 rows
