Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks
About
We analytically solve the Mountain Car problem, a canonical benchmark in reinforcement learning (RL), and derive an optimal control solution, closing a gap that had remained open for 36 years. This reveals two surprising insights: the optimal control is quite simple, yet modern RL agents remain far from optimal. Motivated by this analysis of the optimal control, we introduce Chebyshev policies from first principles as a universal (i.e., dense) class of RL policies. They can be trained as drop-in replacements for neural nets, reducing the regret by a factor of 4.18 while requiring 277 times fewer parameters, which fosters sample efficiency, explainability, and real-time capability. We further evaluate Chebyshev policies on additional RL tasks, including a real-world nonlinear motion control testbed, where they consistently improve performance over neural nets with PPO, ARS, and REINFORCE. Our results demonstrate that Chebyshev policies offer a compelling, lightweight alternative or addition to neural nets for low-dimensional control tasks.
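The abstract does not spell out how a Chebyshev policy is constructed. One common construction, assumed here purely for illustration, rescales each state variable to [-1, 1], builds the tensor product of Chebyshev polynomials T_0, ..., T_d via the recurrence T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x), and maps a linear combination of these features to an action through tanh. The class name `ChebyshevPolicy`, the `degree` parameter, the tanh squashing, and the small random initialization are hypothetical choices, not the authors' method; the state bounds are those of Gymnasium's MountainCarContinuous-v0 observation space (position in [-1.2, 0.6], velocity in [-0.07, 0.07]).

```python
import itertools

import numpy as np


def chebyshev_basis(x, degree):
    """Evaluate T_0..T_degree at each entry of x (entries assumed in [-1, 1]).

    Returns an array of shape (len(x), degree + 1).
    """
    x = np.asarray(x, dtype=float).ravel()
    T = np.empty((x.size, degree + 1))
    T[:, 0] = 1.0
    if degree >= 1:
        T[:, 1] = x
    for k in range(2, degree + 1):
        # Three-term recurrence: T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x)
        T[:, k] = 2.0 * x * T[:, k - 1] - T[:, k - 2]
    return T


class ChebyshevPolicy:
    """Hypothetical tensor-product Chebyshev policy (illustrative sketch only)."""

    def __init__(self, low, high, degree, action_dim=1, rng=None):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.degree = degree
        # One multi-index (k_1, ..., k_n) per basis function, each k_i in 0..degree.
        self.idx = np.array(
            list(itertools.product(range(degree + 1), repeat=self.low.size))
        )
        rng = np.random.default_rng() if rng is None else rng
        # Assumed initialization: small random coefficients.
        self.coef = 0.01 * rng.standard_normal((action_dim, len(self.idx)))

    def features(self, state):
        # Affinely rescale each state variable from [low, high] to [-1, 1].
        z = 2.0 * (np.asarray(state, dtype=float) - self.low) / (self.high - self.low) - 1.0
        z = np.clip(z, -1.0, 1.0)
        T = chebyshev_basis(z, self.degree)  # shape (state_dim, degree + 1)
        dims = np.arange(z.size)
        # Product T_{k_1}(z_1) * ... * T_{k_n}(z_n) for every multi-index.
        return T[dims, self.idx].prod(axis=1)  # shape (n_features,)

    def __call__(self, state):
        # Assumed output squashing: tanh keeps actions in [-1, 1].
        return np.tanh(self.coef @ self.features(state))


# Usage on a MountainCarContinuous-v0 state (position, velocity).
policy = ChebyshevPolicy(low=[-1.2, -0.07], high=[0.6, 0.07], degree=3,
                         rng=np.random.default_rng(0))
action = policy([-0.5, 0.0])
print(policy.coef.size, action.shape)  # 16 (1,)
```

With degree 3 on the two-dimensional Mountain Car state, this sketch has (3 + 1)^2 = 16 coefficients. Because the action is a differentiable function of `coef`, the same parameter array could in principle be optimized by a policy-gradient method such as REINFORCE or PPO (for example, as the mean of a Gaussian policy) or by a derivative-free method such as ARS.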
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | Pendulum v1 | Average Cumulative Reward | -150.8 | 11 |
| Continuous Control | Gymnasium Pendulum v1 (Evaluation over 50 × 50 evenly spaced initial angles) | Return (R) | -150.8 | 10 |
| Continuous Control | MountainCar | Average Return | 98.74 | 7 |
| Continuous Control | MountainCarContinuous v0 | Return | 98.74 | 7 |
| Control | Quanser Aero 2 (simulation) | Average Return | -49.2 | 4 |
| Control | Quanser Aero 2 Real world | Average Return | -55.8 | 4 |
| Pitch Control | Quanser Aero 2 Gymnasium (simulation) | Pitch Deviation (rad) | 0.0246 | 4 |
| Pitch Control | Quanser Aero 2 system (real-world evaluation) | Pitch Deviation (rad) | 0.0279 | 4 |