
Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks

About

We analytically solve the Mountain Car problem, a canonical benchmark in RL, and derive an optimal control solution, closing a gap that has stood for 36 years. This reveals two surprising insights: the optimal control is quite simple, yet modern RL agents display a large gap to optimality. Motivated by the analysis of the optimal control, we introduce Chebyshev policies as a universal (i.e., dense) class of RL policies derived from first principles. They can be trained as drop-in replacements for neural nets, reducing the regret by a factor of 4.18 while requiring 277 times fewer parameters, which fosters sample efficiency, explainability, and real-time capability. Chebyshev policies are evaluated on further RL tasks, including a real-world nonlinear motion control testbed. They consistently improve performance over neural nets with PPO, ARS, and REINFORCE. Our results demonstrate that Chebyshev policies offer a compelling and lightweight alternative or addition to neural nets for low-dimensional control tasks.
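To make the idea of a polynomial policy concrete, the following is a minimal sketch of what a Chebyshev policy for the two-dimensional Mountain Car state might look like. The abstract does not give the authors' exact parameterization, so everything here is an assumption made for illustration: the tensor-product basis, the `coeffs[i][j]` layout, the `tanh` squashing, and the function names are all hypothetical. The state bounds are those documented for Gymnasium's MountainCarContinuous-v0.

```python
import math

# State bounds (position, velocity) of Gymnasium's MountainCarContinuous-v0.
POS_RANGE = (-1.2, 0.6)
VEL_RANGE = (-0.07, 0.07)


def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] via T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def to_unit_interval(value, low, high):
    """Affinely map [low, high] onto [-1, 1], the natural Chebyshev domain."""
    return 2.0 * (value - low) / (high - low) - 1.0


def chebyshev_policy(coeffs, position, velocity):
    """Deterministic action in [-1, 1] from a tensor-product Chebyshev expansion.

    coeffs[i][j] multiplies T_i(position) * T_j(velocity). This layout is a
    hypothetical choice for illustration, not taken from the paper.
    """
    x = to_unit_interval(position, *POS_RANGE)
    y = to_unit_interval(velocity, *VEL_RANGE)
    tx = chebyshev_basis(x, len(coeffs) - 1)
    ty = chebyshev_basis(y, len(coeffs[0]) - 1)
    raw = sum(
        c * tx[i] * ty[j]
        for i, row in enumerate(coeffs)
        for j, c in enumerate(row)
    )
    # Squashing to the action bounds with tanh is an assumption of this sketch.
    return math.tanh(raw)


# Four trainable parameters; only the T_0(position) * T_1(velocity) term is active,
# so the action simply follows the sign of the velocity.
coeffs = [[0.0, 5.0], [0.0, 0.0]]
action_right = chebyshev_policy(coeffs, position=-0.5, velocity=0.07)
action_left = chebyshev_policy(coeffs, position=-0.5, velocity=-0.07)
action_rest = chebyshev_policy(coeffs, position=-0.5, velocity=0.0)
```

With these illustrative coefficients the policy pushes in the direction of the current velocity, a well-known heuristic for Mountain Car. In a training setup the entries of `coeffs` would be the parameters an algorithm such as PPO, ARS, or REINFORCE adjusts, which is why the parameter count can stay so small compared with a neural net.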

Stefan Huber, Hannes Unger, Georg Schäfer, Jakob Rehrl • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| Continuous Control | Pendulum v1 | Average Cumulative Reward: -150.8 | 11 |
| Continuous Control | Gymnasium Pendulum v1 (evaluation over 50 × 50 evenly spaced initial angles) | Return (R): -150.8 | 10 |
| Continuous Control | MountainCar | Average Return: 98.74 | 7 |
| Continuous Control | MountainCarContinuous v0 | Return: 98.74 | 7 |
| Control | Quanser Aero 2 (simulation) | Average Return: -49.2 | 4 |
| Control | Quanser Aero 2 (real world) | Average Return: -55.8 | 4 |
| Pitch Control | Quanser Aero 2 Gymnasium (simulation) | Pitch Deviation (rad): 0.0246 | 4 |
| Pitch Control | Quanser Aero 2 system (real-world evaluation) | Pitch Deviation (rad): 0.0279 | 4 |