Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks

About

We analytically solve the Mountain Car problem, a canonical benchmark in RL, and derive an optimal control solution, closing a 36-year-old gap. This reveals two surprising insights: the optimal control is quite simple, yet modern RL agents remain far from optimal. Motivated by the analysis of the optimal control, we introduce Chebyshev policies from first principles as a universal (i.e., dense) class of RL policies. They can be trained as drop-in replacements for neural nets, reducing the regret by a factor of 6.18 while requiring 277 times fewer parameters, which fosters sample efficiency, explainability and real-time capability. Chebyshev policies are evaluated on further RL tasks, including a real-world nonlinear motion control testbed. They consistently improve performance over neural nets with PPO, ARS and REINFORCE. Our results demonstrate that Chebyshev policies offer a compelling and lightweight alternative or addition to neural nets for low-dimensional control tasks.

Stefan Huber, Hannes Unger, Georg Schäfer, Jakob Rehrl • 2026
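
To give a rough idea of what a policy built from Chebyshev polynomials could look like, here is a minimal sketch. It is not the authors' implementation: the function name `chebyshev_policy`, the tensor-product basis, the degree, the coefficient values and the clipping step are all assumptions made for illustration, as are the state and action bounds, which are taken to be those of Gymnasium's MountainCarContinuous-v0.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Assumed state bounds (position, velocity) of MountainCarContinuous-v0.
LOW = np.array([-1.2, -0.07])
HIGH = np.array([0.6, 0.07])


def chebyshev_policy(state, coeffs, degree):
    """Hypothetical deterministic policy: a tensor-product Chebyshev expansion."""
    # Map the state affinely onto [-1, 1]^2, the natural Chebyshev domain.
    z = 2.0 * (np.asarray(state, dtype=float) - LOW) / (HIGH - LOW) - 1.0
    z = np.clip(z, -1.0, 1.0)
    # Features T_i(z_pos) * T_j(z_vel) for 0 <= i, j <= degree, flattened so
    # that index (degree + 1) * i + j holds the (i, j) product.
    features = C.chebvander2d(z[:1], z[1:], [degree, degree]).ravel()
    # The action is linear in the coefficients; clip to the assumed range [-1, 1].
    return float(np.clip(features @ coeffs, -1.0, 1.0))


degree = 3
coeffs = np.zeros((degree + 1) ** 2)  # 16 trainable parameters in this sketch
coeffs[1] = 1.0  # illustrative choice: weight only on T_0(pos) * T_1(vel)
action = chebyshev_policy([-0.5, 0.035], coeffs, degree)
```

With this illustrative coefficient vector the action simply equals the normalized velocity, so the car pushes in its direction of motion. In practice the coefficients would be the parameters adjusted by the RL algorithm.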

Related benchmarks

Task	Dataset	Result
Continuous Control	Pendulum v1	Average Cumulative Reward: -150.8	11
Continuous Control	Gymnasium Pendulum v1 (Evaluation over 50 × 50 evenly spaced initial angles)	Return (R): -150.8	10
Continuous Control	MountainCar	Average Return: 98.74	7
Continuous Control	MountainCarContinuous v0	Return: 98.74	7
Control	Quanser Aero 2 (simulation)	Average Return: -49.2	4
Control	Quanser Aero 2 Real world	Average Return: -55.8	4
Pitch Control	Quanser Aero 2 Gymnasium (simulation)	Pitch Deviation (rad): 0.0246	4
Pitch Control	Quanser Aero 2 system (real-world evaluation)	Pitch Deviation (rad): 0.0279	4

