Efficient Reinforcement Learning using Linear Koopman Dynamics for Nonlinear Robotic Systems

About

This paper presents a model-based reinforcement learning (RL) framework for optimal closed-loop control of nonlinear robotic systems. The proposed approach learns linear lifted dynamics through Koopman operator theory and integrates the resulting model into an actor-critic architecture for policy optimization, where the policy represents a parameterized closed-loop controller. To reduce computational cost and mitigate model rollout errors, policy gradients are estimated using one-step predictions of the learned dynamics rather than multi-step propagation. This leads to an online mini-batch policy gradient framework that enables policy improvement from streamed interaction data. The proposed framework is evaluated on several simulated nonlinear control benchmarks and two real-world hardware platforms, including a Kinova Gen3 robotic arm and a Unitree Go1 quadruped. Experimental results demonstrate improved sample efficiency over model-free RL baselines, superior control performance relative to model-based RL baselines, and control performance comparable to classical model-based methods that rely on exact system dynamics.
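The one-step idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-state plant, the hand-picked lifting z = [x1, x2, x1^2] (the paper learns its lifted dynamics from data, whereas this toy plant admits an exact lifting by construction), the diagonal quadratic critic, the linear policy u = -k·z, and all function names. The sketch shows a lifted linear model z' = A z + B u reproducing the nonlinear step, and a mini-batch gradient update that differentiates through only a single model prediction.

```python
# Illustrative sketch only: the plant, lifting, critic weights, and all names
# below are hypothetical and are not the paper's implementation.

A_COEF, B_COEF, C_COEF = 0.9, 0.5, 0.3   # assumed plant constants

def step(x, u):
    """Nonlinear plant: x1' = a*x1, x2' = b*x2 + c*x1^2 + u."""
    x1, x2 = x
    return (A_COEF * x1, B_COEF * x2 + C_COEF * x1 ** 2 + u)

def lift(x):
    """Hand-picked lifting z = [x1, x2, x1^2], exact for this toy plant."""
    x1, x2 = x
    return [x1, x2, x1 ** 2]

# Lifted linear model z' = A z + B u.
A = [[A_COEF, 0.0, 0.0],
     [0.0, B_COEF, C_COEF],
     [0.0, 0.0, A_COEF ** 2]]
B = [0.0, 1.0, 0.0]

def predict(z, u):
    """One-step prediction in the lifted space."""
    return [sum(A[i][j] * z[j] for j in range(3)) + B[i] * u for i in range(3)]

P_DIAG = [1.0, 1.0, 0.1]   # assumed critic: V(z) = sum_i p_i * z_i^2
R = 0.1                    # assumed control penalty

def one_step_loss(k, z):
    """Control cost plus critic value of the one-step prediction."""
    u = -sum(ki * zi for ki, zi in zip(k, z))
    z_next = predict(z, u)
    return R * u * u + sum(p * zn * zn for p, zn in zip(P_DIAG, z_next))

def one_step_grad(k, z):
    """Analytic gradient of one_step_loss with respect to the gains k."""
    u = -sum(ki * zi for ki, zi in zip(k, z))
    z_next = predict(z, u)
    dl_du = 2.0 * R * u + 2.0 * sum(
        p * b * zn for p, b, zn in zip(P_DIAG, B, z_next))
    return [-dl_du * zi for zi in z]   # chain rule: du/dk_i = -z_i

def minibatch_update(k, batch, lr=0.05):
    """Average one-step gradients over a mini-batch and take one step."""
    grads = [one_step_grad(k, lift(x)) for x in batch]
    mean = [sum(g[i] for g in grads) / len(batch) for i in range(3)]
    return [ki - lr * gi for ki, gi in zip(k, mean)]
```

Because the gradient passes through a single call to `predict`, model error is not compounded over a rollout horizon, which is the motivation the abstract gives for one-step predictions. A typical use would call `minibatch_update` repeatedly on small batches of freshly observed states.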

Wenjian Hao, Yuxuan Fang, Zehui Lu, Shaoshuai Mou • 2026

Related benchmarks

Task	Dataset	Metric	Value
Control	Surface Vehicle 10 initial states	Time (ms)	0.042	4
Inverted Pendulum Balancing	Inverted Pendulum 10 initial states	Time (ms)	0.038	4
Quadruped robot reference tracking	Unitree Go1 quadruped robot 10 initial states	Tracking Error	0.72	4
Inverted Pendulum Balancing	Inverted Pendulum	95% Convergence Steps	32.4	3
Optimal Control	LTI	Time (x10^-5)	5.1	3
Optimal Control	Lunar	Execution Time (10^-5 units)	3.9	2
Optimal Control	Bipedal	Time (x10^-5 units)	4.1	2
