Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies

About

Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, these techniques are non-differentiable and involve a large set of hyperparameters, so they tend to require extensive manual tuning for each robotic platform. To address this challenge and establish a general technique for enforcing smooth behaviors, we propose a simple and effective method that imposes a Lipschitz constraint on a learned policy, which we refer to as Lipschitz-Constrained Policies (LCP). We show that the Lipschitz constraint can be implemented in the form of a gradient penalty, which provides a differentiable objective that can be easily incorporated into automatic differentiation frameworks. We demonstrate that LCP effectively removes the need for smoothing rewards or low-pass filters and can be easily integrated into training frameworks for many distinct humanoid robots. We extensively evaluate LCP both in simulation and on real-world humanoid robots, producing smooth and robust locomotion controllers. All simulation and deployment code, along with complete checkpoints, is available on our project page: https://lipschitz-constrained-policy.github.io.
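As a rough illustration of the gradient-penalty idea mentioned in the abstract, the sketch below penalizes the squared norm of the gradient of a policy's action log-probability with respect to its observation. The abstract does not spell out the exact form of the penalty, so this choice is an assumption, as are the toy linear Gaussian policy and the names `log_prob`, `gradient_penalty`, `regularized_loss`, and `coeff`. Central finite differences stand in for automatic differentiation so that the example runs with the Python standard library alone; it is not the authors' implementation.

```python
import math


def log_prob(obs, action, weights, bias, std):
    """Log-density of a diagonal Gaussian policy whose mean is W @ obs + b."""
    total = 0.0
    for row, b, a in zip(weights, bias, action):
        mean = sum(w * o for w, o in zip(row, obs)) + b
        total += -0.5 * ((a - mean) / std) ** 2
        total -= math.log(std * math.sqrt(2.0 * math.pi))
    return total


def gradient_penalty(obs, action, weights, bias, std, eps=1e-5):
    """Squared L2 norm of d log_prob / d obs (central differences, not autodiff)."""
    penalty = 0.0
    for i in range(len(obs)):
        plus = list(obs)
        minus = list(obs)
        plus[i] += eps
        minus[i] -= eps
        grad_i = (
            log_prob(plus, action, weights, bias, std)
            - log_prob(minus, action, weights, bias, std)
        ) / (2.0 * eps)
        penalty += grad_i ** 2
    return penalty


def regularized_loss(task_loss, penalties, coeff):
    """Task loss plus coeff times the batch-mean gradient penalty (assumed form)."""
    return task_loss + coeff * sum(penalties) / len(penalties)


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
weights = [[1.0, 0.0], [0.0, 2.0]]
bias = [0.0, 0.0]
obs = [0.5, -0.5]
action = [1.0, 0.0]

penalty = gradient_penalty(obs, action, weights, bias, std=1.0)
loss = regularized_loss(task_loss=1.0, penalties=[penalty], coeff=0.01)

# Halving the weights makes the policy less sensitive to its input,
# which lowers the penalty.
half_weights = [[0.5 * w for w in row] for row in weights]
smaller_penalty = gradient_penalty(obs, action, half_weights, bias, std=1.0)
```

In an actual training setup the gradient would come from the autodiff framework rather than finite differences, and the penalty coefficient would be a hyperparameter to tune.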

Zixuan Chen, Xialin He, Yen-Jen Wang, Qiayuan Liao, Yanjie Ze, Zhongyu Li, S. Shankar Sastry, Jiajun Wu, Koushil Sreenath, Saurabh Gupta, Xue Bin Peng • 2024

Related benchmarks

Task	Dataset	Result
Policy Smoothness Evaluation	Footwork	Action Smoothness: 0.036	7
Policy Smoothness Evaluation	walking	Action Smoothness: 0.004	7
Rotational Walking	Real Robot TOCABI Rotational Walking	Joint Velocity: 1.0993	6
Policy Smoothness Evaluation	Backflip	Action Smoothness: 0.195	6
Humanoid Locomotion	Walking motion random velocity command 1,024 environments	Action Jitter: 3.52	5
Humanoid Locomotion	Uneven Terrain & Disturbance Configuration Noise Case II	Joint Power: 976.3	4
Humanoid Locomotion	Uneven Terrain & Disturbance Configurations Noise Case I	Joint Power: 1.33e+3	4
Robot Locomotion	Uneven Terrain Walking Random Noise Case I (test)	Joint Power: 2.45e+3	4
Robot Locomotion	Uneven Terrain Walking Random OOD Noise Case II (test)	Joint Power: 1.77e+3	4
Forward + Rotational Walking	TOCABI real humanoid robot Forward + Rotational Walking (vx = 0.1, wyaw = 0.2)	Joint Velocity: 1.3514	3

Showing 10 of 12 rows
