Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies
About
Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, because these techniques are non-differentiable and involve a large set of hyperparameters, they tend to require extensive manual tuning for each robotic platform. To address this challenge and establish a general technique for enforcing smooth behaviors, we propose a simple and effective method that imposes a Lipschitz constraint on a learned policy, which we refer to as Lipschitz-Constrained Policies (LCP). We show that the Lipschitz constraint can be implemented in the form of a gradient penalty, which provides a differentiable objective that can be easily incorporated into automatic differentiation frameworks. We demonstrate that LCP effectively replaces smoothing rewards and low-pass filters and can be easily integrated into training frameworks for many distinct humanoid robots. We extensively evaluate LCP both in simulation and on real-world humanoid robots, producing smooth and robust locomotion controllers. All simulation and deployment code, along with complete checkpoints, is available on our project page: https://lipschitz-constrained-policy.github.io.
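The gradient-penalty idea can be illustrated with a minimal sketch. The abstract states only that the Lipschitz constraint is implemented as a gradient penalty; the specific form used here (the mean squared norm of the gradient of the policy's log-probability with respect to the state) is an assumption made for illustration. The toy linear-Gaussian policy and the names `log_prob`, `gradient_penalty`, `regularized_loss`, and `gp_coef` are likewise hypothetical. Central finite differences stand in for the automatic differentiation a real training framework would use, so that the example runs with the standard library alone.

```python
import math


def log_prob(state, action, weights, bias, std):
    """Log-density of a toy diagonal-Gaussian policy with a linear mean.

    This is a hypothetical stand-in for a policy network.
    """
    logp = 0.0
    for i, a_i in enumerate(action):
        mean_i = bias[i] + sum(w * s for w, s in zip(weights[i], state))
        logp += (-0.5 * ((a_i - mean_i) / std) ** 2
                 - math.log(std) - 0.5 * math.log(2.0 * math.pi))
    return logp


def grad_wrt_state(state, action, weights, bias, std, eps=1e-5):
    """Central finite-difference estimate of d log pi(a|s) / d s.

    An autodiff framework would normally compute this gradient.
    """
    grad = []
    for j in range(len(state)):
        plus, minus = list(state), list(state)
        plus[j] += eps
        minus[j] -= eps
        diff = (log_prob(plus, action, weights, bias, std)
                - log_prob(minus, action, weights, bias, std))
        grad.append(diff / (2.0 * eps))
    return grad


def gradient_penalty(batch, weights, bias, std):
    """Mean squared L2 norm of the state-gradient over (state, action) pairs."""
    total = 0.0
    for state, action in batch:
        grad = grad_wrt_state(state, action, weights, bias, std)
        total += sum(g * g for g in grad)
    return total / len(batch)


def regularized_loss(task_loss, batch, weights, bias, std, gp_coef):
    """Task loss plus the weighted penalty (gp_coef is an assumed coefficient)."""
    return task_loss + gp_coef * gradient_penalty(batch, weights, bias, std)


if __name__ == "__main__":
    batch = [([0.0, 0.0], [1.0, 1.0])]
    steep = [[1.0, 0.0], [0.0, 2.0]]
    gentle = [[0.5, 0.0], [0.0, 1.0]]
    # A policy whose mean changes faster with the state receives a larger penalty.
    print(gradient_penalty(batch, steep, [0.0, 0.0], 1.0))
    print(gradient_penalty(batch, gentle, [0.0, 0.0], 1.0))
```

In this sketch the penalty grows with the sensitivity of the policy to its input, so minimizing the regularized loss favors policies whose outputs change gradually as the state changes. A single coefficient takes the place of a set of hand-tuned smoothness terms.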
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Policy Smoothness Evaluation | Footwork | Action Smoothness | 0.036 | 7 |
| Policy Smoothness Evaluation | walking | Action Smoothness | 0.004 | 7 |
| Policy Smoothness Evaluation | Backflip | Action Smoothness | 0.195 | 6 |
| Humanoid Locomotion | Uneven Terrain & Disturbance Configuration Noise Case II | Joint Power | 976.3 | 4 |
| Humanoid Locomotion | Uneven Terrain & Disturbance Configurations Noise Case I | Joint Power | 1.33e+3 | 4 |
| Robot Locomotion | Uneven Terrain Walking Random Noise Case I (test) | Joint Power | 2.45e+3 | 4 |
| Robot Locomotion | Uneven Terrain Walking Random OOD Noise Case II (test) | Joint Power | 1.77e+3 | 4 |