Constraint-Enhanced Reinforcement Learning Based on Dynamic Decoupled Spherical Radial Squashing

About

When deploying reinforcement learning policies to physical robots, actuator rate constraints (hard limits on how fast each joint can move per control step) are unavoidable. These limits vary substantially across joints because of differences in motor inertia, power bandwidth, and transmission stiffness, creating pronounced heterogeneity that existing methods fail to handle geometrically: the per-joint feasible region forms a high-dimensional box in action-increment space, yet QP projection and spherical parameterization methods impose isotropic ball-shaped constraints, which under-cover the true feasible set exponentially as heterogeneity grows. This paper proposes Dynamic Decoupled Spherical Radial Squashing (DD-SRad), which resolves this mismatch by computing a position-adaptive radius independently for each actuator, achieving tight alignment with the true per-joint feasible region. DD-SRad satisfies per-step hard constraints with probability 1, preserves well-conditioned gradients throughout training, and admits exact policy gradient backpropagation with zero runtime solver overhead. MuJoCo benchmark experiments demonstrate the highest task return at zero constraint violation, matching the unconstrained upper bound, with a 30% to 50% improvement in constraint-space coverage over spherical baselines. High-fidelity IsaacLab simulations with Unitree H1 and G1 humanoid robots confirm end-to-end optimality with constraints parameterized directly from official joint specifications, validating a systematic pathway from hardware datasheets to safe deployment.
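The box-versus-ball mismatch described above can be illustrated numerically. The sketch below is not the authors' DD-SRad implementation: `ball_to_box_volume_ratio` and `squash_per_joint` are hypothetical helpers written for this illustration. The first measures how much of the per-joint box is covered by the largest centred isotropic ball. The second shows one simple way (a per-joint `tanh` squash, an assumption rather than the paper's formula) to keep every action increment inside its own rate limit while also respecting the action range, so that the usable radius depends on the previous action.

```python
import math
import numpy as np


def ball_to_box_volume_ratio(delta):
    """Fraction of the box prod_i [-delta_i, delta_i] covered by the
    largest centred isotropic ball, whose radius is min(delta)."""
    delta = np.asarray(delta, dtype=float)
    n = delta.size
    r = delta.min()
    # log-volume of an n-ball: (n/2) log(pi) - log Gamma(n/2 + 1) + n log(r)
    log_ball = 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0) + n * math.log(r)
    log_box = float(np.sum(np.log(2.0 * delta)))
    return math.exp(log_ball - log_box)


def squash_per_joint(raw, a_prev, delta, a_low, a_high):
    """Hypothetical per-joint squashing (illustrative assumption).

    Maps unbounded network outputs `raw` to a next action such that, for
    every joint i, |a_next[i] - a_prev[i]| <= delta[i] and
    a_low[i] <= a_next[i] <= a_high[i]. Requires a_low <= a_prev <= a_high.
    """
    lo = np.maximum(a_low, a_prev - delta)   # per-joint lower end of the feasible interval
    hi = np.minimum(a_high, a_prev + delta)  # per-joint upper end of the feasible interval
    center = 0.5 * (lo + hi)
    radius = 0.5 * (hi - lo)                 # shrinks near the action bounds
    return center + radius * np.tanh(raw)    # smooth, differentiable in `raw`


# Rate limits in the style of the Ant benchmark row: four joints at 0.2, four at 0.5.
delta = np.array([0.2] * 4 + [0.5] * 4)
print("ball/box coverage, heterogeneous:", ball_to_box_volume_ratio(delta))
print("ball/box coverage, homogeneous:  ", ball_to_box_volume_ratio(np.full(8, 0.2)))

rng = np.random.default_rng(0)
a_prev = np.zeros(8)
a_next = squash_per_joint(rng.normal(scale=5.0, size=8), a_prev, delta, -1.0, 1.0)
print("max |increment| / delta:", np.max(np.abs(a_next - a_prev) / delta))
```

Because the squash is applied coordinate-wise, the reachable set is the full box rather than an inscribed ball, and no projection or solver call is needed at run time. Whether DD-SRad uses this exact map is not stated in the abstract.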

Qijun Liao, Zhaoxin Yu, Jue Yang • 2026

Related benchmarks

Task	Dataset	Result
Reinforcement Learning	Ant delta=[0.2^4, 0.5^4], kappa=2.5 v5 (test)	Return 4.26e+3	12
Reinforcement Learning	Humanoid (delta=[0.8^6, 0.5^6, 0.2^5], kappa=4.0) v5 (test)	Return 5.62e+3	12
Reinforcement Learning	HalfCheetah delta=[0.2^3, 0.5^3], kappa=2.5 v5 (test)	Return 4.33e+3	12
Reinforcement Learning	Hopper delta=[0.2, 0.5, 0.5], kappa=2.5 v5 (test)	Return 3.31e+3	12
Humanoid Locomotion	IsaacLab Unitree H1 Rough terrain, κ≈2.2	Return 37.14	6
Humanoid Locomotion	IsaacLab Unitree G1 Flat terrain, κ=4.0	Return 5.47e+3	6
Reinforcement Learning	Ant tight heterogeneous constraints v5 (test)	Return 4.26e+3	6
Reinforcement Learning	Humanoid tight heterogeneous constraints v5 (test)	Return 5.50e+3	6
Reinforcement Learning	HalfCheetah tight heterogeneous constraints v5 (test)	Return 4.33e+3	6
Reinforcement Learning	Hopper tight heterogeneous constraints v5 (test)	Return 3.31e+3	6
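The dataset names in the table use a compact shorthand such as `delta=[0.2^4, 0.5^4], kappa=2.5`. The sketch below expands that shorthand under two assumptions that the listing itself does not state: that `0.2^4` means the limit 0.2 repeated for four joints, and that kappa is the ratio of the largest to the smallest per-joint limit. Both readings are consistent with the listed kappa values and with the action dimensions of the Gymnasium MuJoCo v5 environments (8, 17, 6, and 3). `expand_delta` and `heterogeneity_ratio` are hypothetical helper names.

```python
import re


def expand_delta(spec):
    """Expand shorthand like 'delta=[0.2^4, 0.5^4]' into a flat list of
    per-joint limits (assumes 'x^k' means x repeated k times)."""
    inner = re.search(r"\[(.*?)\]", spec).group(1)
    values = []
    for term in inner.split(","):
        base, _, count = term.strip().partition("^")
        values.extend([float(base)] * (int(count) if count else 1))
    return values


def heterogeneity_ratio(delta):
    """Assumed definition of kappa: largest limit divided by smallest."""
    return max(delta) / min(delta)


# Shorthand strings copied from the benchmark table.
specs = {
    "Ant": "delta=[0.2^4, 0.5^4]",
    "Humanoid": "delta=[0.8^6, 0.5^6, 0.2^5]",
    "HalfCheetah": "delta=[0.2^3, 0.5^3]",
    "Hopper": "delta=[0.2, 0.5, 0.5]",
}

for name, spec in specs.items():
    delta = expand_delta(spec)
    print(f"{name}: {len(delta)} joints, max/min ratio {heterogeneity_ratio(delta):.2f}")
```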
