
Constraint-Enhanced Reinforcement Learning Based on Dynamic Decoupled Spherical Radial Squashing

About

When deploying reinforcement learning policies to physical robots, actuator rate constraints (hard limits on how fast each joint can move per control step) are unavoidable. These limits vary substantially across joints due to differences in motor inertia, power bandwidth, and transmission stiffness, creating pronounced heterogeneity that existing methods fail to handle geometrically: the per-joint feasible region forms a high-dimensional box in action-increment space, yet QP projection and spherical parameterization methods impose isotropic ball-shaped constraints, exponentially under-covering the true feasible set as heterogeneity grows. This paper proposes Dynamic Decoupled Spherical Radial Squashing (DD-SRad), which resolves this mismatch by computing a position-adaptive radius independently for each actuator, achieving tight alignment with the true per-joint feasible region. DD-SRad satisfies per-step hard constraints with probability 1, preserves well-conditioned gradients throughout training, and admits exact policy gradient backpropagation with zero runtime solver overhead. MuJoCo benchmark experiments demonstrate the highest task return at zero constraint violation, matching the unconstrained upper bound, with 30% to 50% improvement in constraint-space coverage over spherical baselines. High-fidelity IsaacLab simulations with Unitree H1 and G1 humanoid robots confirm end-to-end optimality parameterized directly from official joint specifications, validating a systematic pathway from hardware datasheets to safe deployment.
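The box-versus-ball mismatch described in the abstract can be illustrated numerically. The sketch below is not the authors' DD-SRad formulation, which this page does not specify. It assumes the isotropic baseline uses the largest centered ball that fits inside the box (radius equal to the smallest per-joint limit), and it swaps in a plain per-joint tanh interval squashing to show what a decoupled, position-adaptive bound could look like. All function names and the action bounds `low` and `high` are hypothetical.

```python
import math


def ball_volume(n, r):
    """Volume of an n-dimensional Euclidean ball of radius r."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n


def box_volume(deltas):
    """Volume of the box prod_i [-delta_i, +delta_i]."""
    return math.prod(2.0 * d for d in deltas)


def ball_coverage(deltas):
    """Fraction of the box covered by the largest centered ball inside it.

    Assumption: the isotropic constraint uses radius min(deltas), so the
    ball never violates the tightest per-joint limit.
    """
    r = min(deltas)
    return ball_volume(len(deltas), r) / box_volume(deltas)


def squash_per_joint(prev_action, raw, deltas, low=-1.0, high=1.0):
    """Illustrative per-joint squashing (not DD-SRad itself).

    For each joint, the feasible interval is the overlap of the rate limit
    [prev - delta, prev + delta] with the action bounds [low, high]. The raw
    policy output is mapped into that interval with tanh, so both limits
    hold by construction and no solver is needed.
    """
    actions = []
    for prev, u, d in zip(prev_action, raw, deltas):
        lo = max(low, prev - d)
        hi = min(high, prev + d)
        center = 0.5 * (lo + hi)
        half_width = 0.5 * (hi - lo)
        actions.append(center + half_width * math.tanh(u))
    return actions


if __name__ == "__main__":
    homogeneous = [0.2] * 8
    heterogeneous = [0.2] * 4 + [0.5] * 4
    print("coverage, homogeneous box:  ", ball_coverage(homogeneous))
    print("coverage, heterogeneous box:", ball_coverage(heterogeneous))
    print(squash_per_joint([0.0, 0.9], [3.0, 3.0], [0.2, 0.5]))
```

Under these assumptions the inscribed ball covers a smaller fraction of the heterogeneous box than of the homogeneous one, because the loosest joints are still held to the tightest joint's limit. The per-joint mapping, by contrast, can reach every point of the box.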

Qijun Liao, Zhaoxin Yu, Jue Yang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Reinforcement Learning | Ant delta=[0.2^4, 0.5^4], kappa=2.5 v5 (test) | Return 4.26e+3 | 12 |
| Reinforcement Learning | Humanoid (delta=[0.8^6, 0.5^6, 0.2^5], kappa=4.0) v5 (test) | Return 5.62e+3 | 12 |
| Reinforcement Learning | HalfCheetah delta=[0.2^3, 0.5^3], kappa=2.5 v5 (test) | Return 4.33e+3 | 12 |
| Reinforcement Learning | Hopper delta=[0.2, 0.5, 0.5], kappa=2.5 v5 (test) | Return 3.31e+3 | 12 |
| Humanoid Locomotion | IsaacLab Unitree H1 Rough terrain, κ≈2.2 | Return 37.14 | 6 |
| Humanoid Locomotion | IsaacLab Unitree G1 Flat terrain, κ=4.0 | Return 5.47e+3 | 6 |
| Reinforcement Learning | Ant tight heterogeneous constraints v5 (test) | Return 4.26e+3 | 6 |
| Reinforcement Learning | Humanoid tight heterogeneous constraints v5 (test) | Return 5.50e+3 | 6 |
| Reinforcement Learning | HalfCheetah tight heterogeneous constraints v5 (test) | Return 4.33e+3 | 6 |
| Reinforcement Learning | Hopper tight heterogeneous constraints v5 (test) | Return 3.31e+3 | 6 |
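The dataset names above use a compact notation such as `delta=[0.2^4, 0.5^4]`. The sketch below rests on two assumptions that the page does not state: that `0.2^4` means four joints sharing a rate limit of 0.2, and that kappa is the ratio of the largest to the smallest per-joint limit. Both readings are consistent with the listed MuJoCo rows (for example, the expanded Ant vector has 8 entries and a ratio of 2.5), but they remain assumptions. The function names are hypothetical.

```python
def expand_deltas(spec):
    """Expand a spec like '0.2^4, 0.5^4' into a flat list of per-joint limits.

    Assumption: 'value^count' means `count` joints sharing limit `value`;
    a bare 'value' means a single joint.
    """
    deltas = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        value, _, count = token.partition("^")
        deltas.extend([float(value)] * (int(count) if count else 1))
    return deltas


def heterogeneity_ratio(deltas):
    """Ratio of the loosest to the tightest limit (assumed meaning of kappa)."""
    return max(deltas) / min(deltas)


if __name__ == "__main__":
    for name, spec in [
        ("Ant", "0.2^4, 0.5^4"),
        ("Humanoid", "0.8^6, 0.5^6, 0.2^5"),
        ("HalfCheetah", "0.2^3, 0.5^3"),
        ("Hopper", "0.2, 0.5, 0.5"),
    ]:
        deltas = expand_deltas(spec)
        print(name, len(deltas), heterogeneity_ratio(deltas))
```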
