Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

About

Deep reinforcement learning for continuous control often suffers from high variance, low energy efficiency, and poor generalization under distribution shift, because purely data-driven exploration ignores available physical structure. This paper proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which encodes dominant energy terms (assumed to be known a priori) directly as reward potentials at O(n) computational cost per step. H-EARS decomposes the shaping potential into task-oriented and energy-based components and adds an action regularization term that deliberately modifies the optimization objective to enforce energy-efficient control. A complete theoretical foundation is established, covering functional independence of shaping and regularization, energy-based gradient enrichment under positive-definite Hessian conditions, convergence guarantees under function approximation, and error bounds for approximate potentials. Across four continuous control benchmarks and four baseline algorithms, H-EARS achieves consistent gains in convergence speed, policy stability, and final performance. High-fidelity vehicle simulations under extreme road conditions validate its applicability in safety-critical settings.
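The abstract describes the shaped reward only in words. Below is a minimal Python sketch of one way such a reward could be assembled, assuming the standard potential-based form F = γΦ(s′) − Φ(s) with Φ = Φ_task + Φ_energy, plus a quadratic action penalty −λ‖a‖². The specific potentials (negative distance to a goal configuration, and minus the mechanical energy of decoupled mass-spring coordinates), the weights `w_task`, `w_energy`, `lam`, and every function name are hypothetical placeholders, not the paper's actual terms. With diagonal mass and stiffness the per-step cost is O(n) in the number of coordinates. The `discounted_shaping_sum` helper illustrates why the shaping part is policy-invariant (its discounted sum telescopes to γ^T Φ(s_T) − Φ(s_0)), while the action penalty does not telescope and therefore changes the objective.

```python
import numpy as np

# All potentials, weights, and names below are hypothetical illustrations,
# not the H-EARS paper's actual energy terms.

def task_potential(q, goal, w_task=1.0):
    """Hypothetical task term: potential rises as q approaches the goal."""
    return -w_task * float(np.linalg.norm(q - goal))

def energy_potential(q, v, mass, stiffness, w_energy=0.1):
    """Hypothetical energy term: minus the mechanical energy of decoupled
    mass-spring coordinates, E = 0.5*sum(m*v^2) + 0.5*sum(k*q^2).
    Diagonal mass/stiffness keep the cost O(n) in the coordinate count."""
    kinetic = 0.5 * float(np.sum(mass * v**2))
    elastic = 0.5 * float(np.sum(stiffness * q**2))
    return -w_energy * (kinetic + elastic)

def phi(state, params):
    """Total potential Phi = Phi_task + Phi_energy for a state (q, v)."""
    q, v = state
    return (task_potential(q, params["goal"], params["w_task"])
            + energy_potential(q, v, params["mass"], params["stiffness"],
                               params["w_energy"]))

def shaped_reward(r, state, next_state, action, params,
                  gamma=0.99, lam=1e-3, done=False):
    """Environment reward + potential-based shaping + action penalty."""
    # Shaping term F = gamma*Phi(s') - Phi(s); terminal potential taken as 0.
    phi_next = 0.0 if done else phi(next_state, params)
    shaping = gamma * phi_next - phi(state, params)
    # Quadratic action penalty: not potential-based, so it alters the objective.
    penalty = -lam * float(np.sum(np.square(action)))
    return r + shaping + penalty

def discounted_shaping_sum(states, params, gamma):
    """Discounted sum of shaping terms along a trajectory s_0..s_T.
    Telescopes to gamma**T * Phi(s_T) - Phi(s_0)."""
    total = 0.0
    for t in range(len(states) - 1):
        f = gamma * phi(states[t + 1], params) - phi(states[t], params)
        total += gamma**t * f
    return total

if __name__ == "__main__":
    params = {"goal": np.zeros(2), "w_task": 1.0, "mass": np.ones(2),
              "stiffness": np.ones(2), "w_energy": 0.1}
    s = (np.array([1.0, 0.5]), np.array([0.2, -0.1]))
    s_next = (np.array([0.9, 0.45]), np.array([0.1, -0.05]))
    print(shaped_reward(1.0, s, s_next, np.array([0.3, -0.2]), params))
```

Keeping the penalty outside Φ is what lets the two effects be tuned separately in this sketch: `lam` trades return for control effort, whereas the shaping weights leave the optimal policy unchanged in the exact (tabular) setting.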

Qijun Liao, Jue Yang, Yiting Kang, Xinxin Zhao, Yong Zhang, Mingan Zhao • 2026

Related benchmarks

Task                    Dataset          Metric                              Result    Rank
Reinforcement Learning  Hopper v5        Average Return                      3.35e+3   101
Reinforcement Learning  Ant v5           Average Return                      4.18e+3   57
Reinforcement Learning  LunarLander v3   Average Agent Reward                289       14
Reinforcement Learning  Hopper v5        Episodes to Threshold 1500          830       8
Reinforcement Learning  LunarLander v3   Episodes to Threshold (Score 200)   290       8
Reinforcement Learning  Humanoid v5      Average Returns                     5.23e+3   8
Reinforcement Learning  Ant v5           Coefficient of Variation            4.2       8
Reinforcement Learning  Hopper v5        Coefficient of Variation            10.6      8
Reinforcement Learning  LunarLander v3   Coefficient of Variation            3.2       8
Reinforcement Learning  Humanoid v5      Coefficient of Variation (%)        6.3       8

Showing 10 of 12 rows.
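The leaderboard reports coefficient of variation and episodes-to-threshold without defining them. A minimal sketch of common definitions follows, assuming CV (%) = 100 × std / mean of final returns across seeds (sample standard deviation), and episodes-to-threshold as the first episode at which a trailing moving average of episode returns reaches the target (for example, a score of 200 for LunarLander v3). The moving-average window, the `ddof` choice, and the function names are assumptions; the paper may compute these metrics differently.

```python
import numpy as np

def coefficient_of_variation(final_returns, ddof=1):
    """CV in percent: 100 * std / mean over per-seed final returns.
    ddof=1 (sample std) is an assumption; requires a positive mean."""
    x = np.asarray(final_returns, dtype=float)
    mean = x.mean()
    if mean <= 0:
        raise ValueError("CV is only meaningful here for a positive mean return")
    return 100.0 * x.std(ddof=ddof) / mean

def episodes_to_threshold(episode_returns, threshold, window=100):
    """1-based index of the first episode at which the trailing moving average
    (over up to `window` episodes) reaches `threshold`; None if it never does."""
    r = np.asarray(episode_returns, dtype=float)
    csum = np.cumsum(r)
    for i in range(len(r)):
        start = max(0, i - window + 1)
        total = csum[i] - (csum[start - 1] if start > 0 else 0.0)
        if total / (i - start + 1) >= threshold:
            return i + 1
    return None
```

A lower CV indicates more consistent results across seeds, and fewer episodes-to-threshold indicates faster convergence; both depend on the averaging choices above, so values from different sources are only comparable under the same definitions.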
