
Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

About

Deep reinforcement learning excels at continuous control but often requires extensive exploration, while physics-based models demand complete system equations and suffer from cubic complexity. This study proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which unifies potential-based reward shaping with energy-aware action regularization. H-EARS constrains action magnitude while balancing task-specific and energy-based potentials via functional decomposition, achieving linear O(n) complexity by capturing only the dominant energy components rather than the full dynamics. We establish a theoretical foundation comprising: (1) functional independence for separate task/energy optimization; (2) energy-based convergence acceleration; (3) convergence guarantees under function approximation; and (4) error bounds for approximate potentials. Connections to Lyapunov stability are analyzed as heuristic guides. Experiments against standard baselines show improved convergence, stability, and energy efficiency, and vehicle simulations validate applicability in safety-critical domains under extreme conditions. The results confirm that integrating lightweight physics priors enhances model-free RL without requiring complete system models, enabling transfer from lab research to industrial applications.
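The core idea can be illustrated with a minimal sketch of potential-based reward shaping (the F = γφ(s′) − φ(s) form) that combines a task potential with a lightweight kinetic-energy potential and a quadratic action-magnitude penalty. The function names, weights, and the specific quadratic penalty below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def task_potential(state, goal):
    # Task potential: negative distance to goal (higher when closer).
    return -np.linalg.norm(state - goal)

def energy_potential(velocity, mass=1.0):
    # Dominant kinetic-energy term only: O(n) in the velocity dimension,
    # no full dynamics model required (assumed simplification).
    return -0.5 * mass * float(velocity @ velocity)

def shaped_reward(r, s, s_next, v, v_next, action, goal,
                  gamma=0.99, w_task=1.0, w_energy=0.1, w_action=0.01):
    """Base reward r plus the shaping term F = gamma*phi(s') - phi(s),
    where phi blends task and energy potentials, minus an energy-aware
    penalty on action magnitude."""
    phi = lambda st, vel: (w_task * task_potential(st, goal)
                           + w_energy * energy_potential(vel))
    shaping = gamma * phi(s_next, v_next) - phi(s, v)
    return r + shaping - w_action * float(action @ action)
```

Because the shaping term is potential-based, it preserves the optimal policy of the underlying MDP (Ng et al., 1999), while the separate task/energy weights mirror the functional decomposition described in the abstract.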

Qijun Liao (1), Jue Yang (1), Yiting Kang (1), Xinxin Zhao (1), Yong Zhang (2), Mingan Zhao (2) — (1) School of Mechanical Engineering, University of Science and Technology Beijing, China; (2) Jiangsu XCMG Construction Machinery Research Institute Co., Ltd., China • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reinforcement Learning | Hopper v5 | Average Return | 3.35e+3 | 101 |
| Reinforcement Learning | Ant v5 | Average Return | 4.18e+3 | 57 |
| Reinforcement Learning | LunarLander v3 | Average Agent Reward | 289 | 14 |
| Reinforcement Learning | Hopper v5 | Episodes to Threshold (Score 1500) | 830 | 8 |
| Reinforcement Learning | LunarLander v3 | Episodes to Threshold (Score 200) | 290 | 8 |
| Reinforcement Learning | Humanoid v5 | Average Return | 5.23e+3 | 8 |
| Reinforcement Learning | Ant v5 | Coefficient of Variation | 4.2 | 8 |
| Reinforcement Learning | Hopper v5 | Coefficient of Variation | 10.6 | 8 |
| Reinforcement Learning | LunarLander v3 | Coefficient of Variation | 3.2 | 8 |
| Reinforcement Learning | Humanoid v5 | Coefficient of Variation (%) | 6.3 | 8 |

(10 of 12 rows shown)
