AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning

About

The discount factor in reinforcement learning controls both the effective planning horizon and the strength of bootstrapping, yet most deep RL methods use a single fixed value across all states. While state-dependent discounting is conceptually appealing, naive deep actor-critic implementations can become unstable and degenerate toward TD-error collapse. We propose AdaGamma, a practical deep actor-critic method for state-dependent discounting that learns a state-dependent discount function together with a return-consistency objective to regularize the induced backup structure. On the theory side, we analyze the Bellman operator induced by state-dependent discounting and establish its basic well-posedness properties under suitable conditions. Empirically, AdaGamma integrates into both SAC and PPO, yielding consistent improvements on continuous-control benchmarks, and achieves statistically significant gains in an online A/B test on the JD Logistics platform. These results suggest that state-dependent discounting can be made effective in deep RL when coupled with a return-consistency objective that prevents degenerate target manipulation.
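To make the idea of a state-dependent backup concrete, here is a minimal sketch of a one-step TD target and a multi-step return in which the discount varies per step instead of being a single constant. This is an illustration under stated assumptions, not the AdaGamma implementation: the function names `td_target` and `discounted_return` are hypothetical, the convention that the discount of the current state scales the bootstrapped value is an assumption, and the return-consistency objective from the abstract is not reproduced here.

```python
def td_target(reward, next_value, gamma_s, done):
    """One-step TD target with a per-state discount (assumed convention):

        target = r + gamma(s) * (1 - done) * V(s')

    gamma_s stands in for the output of a learned discount function at
    the current state; here it is just a number passed in by the caller.
    """
    if not 0.0 <= gamma_s < 1.0:
        raise ValueError("gamma_s must lie in [0, 1)")
    bootstrap = 0.0 if done else gamma_s * next_value
    return reward + bootstrap


def discounted_return(rewards, gammas):
    """Return from the first step of a trajectory with per-step discounts.

    Uses the backward recursion G_t = r_t + gamma_t * G_{t+1}, with the
    return after the final step taken to be zero.
    """
    if len(rewards) != len(gammas):
        raise ValueError("rewards and gammas must have the same length")
    g = 0.0
    for r, gamma in zip(reversed(rewards), reversed(gammas)):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # A small discount shortens the effective horizon for this state.
    print(td_target(reward=1.0, next_value=10.0, gamma_s=0.5, done=False))
    # With a constant discount the recursion reduces to the usual return.
    print(discounted_return([1.0, 1.0, 1.0], [0.5, 0.5, 0.5]))
```

With a constant value for every entry of `gammas`, both functions reduce to the standard fixed-discount case, which is why a fixed discount can be viewed as a special case of the state-dependent one.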

Yaomin Wang, Jianting Pan, Ran Tian, Xiaoyang Li, Yu Zhang, Hengle Qin, Tianshu YU • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Reinforcement Learning | MountainCarContinuous v0 | Average Agent Reward: 94.6 | 65 |
| Reinforcement Learning | Acrobot v1 | Mean Return: -82.61 | 42 |
| Reinforcement Learning | Ant v4 | Average Return: 3.77e+3 | 18 |
| Reinforcement Learning | CartPole v1 | Return: 500 | 16 |
| Reinforcement Learning | Humanoid v4 | Reward: 457 | 9 |
| High-Dimensional Control | SafetyPointGoal1 v0 (test) | Reward: 28.25 | 8 |
| High-Dimensional Locomotion | Humanoid v4 (test) | Reward: 6.91e+3 | 8 |
| High-Dimensional Locomotion | Ant v4 (test) | Reward: 4.13e+3 | 8 |
| Safety Reinforcement Learning | SafetyPointGoal1 v0 | Reward: 27.45 | 8 |
| Reinforcement Learning | Pendulum v1 | Reward: -58.557 | 4 |