
Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning

About

Real-world reinforcement learning often faces environment drift, yet most existing methods rely on a static entropy coefficient or target entropy. This causes over-exploration during stable periods and under-exploration after drift (and thus slow recovery), and it leaves open the principled question of how exploration intensity should scale with drift magnitude. We prove that entropy scheduling under non-stationarity reduces to a one-dimensional, round-by-round trade-off: tracking the optimal solution faster after drift versus avoiding gratuitous randomness while the environment is stable. Exploration strength can therefore be driven by measurable online drift signals. Building on this result, we propose AES (Adaptive Entropy Scheduling), which adjusts the entropy coefficient/temperature online from observable drift proxies during training, requires almost no structural changes, and incurs minimal overhead. Across 4 algorithm variants, 12 tasks, and 4 drift modes, AES significantly reduces drift-induced performance degradation and accelerates recovery after abrupt changes.
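The abstract's core idea, driving the entropy coefficient with an online drift proxy, can be sketched in a few lines. The scheduler below is an illustrative assumption, not the paper's actual algorithm: the class name, the use of mean absolute TD error as the drift proxy, and the linear interpolation rule are all hypothetical choices made for this sketch.

```python
# Hypothetical sketch of AES-style adaptive entropy scheduling.
# The proxy (mean |TD error|) and the update rule are our assumptions,
# not the paper's exact formulation.

class AdaptiveEntropyScheduler:
    """Scale a SAC-style entropy coefficient with an online drift proxy."""

    def __init__(self, alpha_min=0.01, alpha_max=0.5, ema_decay=0.9, gain=2.0):
        self.alpha_min = alpha_min  # floor during stable phases
        self.alpha_max = alpha_max  # ceiling right after a detected drift
        self.ema_decay = ema_decay  # smoothing for the drift proxy
        self.gain = gain            # how aggressively drift raises alpha
        self.ema_td = 0.0           # running mean of the proxy
        self.drift = 0.0            # normalized drift signal in [0, 1]

    def update(self, td_error_abs):
        """Feed the latest mean |TD error| each round; returns new alpha."""
        prev = self.ema_td
        self.ema_td = self.ema_decay * prev + (1 - self.ema_decay) * td_error_abs
        # A sudden jump of the proxy above its running mean indicates drift.
        surprise = max(0.0, td_error_abs - prev)
        self.drift = min(1.0, self.gain * surprise / (abs(prev) + 1e-8))
        # Interpolate: stable -> alpha_min, strong drift -> alpha_max.
        return self.alpha_min + self.drift * (self.alpha_max - self.alpha_min)
```

Under steady TD errors the returned coefficient settles near `alpha_min` (little gratuitous randomness); a spike in the proxy after an abrupt environment change pushes it toward `alpha_max`, boosting exploration for faster recovery.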

Tongxi Wang, Zhuoyang Xia, Xinran Chen, Shan Liu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Ant | Mujoco | Recovery Time (%): 5.9 | 16
2d multi-goal | TOY | Recovery Time (%): 3.2 | 8
ANYmal | Isaac Gym | Recovery Time: 6.5 | 8
FrankaCabinet | Isaac Gym | Recovery Time (%): 8.5 | 8
HalfCheetah | Mujoco | Recovery Time (%), Abrupt Change: 4.4 | 8
Hopper | Mujoco | Recovery Time (%): 4.7 | 8
Humanoid | Mujoco | Recovery Time (%): 7.5 | 8
Humanoid | Isaac Gym | Recovery Time (%): 7.8 | 8
Ingenuity | Isaac Gym | Recovery Time: 7 | 8
Non-Stationary Reinforcement Learning | Toy Environments Non-Stationary | nAUC (Steady): 1.13 | 8

Showing 10 of 14 rows.
