Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning

About

Real-world reinforcement learning often faces environment drift, but most existing methods rely on a static entropy coefficient or target entropy. This causes over-exploration during stable periods and under-exploration after drift, and leaves unanswered the principled question of how exploration intensity should scale with drift magnitude. We show that, under standard assumptions, entropy scheduling in non-stationary maximum-entropy RL can be cast as the dynamic-regret trade-off between tracking a drifting comparator and stabilizing updates, yielding a square-root scaling rule for the entropy weight in terms of an online non-stationarity proxy. Building on this, we propose AES (Adaptive Entropy Scheduling), which adjusts the entropy coefficient/temperature online using observable drift proxies during training, requiring almost no structural changes and incurring minimal overhead. Across 4 algorithm variants, 12 tasks, and 4 drift modes, AES significantly reduces the fraction of performance degradation caused by drift and accelerates recovery after abrupt changes.
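
To make the square-root scaling idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the drift proxy, the constants, or any clipping, so the function `scheduled_entropy_coef`, the class `EmaDriftProxy`, and all default values below are hypothetical choices made only for illustration. The sketch assumes the entropy coefficient is set proportional to the square root of a non-negative drift proxy, and that the proxy is an exponential moving average of how much some scalar training signal changes between steps.

```python
import math


def scheduled_entropy_coef(drift_proxy, base_coef=0.2, min_coef=0.01, max_coef=1.0):
    """Hypothetical square-root schedule: base_coef * sqrt(drift_proxy), clipped.

    All constants here are illustrative assumptions, not values from the paper.
    """
    if drift_proxy < 0:
        raise ValueError("drift_proxy must be non-negative")
    coef = base_coef * math.sqrt(drift_proxy)
    # Clip so the coefficient never vanishes in stable phases or explodes after a jump.
    return min(max(coef, min_coef), max_coef)


class EmaDriftProxy:
    """Illustrative drift proxy (an assumption): an exponential moving average of the
    absolute change between consecutive values of a scalar training signal."""

    def __init__(self, decay=0.9):
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        self.value = 0.0
        self._previous = None

    def update(self, signal):
        """Fold one new observation of the signal into the proxy and return it."""
        if self._previous is not None:
            change = abs(signal - self._previous)
            self.value = self.decay * self.value + (1.0 - self.decay) * change
        self._previous = signal
        return self.value


if __name__ == "__main__":
    proxy = EmaDriftProxy(decay=0.9)
    # Synthetic signal: stable for five steps, then an abrupt shift.
    signal_trace = [1.0] * 5 + [4.0] * 5
    for step, signal in enumerate(signal_trace):
        coef = scheduled_entropy_coef(proxy.update(signal))
        print(step, round(coef, 4))
```

In this sketch the coefficient sits at its floor while the signal is stable, rises right after the shift, and decays back as the proxy settles. In a maximum-entropy algorithm the returned value would stand in for the fixed temperature; how that is wired into a particular learner is left open here.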

Tongxi Wang, Zhuoyang Xia, Xinran Chen, Shan Liu • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Ant	MuJoCo	Recovery Time (%)	5.9	16
2d multi-goal	TOY	Recovery Time (%)	3.2	8
ANYmal	Isaac Gym	Recovery Time	6.5	8
FrankaCabinet	Isaac Gym	Recovery Time (%)	8.5	8
HalfCheetah	MuJoCo	Recovery Time (%) (Abrupt Change)	4.4	8
Hopper	MuJoCo	Recovery Time (%)	4.7	8
Humanoid	MuJoCo	Recovery Time (%)	7.5	8
Humanoid	Isaac Gym	Recovery Time (%)	7.8	8
Ingenuity	Isaac Gym	Recovery Time	7	8
Non-Stationary Reinforcement Learning	Toy Environments Non-Stationary	nAUC (Steady)	1.13	8

Showing 10 of 14 rows
