Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning
About
Real-world reinforcement learning often faces environment drift, but most existing methods rely on a static entropy coefficient or target entropy. This causes over-exploration during stable periods and under-exploration after drift, and it leaves unanswered the principled question of how exploration intensity should scale with drift magnitude. We show that, under standard assumptions, entropy scheduling in non-stationary maximum-entropy RL can be cast as a dynamic-regret trade-off between tracking a drifting comparator and stabilizing updates, yielding a square-root scaling rule for the entropy weight in terms of an online non-stationarity proxy. Building on this, we propose AES (Adaptive Entropy Scheduling), which adjusts the entropy coefficient (temperature) online using observable drift proxies during training, requiring almost no structural changes and incurring minimal overhead. Across 4 algorithm variants, 12 tasks, and 4 drift modes, AES significantly reduces the fraction of performance degradation caused by drift and accelerates recovery after abrupt changes.
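As a rough illustration of the square-root scaling idea, the sketch below sets the entropy weight proportional to the square root of a running drift proxy and clips it to a fixed range. It is not the AES algorithm itself: the class name `SqrtEntropyScheduler`, the choice of proxy (an exponential moving average of absolute changes in some monitored scalar, such as a batch-mean reward), and all constants are assumptions made for this example.

```python
import math


class SqrtEntropyScheduler:
    """Hypothetical scheduler: entropy weight ~ sqrt(drift proxy), clipped.

    The proxy used here (EMA of absolute changes in a monitored scalar)
    is an illustrative assumption, not the proxy defined in the paper.
    """

    def __init__(self, alpha_min=0.01, alpha_max=1.0, scale=0.1, decay=0.9):
        self.alpha_min = alpha_min    # floor applied when no drift is observed
        self.alpha_max = alpha_max    # cap to keep exploration bounded
        self.scale = scale            # proportionality constant (assumed)
        self.decay = decay            # EMA decay for the drift proxy
        self.proxy = 0.0              # running non-stationarity estimate
        self.prev_signal = None       # last observed value of the signal

    def update(self, signal):
        """Consume one observation and return the new entropy weight."""
        if self.prev_signal is not None:
            change = abs(signal - self.prev_signal)
            self.proxy = self.decay * self.proxy + (1.0 - self.decay) * change
        self.prev_signal = signal
        alpha = self.scale * math.sqrt(self.proxy)
        return min(self.alpha_max, max(self.alpha_min, alpha))


scheduler = SqrtEntropyScheduler()
stable_alpha = [scheduler.update(1.0) for _ in range(5)][-1]  # no drift yet
jump_alpha = scheduler.update(5.0)                            # abrupt change
```

Under these assumptions the weight stays at its floor while the monitored signal is constant, rises immediately after an abrupt change, and decays back as the proxy shrinks. Because of the square root, a fourfold increase in the proxy only doubles the (unclipped) entropy weight.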
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Ant | MuJoCo | Recovery Time (%) | 5.9 | 16 |
| 2D multi-goal | TOY | Recovery Time (%) | 3.2 | 8 |
| ANYmal | Isaac Gym | Recovery Time | 6.5 | 8 |
| FrankaCabinet | Isaac Gym | Recovery Time (%) | 8.5 | 8 |
| HalfCheetah | MuJoCo | Recovery Time (%) (Abrupt Change) | 4.4 | 8 |
| Hopper | MuJoCo | Recovery Time (%) | 4.7 | 8 |
| Humanoid | MuJoCo | Recovery Time (%) | 7.5 | 8 |
| Humanoid | Isaac Gym | Recovery Time (%) | 7.8 | 8 |
| Ingenuity | Isaac Gym | Recovery Time | 7 | 8 |
| Non-Stationary Reinforcement Learning | Toy Environments Non-Stationary | nAUC (Steady) | 1.13 | 8 |