Meta-reinforcement learning with minimum attention
About
Minimum attention, first proposed by Brockett, applies the least action principle to the changes of control with respect to state and time. The resulting regularization is highly relevant to emulating biological control, such as motor learning. We apply minimum attention in reinforcement learning (RL) as part of the reward and investigate its connection to meta-learning and stabilization. Specifically, we explore model-based meta-learning with minimum attention in high-dimensional nonlinear dynamics, alternating between ensemble-based model learning and gradient-based meta-policy learning. Empirically, minimum attention outperforms state-of-the-art model-free and model-based RL algorithms, showing fast adaptation in few shots and reduced variance under perturbations of the model and environment. Furthermore, minimum attention improves energy efficiency.
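To make the idea of using minimum attention "as part of the rewards" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the attention cost at a point is the squared sensitivity of the control to state and time, in the spirit of Brockett's formulation, and estimates it with central finite differences. The names `attention_penalty`, `shaped_reward`, `linear_policy`, and the `weight` parameter are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# A policy maps (state, time) to a control vector. This signature is an
# assumption made for the sketch, not something specified by the paper.
Policy = Callable[[Sequence[float], float], List[float]]


def attention_penalty(policy: Policy, x: Sequence[float], t: float,
                      eps: float = 1e-4) -> float:
    """Estimate ||du/dx||^2 + ||du/dt||^2 at (x, t) by central differences."""
    penalty = 0.0
    # Sensitivity of the control to each state coordinate.
    for i in range(len(x)):
        x_hi = list(x)
        x_lo = list(x)
        x_hi[i] += eps
        x_lo[i] -= eps
        u_hi = policy(x_hi, t)
        u_lo = policy(x_lo, t)
        penalty += sum(((a - b) / (2.0 * eps)) ** 2 for a, b in zip(u_hi, u_lo))
    # Sensitivity of the control to time.
    u_hi = policy(x, t + eps)
    u_lo = policy(x, t - eps)
    penalty += sum(((a - b) / (2.0 * eps)) ** 2 for a, b in zip(u_hi, u_lo))
    return penalty


def shaped_reward(task_reward: float, policy: Policy, x: Sequence[float],
                  t: float, weight: float = 0.01) -> float:
    """Subtract a weighted attention penalty from the task reward."""
    return task_reward - weight * attention_penalty(policy, x, t)


def linear_policy(x: Sequence[float], t: float) -> List[float]:
    """Hypothetical one-dimensional control: u = 2*x0 - x1 + 0.5*t."""
    return [2.0 * x[0] - 1.0 * x[1] + 0.5 * t]


if __name__ == "__main__":
    state = [0.3, -0.2]
    print(attention_penalty(linear_policy, state, 1.0))
    print(shaped_reward(1.0, linear_policy, state, 1.0))
```

For the linear policy above, the analytic penalty is 2² + (−1)² + 0.5² = 5.25, so the finite-difference estimate should agree closely. A policy that ignores both state and time incurs no penalty, which is the sense in which the regularizer favors controls that vary little. In practice an automatic-differentiation Jacobian would likely replace the finite differences.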
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reinforcement Learning | Humanoid | Zero-Shot Reward | 2.55e+3 | 32 |
| Continuous Control | HalfCheetah v1 (train) | Max Average Return | 9.72e+3 | 9 |
| Reinforcement Learning | HalfCheetah Meta (train) | Reward | 9.72e+3 | 4 |
| Continuous Control | Half-Cheetah (meta-test) | Total Reward | 6.82e+3 | 2 |
| Continuous Control | Hopper (meta-train) | Total Reward | 2.83e+3 | 2 |
| Continuous Control | Hopper (meta-test) | Total Reward | 485 | 2 |
| Continuous Control | Walker2D (meta-train) | Total Reward | 3.04e+3 | 2 |
| Continuous Control | Walker2D meta (test) | Total Reward | 1.12e+3 | 2 |
| Continuous Control | Humanoid (meta-test) | Total Reward | 480 | 2 |
| Reinforcement Learning | HalfCheetah Meta-test crippled-back (test) | Reward | 6.82e+3 | 2 |