DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning
About
We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach for approximate inference in environments where the latent state evolves at varying rates. We model episode sessions - parts of an episode during which the latent state is fixed - and propose three key modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent conditioning. We demonstrate the importance of these modifications across a range of domains, from discrete Gridworld environments to continuous-control and simulated robot-assistance tasks, and show that DynaMITE-RL significantly outperforms state-of-the-art baselines in sample efficiency and inference returns.
Anthony Liang, Guy Tennenholtz, Chih-wei Hsu, Yinlam Chow, Erdem Bıyık, Craig Boutilier • 2024
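To make the session idea concrete, here is a toy sketch of a within-session latent-consistency penalty. This is an illustrative interpretation, not the paper's actual objective: the function name, the use of posterior means, and the squared-deviation penalty are all assumptions made for the example.

```python
import numpy as np

def session_consistency_loss(latents, session_ids):
    """Toy within-session consistency penalty (illustrative, not the paper's loss).

    latents: (T, d) array of per-timestep inferred latent means.
    session_ids: (T,) array marking which session each timestep belongs to.
    Deviations are penalized only within a session; timesteps from different
    sessions are never compared, reflecting the assumption that the latent
    state is fixed within a session but may change across sessions.
    """
    loss = 0.0
    for s in np.unique(session_ids):
        z = latents[session_ids == s]              # latents of one session
        loss += np.sum((z - z.mean(axis=0)) ** 2)  # deviation from session mean
    return loss / len(latents)
```

Under this sketch, latents that are constant within each session incur zero loss, while latents that drift inside a session are penalized, regardless of how different the sessions are from each other.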
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | reacher | Average Reward | -8.4 | 9 |
| Locomotion | HalfCheetah Dir | Avg Episode Return | -68.5 | 6 |
| Locomotion | HalfCheetah Vel | Avg Episode Return | -146 | 6 |
| Locomotion | Wind+Vel | Avg Episode Return | -42.8 | 6 |
| Navigation | GridWorld | Avg Episode Return | 42.9 | 6 |
| Robot Assistance | ScratchItch | Average Episode Return | 231.2 | 6 |