DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

About

We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates. We model episode sessions, the parts of an episode where the latent state is fixed, and propose three key modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent conditioning. We demonstrate the importance of these modifications in various domains, ranging from discrete Gridworld environments to continuous-control and simulated robot assistive tasks, and show that DynaMITE-RL significantly outperforms state-of-the-art baselines in sample efficiency and inference returns.
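To make the notion of a session concrete, the following is a minimal sketch of how an episode could be split into sessions and how a same-session mask could be built from per-step termination flags. It is an illustration only, not the authors' implementation: the names `session_end`, `session_ids`, and `session_mask` are hypothetical, and the assumption that a flag of 1 marks the last step of a session is ours.

```python
import numpy as np

def session_ids(session_end):
    """Assign a session index to every time step.

    Assumption: session_end[t] == 1 means the session ends after step t,
    so step t + 1 starts a new session.
    """
    session_end = np.asarray(session_end, dtype=int)
    # Shift by one so the terminating step still belongs to its own session.
    return np.concatenate([[0], np.cumsum(session_end[:-1])])

def session_mask(session_end):
    """Boolean (T, T) matrix: mask[i, j] is True when steps i and j share a session."""
    ids = session_ids(session_end)
    return ids[:, None] == ids[None, :]

# Hypothetical 6-step episode with sessions ending after steps 2 and 4.
ends = [0, 0, 1, 0, 1, 0]
ids = session_ids(ends)
mask = session_mask(ends)
```

A mask of this kind could be used to restrict per-step loss terms to transitions from the same session, which is one plausible reading of "session masking"; the exact losses used by DynaMITE-RL are not specified here.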

Anthony Liang, Guy Tennenholtz, Chih-wei Hsu, Yinlam Chow, Erdem Bıyık, Craig Boutilier • 2024

Related benchmarks

Task	Dataset	Metric	Value
Continuous Control	reacher	Average Reward	-8.4	9
Reinforcement Learning	Cheetah-Wind-E dynamics changes episodic	Average Return	-59	8
Reinforcement Learning	Cheetah-Wind-S dynamics changes, time-step	Average Return	-63.4	8
Reinforcement Learning	Cheetah-Vel-E reward changes episodic	Average Return	-59.4	8
Reinforcement Learning	Cheetah-Dir-E reward changes, episodic	Average Return	938.6	8
Reinforcement Learning	Ant-Dir-E reward changes, episodic	Average Return	269.3	8
Locomotion	HalfCheetah Dir	Avg Episode Return	-68.5	6
Locomotion	HalfCheetah Vel	Avg Episode Return	-146	6
Locomotion	Wind+Vel	Avg Episode Return	-42.8	6
Navigation	GridWorld	Avg Episode Return	42.9	6

