
DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

About

We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates. We model episode sessions (parts of the episode where the latent state is fixed) and propose three key modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent conditioning. We demonstrate the importance of these modifications in domains ranging from discrete Gridworld environments to continuous-control and simulated robot assistive tasks, where DynaMITE-RL significantly outperforms state-of-the-art baselines in sample efficiency and inference returns.
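
To make the notion of a session concrete, the following is a minimal sketch of how an episode could be split into sessions, assuming (purely for illustration) that the latent value at each time step is available as a list. The helper names `session_ids` and `same_session_mask` are hypothetical and do not come from the paper. The abstract does not specify how DynaMITE-RL infers session boundaries or applies session masking in its objective, so this only shows the bookkeeping implied by the definition of a session.

```python
from typing import Hashable, List, Sequence


def session_ids(latents: Sequence[Hashable]) -> List[int]:
    """Label each time step with the index of the session it belongs to.

    A new session starts whenever the latent value differs from the
    value at the previous time step (assumption: latents are observed
    here only for the sake of the toy example).
    """
    ids: List[int] = []
    current = -1
    previous = None
    for z in latents:
        # The first step always opens session 0; later steps open a new
        # session only when the latent value changes.
        if not ids or z != previous:
            current += 1
        ids.append(current)
        previous = z
    return ids


def same_session_mask(ids: Sequence[int]) -> List[List[bool]]:
    """mask[t][s] is True when steps t and s fall in the same session."""
    return [[a == b for b in ids] for a in ids]


# Toy episode: the latent (for example, a goal index) switches twice.
episode_latents = [0, 0, 0, 1, 1, 0, 0]
ids = session_ids(episode_latents)
mask = same_session_mask(ids)
print(ids)
```

Note that the last two steps get their own session index even though the latent value returns to 0: a session is a contiguous stretch of the episode, so a mask built this way keeps information from the first session separate from the third.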

Anthony Liang, Guy Tennenholtz, Chih-wei Hsu, Yinlam Chow, Erdem Bıyık, Craig Boutilier • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | reacher | Average Reward | -8.4 | 9 |
| Reinforcement Learning | Cheetah-Wind-E dynamics changes episodic | Average Return | -59 | 8 |
| Reinforcement Learning | Cheetah-Wind-S dynamics changes, time-step | Average Return | -63.4 | 8 |
| Reinforcement Learning | Cheetah-Vel-E reward changes episodic | Average Return | -59.4 | 8 |
| Reinforcement Learning | Cheetah-Dir-E reward changes, episodic | Average Return | 938.6 | 8 |
| Reinforcement Learning | Ant-Dir-E reward changes, episodic | Average Return | 269.3 | 8 |
| Locomotion | HalfCheetah Dir | Avg Episode Return | -68.5 | 6 |
| Locomotion | HalfCheetah Vel | Avg Episode Return | -146 | 6 |
| Locomotion | Wind+Vel | Avg Episode Return | -42.8 | 6 |
| Navigation | GridWorld | Avg Episode Return | 42.9 | 6 |
Showing 10 of 11 rows
