DeepMDP: Learning Continuous Latent Space Models for Representation Learning
About
Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. We connect these results to prior work in the bisimulation literature, and explore the use of a variety of metrics. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations in a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
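The two tractable losses can be sketched on a single transition `(s, a, r, s')`. This is a minimal NumPy illustration, not the paper's implementation: the encoder and latent models are plain linear maps with made-up dimensions, and a deterministic L2 term stands in for the distributional (Wasserstein-style) transition loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
OBS_DIM, LATENT_DIM, N_ACTIONS = 16, 4, 3

# Linear stand-ins for the learned networks: encoder phi,
# latent reward model, and latent transition model.
phi_W = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.1
reward_W = rng.normal(size=(LATENT_DIM + N_ACTIONS, 1)) * 0.1
trans_W = rng.normal(size=(LATENT_DIM + N_ACTIONS, LATENT_DIM)) * 0.1

def one_hot(a):
    v = np.zeros(N_ACTIONS)
    v[a] = 1.0
    return v

def deepmdp_losses(s, a, r, s_next):
    """The two DeepMDP losses evaluated on one transition (s, a, r, s')."""
    z = s @ phi_W            # latent state phi(s)
    z_next = s_next @ phi_W  # encoded next state phi(s')
    za = np.concatenate([z, one_hot(a)])
    # (1) Reward loss: squared error between the latent reward model's
    #     prediction and the observed reward.
    reward_loss = float((za @ reward_W - r) ** 2)
    # (2) Transition loss: distance between the predicted next latent state
    #     and phi(s'); L2 here stands in for the distributional term.
    transition_loss = float(np.sum((za @ trans_W - z_next) ** 2))
    return reward_loss, transition_loss

s, s_next = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
rl, tl = deepmdp_losses(s, a=1, r=0.5, s_next=s_next)
print(rl >= 0 and tl >= 0)  # prints True: both losses are squared errors
```

In training, both losses would be minimized jointly over minibatches of environment transitions, updating the encoder and both latent models by gradient descent.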
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual Reinforcement Learning | CARLA (GP scenario) | ER: 82 | 15 |
| Autonomous Driving | CARLA (HW scenario) | Error Rate: 155 | 15 |
| Visual Reinforcement Learning | CarRacing-v0 (test) | Environment Reward: 3.56e+5 | 11 |
| Driving Policy | CARLA JW scenario | Episode Reward: 146 | 7 |
| Driving Policy | CARLA HB scenario | Episode Reward: 101 | 7 |
| Driving Policy | CARLA HW scenario | Episode Reward: 182 | 7 |
| Reinforcement Learning | Procgen easy levels zero-shot generalization 16 games (test) | bigfish: -0.2969 | 6 |
| Visual Reinforcement Learning | CARLA Scenario A (test) | ER: 186 | 6 |
| Visual Reinforcement Learning | CARLA Scenario B (test) | Error Rate (ER): 139 | 6 |